Thursday, November 15, 2012

A simple blogroll with Python and feedparser

A friend wanted to use a blog as a sort of database: several people would create posts on the same blog, and the posts would then be turned into some sort of listing. Most blogging software will export the list of posts as an RSS feed, so this seemed like a good time to learn how to parse and use RSS.

While I'll let my friend figure out his own application, this prompted me to write a simple 'blogroll' program that would read several RSS (and Atom) feeds and produce HTML showing the titles and links of their entries.

Since Python is my scripting language of choice for now, I googled for RSS parser libraries and found feedparser.

We call feedparser.parse with a URL, and it returns an object representing the feed. That object has a field called feed, which holds information about the feed itself, such as its title and its link (URL). It also has entries, a list of objects, each representing one entry in the feed; every entry has fields such as its title and its link.

So it is just a matter of iterating over a list of URLs, parsing the feed for each one, and going over its entries, producing HTML as we go and writing everything to a file.

2 comments:

  1. You will definitely need a templating engine like jinja2 for generating pages.

    1. Absolutely agree if this were a complex project, but for something this simple I'd rather do it manually.
