Justin Duke

Extracting an image from an RSS feed in Python

I have a bit of a social media automation thing set up for Barback, in which I tweet articles from some of my favorite cocktail and liquor blogs. I'm pretty new to the whole social media shtick, but a lot of places recommend adding images and other media to increase engagement, so I set out to find a way to grab images from the RSS feeds I pull those articles from.

Despite my assumptions, the RSS spec has no dedicated slot for an article's image (its only image element describes the feed as a whole). Most feeds throw it in the summary or just the raw content, since the articles themselves usually have an image pretty high up, but there's nothing standard.
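To make the "nothing standard" point concrete, here's a small sketch that uses only Python's standard library (xml.etree.ElementTree rather than feedparser) on a made-up two-item feed. One item carries its image in the description (the summary); the other has it only in the content:encoded extension element (the full post). The feed, its URLs, and the parse_rss_items helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the widely used content:encoded extension element.
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"

# A hypothetical two-item feed: the first puts its image in <description>
# (the summary), the second only in <content:encoded> (the full post).
SAMPLE = """<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Hypothetical cocktail blog</title>
    <item>
      <title>The Negroni</title>
      <link>https://example.com/negroni</link>
      <description>&lt;img src="/img/negroni.jpg"&gt; Equal parts.</description>
    </item>
    <item>
      <title>The Daiquiri</title>
      <link>https://example.com/daiquiri</link>
      <description>Rum, lime, sugar.</description>
      <content:encoded><![CDATA[<img src="/img/daiquiri.jpg"><p>Shake hard.</p>]]></content:encoded>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return one dict per <item> with its title, link, summary and content."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "summary": item.findtext("description", default=""),
            "content": item.findtext(CONTENT_NS + "encoded", default=""),
        }
        for item in root.iter("item")
    ]


# Report which field actually holds an <img> tag for each item.
for entry in parse_rss_items(SAMPLE):
    where = [field for field in ("summary", "content") if "<img" in entry[field]]
    print(entry["title"], "->", where)
# → The Negroni -> ['summary']
# → The Daiquiri -> ['content']
```

Same kind of feed, two different hiding spots, which is why any image grabber has to be willing to look in more than one place.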

Thankfully, I was already using feedparser to parse the RSS itself:

import feedparser

feed_url = "http://rss.artofdrink.com/theartofdrink"
feed = feedparser.parse(feed_url)
articles = feed['entries']
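feedparser normalizes both of the usual hiding spots onto each entry: the description typically ends up in summary, and full-content elements such as content:encoded end up in content, a list of dicts that each have a 'value' key. A small helper (candidate_html is a hypothetical name, not part of feedparser) can collect every HTML fragment worth searching, summary first. This sketch uses plain dicts shaped like feedparser entries so it runs without the library; feedparser's entries are dict subclasses, so .get works on them the same way.

```python
def candidate_html(entry):
    """Return the HTML fragments worth searching for an image, in order.

    Assumes a feedparser-style entry: a dict with an optional 'summary'
    string and an optional 'content' list of {'value': ...} dicts.
    """
    fragments = []
    summary = entry.get("summary")
    if summary:
        fragments.append(summary)
    for block in entry.get("content", []):
        value = block.get("value")
        if value:
            fragments.append(value)
    return fragments


# Two hypothetical entries: one keeps the image in the summary,
# the other only in the full content.
entry_a = {"summary": '<p><img src="https://example.com/a.jpg"> A Negroni</p>'}
entry_b = {
    "summary": "A short teaser with no markup.",
    "content": [{"value": '<img src="https://example.com/b.jpg"><p>Full post</p>'}],
}
print(len(candidate_html(entry_a)))  # → 1
print(len(candidate_html(entry_b)))  # → 2
```

Searching the fragments in order means a summary image wins when there is one, and the full content is only a fallback.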

So we've got a list of articles. Since each summary is itself HTML, we can search it for images and let the wonderful BeautifulSoup package do the heavy lifting:

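from bs4 import BeautifulSoup

for article in articles:
    soup = BeautifulSoup(article.get('summary', ''), 'html.parser')
    img = soup.find('img')  # None if the summary has no <img> tag
    image_url = img.get('src') if img else None
    if image_url:
        # Hey, we have an image!
        print(image_url)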

It's that simple! You're leaning on two heavy libraries to do all the work for you, but that beats trying to parse the HTML yourself, which is for all intents and purposes a toxic hellscape.
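If you'd rather not lean on BeautifulSoup, the standard library's html.parser.HTMLParser can handle this one narrow job. The sketch below (FirstImageParser and first_image_url are hypothetical names) records the first img tag that has a src and resolves it with urllib.parse.urljoin, since a relative path like /img/negroni.jpg won't point anywhere once it's posted outside the original site. Passing the article's link as base_url is an assumption about where relative paths should resolve.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FirstImageParser(HTMLParser):
    """Records the src of the first <img> tag that has one."""

    def __init__(self):
        super().__init__()
        self.src = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if self.src is None and tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.src = src


def first_image_url(html, base_url=""):
    """Return an absolute URL for the first image in html, or None."""
    parser = FirstImageParser()
    parser.feed(html)
    parser.close()
    if parser.src is None:
        return None
    # Relative paths are resolved against the (assumed) article link.
    return urljoin(base_url, parser.src)


print(first_image_url('<p>Stir.</p><img src="/img/negroni.jpg">',
                      "https://example.com/negroni"))
# → https://example.com/img/negroni.jpg
print(first_image_url("<p>No pictures here.</p>"))
# → None
```

It does far less than BeautifulSoup (no tree, no general searching), but for "give me the first image" that's all you need.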

© 2017 Justin Duke • All rights reserved • I hope you have a nice day.