Monday, April 13, 2009

That's What She sed

Lately, I've been uploading pictures to Twitter from my phone using TwitPic. Basically, you send them to a TwitPic e-mail address via multimedia messaging, and they are automatically posted to your Twitter account, along with the text from the subject line. This all works quite well, and they even supply an RSS feed of your pictures, which you can (among other things) put on your blog's sidebar.

The problem was, when I put it in my blog sidebar, there was no thumbnail image. Other feeds that had images in them would have thumbnails, but not this one. This one just had a text link to the picture page. I found that disappointing.

So I compared the TwitPic feed with feeds that did show thumbnails to see what the difference was. Feeds that contained images within the feed content showed up in the Blogger widget with a thumbnail. But the TwitPic feed contained images too. What was the difference? The difference turned out to be CDATA. CDATA is a way to tell a feed reader, "Don't try to decipher my contents, just pass them along and leave the rendering to the end user application." It so happens that TwitPic's thumbnail images are within a CDATA block, and Blogger obediently ignores the CDATA contents when looking for images to display as a thumbnail.

So, how do I fix that? I need to read the feed and, for each item, locate the line that contains the thumbnail URL and create a new element containing the thumbnail in a format that is decipherable to Blogger's widget. Using my digg feed as a model, I figured out what the end result should look like, but how to achieve it?

First, I tried Yahoo Pipes, a tool for processing feeds through a number of smaller tools, controlled by a graphical pipe-looking interface. The problem is, none of the tools that I could find would add an element based on the transformed contents of another one. Some came close, but I couldn't get them to work, so I decided to host the feed myself and modify it using sed.
I had never used sed before, except when someone handed me the exact command, so I didn't know how to use it, but I knew that it was a powerful enough tool to get the job done. So I created a shell script on my Linux box, and a cron job to run it. The script downloads the RSS feed from TwitPic to a local file, then calls sed on it to extract the thumbnail URL and add it back in a format that is decipherable to Blogger. In order to understand sed, I searched the Internet for a tutorial, and found this page from the Gentoo Linux Documentation to be the most helpful. My sed command does two things, which are piped together:
  1. It adds an xmlns:media declaration, which allows me to use the media tag later on.
  2. It examines each CDATA line with the thumbnail URL, and below it, it adds a line with the media:thumbnail tag and the URL extracted from above.
sed -e 's/<rss version="[^"]*"/& xmlns:media="http:\/\/search.yahoo.com\/mrss\/"/g' "$TMP_FILE" | sed -e 's/\(http:\/\/twitpic.com\/show\/thumb\/[^"]*\).*/&\n <media:thumbnail url="\1" height="150" width="150" \/>/g' > "$FEED_FILE"
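To see what the two stages do, here is a small sketch that runs the same command (with the variables quoted) against a made-up feed fragment. The fragment is purely an assumption for illustration: the real TwitPic feed may be laid out differently, and `abc123` is a placeholder picture ID. It also assumes GNU sed, since `\n` in the replacement is a GNU extension.

```shell
#!/bin/sh
# Temporary files standing in for the script's $TMP_FILE and $FEED_FILE.
TMP_FILE=$(mktemp)
FEED_FILE=$(mktemp)

# Hypothetical feed fragment (not copied from a real TwitPic feed):
# the thumbnail <img> sits inside a CDATA block on a single line.
cat > "$TMP_FILE" <<'EOF'
<rss version="2.0">
<item>
<description><![CDATA[<img src="http://twitpic.com/show/thumb/abc123">]]></description>
</item>
</rss>
EOF

# Stage 1 adds the xmlns:media declaration to the <rss> tag.
# Stage 2 captures the thumbnail URL (everything up to the closing quote)
# and appends a media:thumbnail line below the line it was found on.
sed -e 's/<rss version="[^"]*"/& xmlns:media="http:\/\/search.yahoo.com\/mrss\/"/g' "$TMP_FILE" | sed -e 's/\(http:\/\/twitpic.com\/show\/thumb\/[^"]*\).*/&\n <media:thumbnail url="\1" height="150" width="150" \/>/g' > "$FEED_FILE"

# Show the modified feed.
cat "$FEED_FILE"
rm -f "$TMP_FILE"
```

In the output, the `<rss>` tag carries the extra namespace declaration, and the original description line is left untouched, with a new `media:thumbnail` line right after it. Because the pattern ends in `.*`, the rest of the line (including the closing `</description>`) is kept before the new line is added, which is why the new element lands outside the CDATA block.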
I know it's possible to consolidate the two sed commands into one and do it in one pass, but this works. I may tweak it in further revisions. It is not necessary to use a Yahoo-defined media tag, so I might modify the script later on to simply transform the CDATA portion into parseable encoded HTML.

I might also add that I'm using Feedburner to host the feed. Basically, I change the file on my server, and Feedburner goes there to get it, and offers it to the rest of the world. That way, if my server is offline, the feed is still active and available, and I don't have to deal with the traffic, just the Feedburner hits.

If anyone else wants their TwitPic feed to have thumbnails available, let me know, and I can set one up for you on my server through Feedburner. (It's pretty easy, since the TwitPic username is passed in to the script as a parameter.) I can't guarantee anything, but since it's in my interest to keep the script working and up-to-date, you don't have much to worry about. All I need to know is your TwitPic (Twitter) username.
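For the curious, the one-pass consolidation mentioned above might look something like the sketch below. This is not the script I actually run: `add_thumbnails` is a hypothetical helper name, the sample input is made up, and GNU sed is assumed for the `\n` in the replacement. It passes two `-e` expressions to a single sed call and uses `|` as the delimiter, so the slashes in the URLs don't need escaping.

```shell
#!/bin/sh
# add_thumbnails (hypothetical name): reads a feed on stdin and writes the
# modified feed to stdout, applying both substitutions in one sed pass.
add_thumbnails() {
    sed \
        -e 's|<rss version="[^"]*"|& xmlns:media="http://search.yahoo.com/mrss/"|' \
        -e 's|\(http://twitpic\.com/show/thumb/[^"]*\).*|&\n <media:thumbnail url="\1" height="150" width="150" />|'
}

# Made-up sample input; abc123 is a placeholder picture ID.
sample='<rss version="2.0">
<description><![CDATA[<img src="http://twitpic.com/show/thumb/abc123">]]></description>
</rss>'

# Run the sample through the helper and print the result.
result=$(printf '%s\n' "$sample" | add_thumbnails)
printf '%s\n' "$result"
```

In a real script, the helper would sit between whatever command downloads the feed and the output file, along the lines of `download_command | add_thumbnails > "$FEED_FILE"`. The `g` flag is dropped here because the second pattern swallows the rest of the line, so there can only be one match per line anyway.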
  • Update (2009-04-16): I have modified the code to accept all image formats, and be shorter.