Note: This post (and the script it contains) has been updated as of December 14, 2010. (v1.4.0) The script can also be downloaded from my server here.
Also, Posterous has done a lot of work on solving this problem since I wrote my script. You can see their latest solutions here.
Recently, I switched from TwitPic to Posterous as my method of posting phone pictures (and now video) to the Internet. Since I didn't want my data history split in two, I decided to write a script that downloads each of my TwitPic images with its associated text and date, then uploads them to Posterous with the same information.
Initially, I wanted to make one long post with all of the images and their text below each one. However, the Posterous API doesn't let you refer to a specific image from your body text, so I went with individual posts instead.
Along the way, I became familiar with yet another Linux command: curl.
I love that Posterous has an API that (once you figure out curl) is pretty easy to use. TwitPic, on the other hand, has absolutely zero support for exporting anything. The fact that they're so user-unfriendly and outdated was a driving force in my switching. The only reason I hadn't already switched to img.ly was a bug that prevented images sent from my phone from being posted, since my phone sends them without a file extension. I worked with their tech support for a while, but they never fixed it. I got a new phone, but it was also a Samsung, and it handled images the same way. Oh, well. Posterous is better.
Anyway, here is the script:
First, run it with just the first two arguments, and it will download all of your TwitPic data, including thumbnail images. Once you're satisfied, supply your Posterous user ID, password, and site ID. (If you don't know your site ID, run the script with your Posterous user ID and password but no site ID; as long as your credentials are valid, it will query your Posterous site info for you.)
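To illustrate that site-ID lookup: the getsites call returns XML, and the script simply greps it for id tags. Here is a minimal sketch of the same extraction idiom, using a made-up response string in place of live curl output:

```shell
# Hypothetical getsites response; a real one would come from:
#   curl -u "user:pass" "http://posterous.com/api/getsites"
RESPONSE='<site><id>123456</id><name>example</name></site>'

# Same grep idiom the script uses to pull out the numeric site ID
SITE_ID=$(echo "$RESPONSE" | grep -Eo "<id>[0-9]+</id>" | grep -Eo "[0-9]+")
echo "Site ID: $SITE_ID"   # prints "Site ID: 123456"
```

In the script itself, this grep runs against the file curl saved, and an empty result means either the credentials or the supplied site ID were wrong.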
Note: if you want to run this from Windows, you should install Cygwin (with, at a minimum, curl and sed) and run it from there.
./twitpic-to-posterous.sh [twitpic-id] [working-dir] [posterous-id] [posterous-password] [posterous-site-id] [skip-number]
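The optional skip-number exists because of the daily API upload limit the script enforces; it is only accepted if it is purely numeric. That validation is a simple sed strip-and-compare, sketched here with a hypothetical argument value:

```shell
# Hypothetical skip-number argument (in the script this is $6)
UPLOAD_SKIP="1a2"

# Strip every non-digit character; a clean numeric value is unchanged
UPLOAD_SKIP_DIGITS=$(echo "$UPLOAD_SKIP" | sed -e 's/[^0-9]//g')

if [ "$UPLOAD_SKIP" != "$UPLOAD_SKIP_DIGITS" ]; then
  RESULT="invalid"   # "1a2" != "12", so the script would exit here
else
  RESULT="valid"
fi
echo "UPLOAD_SKIP is $RESULT"   # prints "UPLOAD_SKIP is invalid"
```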
#!/bin/sh
# Copyright 2010 Tim "burndive" of http://burndive.blogspot.com/
# and http://tuxbox.blogspot.com/
# This software is licensed under the Creative Commons GNU GPL version 2.0 or later.
# License information: http://creativecommons.org/licenses/GPL/2.0/
# This script was obtained from here:
# http://tuxbox.blogspot.com/2010/03/twitpic-to-posterous-export-script.html

RUN_DATE=`date +%F--%H-%M-%S`
SCRIPT_VERSION_STRING="v1.4.0"

TP_NAME=$1
WORKING_DIR=$2
P_ID=$3
P_PW=$4
P_SITE_ID=$5
UPLOAD_SKIP=$6

# Comma separated list of tags to apply to your posts
P_TAGS="twitpic"
# Whether or not to auto-post from Posterous
P_AUTOPOST=0
# Whether or not the Posterous posts are marked private
P_PRIVATE=0
# This is the default limit of the number of posts that can be uploaded per day
P_API_LIMIT=50

DOWNLOAD_FULL=1
DOWNLOAD_SCALED=0
DOWNLOAD_THUMB=0

PREFIX=twitpic-$TP_NAME
HTML_OUT=$PREFIX-all-$RUN_DATE.html
UPLOAD_OUT=posterous-upload-$P_SITE_ID-$RUN_DATE.xml

if [ -z "$TP_NAME" ]; then
  echo "You must supply a TP_NAME."
  exit
fi
if [ ! -d "$WORKING_DIR" ]; then
  echo "You must supply a WORKING_DIR."
  exit
fi
if [ -z "$UPLOAD_SKIP" ]; then
  UPLOAD_SKIP=0
fi
UPLOAD_SKIP_DIGITS=`echo $UPLOAD_SKIP | sed -e 's/[^0-9]//g'`
if [ "$UPLOAD_SKIP" != "$UPLOAD_SKIP_DIGITS" ]; then
  echo "Invalid UPLOAD_SKIP: $UPLOAD_SKIP"
  exit
fi

cd $WORKING_DIR

if [ -f "$HTML_OUT" ]; then
  rm -v $HTML_OUT
fi

# If Posterous username and password were supplied, but not site ID,
# query the server and exit.
P_SITE_INFO_FILE=posterous-$P_SITE_ID.out
if [ ! -z "$P_ID" ] && [ ! -z "$P_PW" ] && [ -z "$P_SITE_ID" ]; then
  echo "Getting Posterous account info..."
  curl -u "$P_ID:$P_PW" "http://posterous.com/api/getsites" -o $P_SITE_INFO_FILE
  SITE_ID_RET=`grep "<id>$P_SITE_ID</id>" $P_SITE_INFO_FILE`
  if [ -z "$SITE_ID_RET" ]; then
    echo "Please supply your Posterous Site ID as the fifth argument."
    echo "Here is the response from the Posterous server. If you entered correct credentials, you should see your Site ID(s):"
    cat $P_SITE_INFO_FILE | tee -a $UPLOAD_OUT
    exit
  fi
fi

# Confirm that we have a valid Posterous Site ID
if [ ! -z "$P_SITE_ID" ]; then
  echo "Getting Posterous account info..."
  curl -u "$P_ID:$P_PW" "http://posterous.com/api/getsites" -o $P_SITE_INFO_FILE
  SITE_ID_RET=`grep "<id>$P_SITE_ID</id>" $P_SITE_INFO_FILE`
  if [ -z "$SITE_ID_RET" ]; then
    echo "Make sure that you have supplied a valid Posterous Site ID as the fifth parameter. If you don't know your Site ID, leave it out, and this script will query the server."
    echo "Here is the response from the Posterous server. If you entered correct credentials, you should see your site ID(s):"
    cat $P_SITE_INFO_FILE | tee -a $UPLOAD_OUT
    exit
  fi
fi

MORE=1
PAGE=1
while [ $MORE -ne 0 ]; do
  echo PAGE: $PAGE
  FILENAME=$PREFIX-page-$PAGE.html
  if [ ! -s $FILENAME ]; then
    wget http://twitpic.com/photos/${TP_NAME}?page=$PAGE -O $FILENAME
    if [ ! -s "$FILENAME" ]; then
      echo "ERROR: could not get $FILENAME" | tee -a $LOG_FILE
      sleep 5
    fi
  fi
  if [ -z "`grep "More photos >" $FILENAME`" ]; then
    MORE=0
  else
    PAGE=`expr $PAGE + 1`
  fi
done

ALL_IDS=`cat $PREFIX-page-* | grep -Eo "<a href=\"/[a-zA-Z0-9]+\">" | grep -Eo "/[a-zA-Z0-9]+" | grep -Eo "[a-zA-Z0-9]+" | sort -r | xargs`
# For Testing
#ALL_IDS="1kdjc"

COUNT=0
LOG_FILE=$PREFIX-log-$RUN_DATE.txt
echo $ALL_IDS | tee -a $LOG_FILE

for ID in $ALL_IDS; do
  COUNT=`expr $COUNT + 1`
  echo $ID: $COUNT | tee -a $LOG_FILE
  echo "Processing $ID..."

  FULL_HTML=$PREFIX-$ID-full.html
  if [ ! -s "$FULL_HTML" ]; then
    wget http://twitpic.com/$ID/full -O $FULL_HTML
    if [ ! -s "$FULL_HTML" ]; then
      echo "ERROR: could not get FULL_HTML for $ID" | tee -a $LOG_FILE
      sleep 5
    fi
  fi

  TEXT=`grep "<img src=" $FULL_HTML | tail -n1 | grep -oE "alt=\"[^\"]*\"" | sed \
    -e 's/^alt="//' \
    -e 's/"$//' \
    -e "s/&apos;/'/g" \
    -e 's/&quot;/"/g' \
  `
  if [ "$TEXT" = "" ]; then
    TEXT="Untitled"
  fi
  echo "TEXT: $TEXT" | tee -a $LOG_FILE

  # Recognize hashtags and username references in the tweet
  TEXT_RICH=`echo "$TEXT" | sed \
    -e 's/\B\@\([0-9A-Za-z_]\+\)/\@<a href="http:\/\/twitter.com\/\1">\1<\/a>/g' \
    -e 's/\#\([0-9A-Za-z_-]*[A-Za-z_-]\+[0-9A-Za-z_-]*\)/<a href="http:\/\/twitter.com\/search\?q\=%23\1">\#\1<\/a>/g' \
  `
  echo "TEXT_RICH: $TEXT_RICH" | tee -a $LOG_FILE

  # Convert hashtags into post tags
  P_TAGS_POST=$P_TAGS`echo "$TEXT" | sed \
    -e 's/\#\([^A-Za-z_-]\)*\B//g' \
    -e 's/^[^\#]*$//g' \
    -e 's/[^\#]*\(\#\([0-9A-Za-z_-]*[A-Za-z_-]\+[0-9A-Za-z_-]*\)\)[^\#]*\(\#[0-9]*\B\)*/,\2/g' \
  `
  # Uncomment if you don't want hashtags converted into post tags
  #P_TAGS_POST=$P_TAGS

  # Add custom tags from a file (optional). The file is formatted like this:
  # ,tag1,tag2,tag3
  TAGS_FILE=$PREFIX-$ID-tags-extra.txt
  if [ -s "$TAGS_FILE" ]; then
    P_TAGS_POST=$P_TAGS_POST`cat $TAGS_FILE`
  fi
  echo "P_TAGS_POST: $P_TAGS_POST" | tee -a $LOG_FILE

  TEXT_FILE=$PREFIX-$ID-text.txt
  if [ ! -s $TEXT_FILE ]; then
    echo "$TEXT" > $TEXT_FILE
  fi

  FULL_URL=`grep "<img src=" $FULL_HTML | grep -Eo "src=\"[^\"]*\"" | grep -Eo "http://[^\"]*"`
  echo "FULL_URL: $FULL_URL" | tee -a $LOG_FILE

  SCALED_HTML=$PREFIX-$ID-scaled.html
  if [ ! -s "$SCALED_HTML" ]; then
    wget http://twitpic.com/$ID -O $SCALED_HTML
    if [ ! -s "$SCALED_HTML" ]; then
      echo "ERROR: could not get SCALED_HTML for $ID" | tee -a $LOG_FILE
      sleep 5
    fi
  fi
  SCALED_URL=`grep "id=\"photo-display\"" $SCALED_HTML | grep -Eo "http://[^\"]*" | head -n1`
  echo "SCALED_URL: $SCALED_URL" | tee -a $LOG_FILE

  POST_DATE=`grep -Eo "Posted on [a-zA-Z0-9 ,]*" $SCALED_HTML | sed -e 's/Posted on //'`
  echo "POST_DATE: $POST_DATE" | tee -a $LOG_FILE

  THUMB_URL=`cat $PREFIX-page-* | grep -E "<a href=\"/$ID\">" | grep -Eo "src=\"[^\"]*\"" | head -n1 | sed -e 's/src=\"//' -e 's/\"$//'`
  echo "THUMB_URL: $THUMB_URL" | tee -a $LOG_FILE

  EXT=`echo "$FULL_URL" | grep -Eo "[a-zA-Z0-9]+\.[a-zA-Z0-9]+\?" | head -n1 | grep -Eo "\.[a-zA-Z0-9]+"`
  if [ -z "$EXT" ]; then
    EXT=`echo "$FULL_URL" | grep -Eo "\.[a-zA-Z0-9]+$"`
  fi
  echo "EXT: $EXT"

  if [ "$DOWNLOAD_FULL" -eq 1 ]; then
    FULL_FILE="$PREFIX-$ID-full$EXT"
    if [ ! -s $FULL_FILE ]; then
      wget "$FULL_URL" -O $FULL_FILE
      if [ ! -s "$FULL_FILE" ]; then
        echo "ERROR: could not get FULL_URL for $ID: $FULL_URL" | tee -a $LOG_FILE
        sleep 5
      fi
    fi
  fi
  if [ "$DOWNLOAD_SCALED" -eq 1 ]; then
    SCALED_FILE=$PREFIX-$ID-scaled$EXT
    if [ ! -s $SCALED_FILE ]; then
      wget "$SCALED_URL" -O $SCALED_FILE
      if [ ! -s "$SCALED_FILE" ]; then
        echo "ERROR: could not get SCALED_URL for $ID: $SCALED_URL" | tee -a $LOG_FILE
        sleep 5
      fi
    fi
  fi
  if [ "$DOWNLOAD_THUMB" -eq 1 ]; then
    THUMB_FILE=$PREFIX-$ID-thumb$EXT
    if [ ! -s $THUMB_FILE ]; then
      wget "$THUMB_URL" -O $THUMB_FILE
      if [ ! -s "$THUMB_FILE" ]; then
        echo "ERROR: could not get THUMB_URL for $ID: $THUMB_URL" | tee -a $LOG_FILE
        sleep 5
      fi
    fi
  fi

  BODY_TEXT="$TEXT_RICH <p>[<a href=http://twitpic.com/$ID>Twitpic</a>]</p>"

  # Format the post date correctly
  YEAR=`echo "$POST_DATE" | sed -e 's/[A-Z][a-z]* [0-9]*, //'`
  DAY=`echo "$POST_DATE" | sed -e 's/[A-Z][a-z]* //' -e 's/, [0-9]*//'`
  MONTH=`echo "$POST_DATE" | sed -e 's/ [0-9]*, [0-9]*//' | sed \
    -e 's/January/01/' \
    -e 's/February/02/' \
    -e 's/March/03/' \
    -e 's/April/04/' \
    -e 's/May/05/' \
    -e 's/June/06/' \
    -e 's/July/07/' \
    -e 's/August/08/' \
    -e 's/September/09/' \
    -e 's/October/10/' \
    -e 's/November/11/' \
    -e 's/December/12/' \
  `

  # Adjust the time to local midnight when west of GMT
  HOURS_LOC=`date | grep -Eo " [0-9]{2}:" | sed -e 's/://' -e 's/ //'`
  HOURS_UTC=`date -u | grep -Eo " [0-9]{2}:" | sed -e 's/://' -e 's/ //'`
  HOURS_OFF=`expr $HOURS_UTC - $HOURS_LOC + 7`
  echo "HOURS_LOC: $HOURS_LOC"
  echo "HOURS_UTC: $HOURS_UTC"
  echo "HOURS_OFF: $HOURS_OFF"
  if [ "$HOURS_OFF" -lt 0 ]; then
    # We're east of GMT, do not adjust
    HOURS_OFF=0
  fi
  if [ "$HOURS_OFF" -lt 10 ]; then
    HOURS_OFF=0$HOURS_OFF
  fi
  if [ "$DAY" != "" ] && [ "$DAY" -lt 10 ]; then
    DAY=0$DAY
  fi
  DATE_FORMATTED="$YEAR-$MONTH-$DAY-$HOURS_OFF:00"
  echo "DATE_FORMATTED: $DATE_FORMATTED" | tee -a $LOG_FILE

  echo "<p><img src='$FULL_FILE' alt='$TEXT' title='$TEXT' /></p>" >> $HTML_OUT
  echo "$BODY_TEXT" >> $HTML_OUT
  echo " Post date: $DATE_FORMATTED; Count: $COUNT" >> $HTML_OUT

  # Upload this Twitpic data to Posterous
  if [ ! -z "$P_SITE_ID" ]; then
    # First make sure we're under the API upload limit
    if [ "$COUNT" -le "$UPLOAD_SKIP" ]; then
      echo Skipping upload...
      continue
    fi
    if [ "$COUNT" -gt "`expr $UPLOAD_SKIP + $P_API_LIMIT`" ]; then
      echo "Skipping upload due to daily Posterous API upload limit of $P_API_LIMIT."
      echo "To resume uploading where we left off today, supply UPLOAD_SKIP parameter of `expr $UPLOAD_SKIP + $P_API_LIMIT`."
      continue
    fi
    P_OUT_FILE="posterous-$P_SITE_ID-$ID.out"
    if [ -s "$P_OUT_FILE" ]; then
      rm "$P_OUT_FILE"
    fi
    echo "Uploading Twitpic image..."
    curl -u "$P_ID:$P_PW" "http://posterous.com/api/newpost" -o "$P_OUT_FILE" \
      -F "site_id=$P_SITE_ID" \
      -F "title=$TEXT" \
      -F "autopost=$P_AUTOPOST" \
      -F "private=$P_PRIVATE" \
      -F "date=$DATE_FORMATTED" \
      -F "tags=$P_TAGS_POST" \
      -F "source=burndive's Twitpic-to-Posterous script $SCRIPT_VERSION_STRING" \
      -F "sourceLink=http://tuxbox.blogspot.com/2010/03/twitpic-to-posterous-export-script.html" \
      -F "body=$BODY_TEXT" \
      -F "media=@$FULL_FILE"
    cat $P_OUT_FILE | tee -a $UPLOAD_OUT
  fi
done
echo Done.
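To make the date handling in the script concrete, here is a condensed, single-month version of the conversion above. It assumes a TwitPic timestamp of "March 5, 2010" and hard-codes the 07:00 offset that the HOURS_OFF logic computes on a machine seven hours behind GMT:

```shell
POST_DATE="March 5, 2010"

# Same sed patterns the script uses to split the date apart
YEAR=$(echo "$POST_DATE" | sed -e 's/[A-Z][a-z]* [0-9]*, //')
DAY=$(echo "$POST_DATE" | sed -e 's/[A-Z][a-z]* //' -e 's/, [0-9]*//')
MONTH=$(echo "$POST_DATE" | sed -e 's/ [0-9]*, [0-9]*//' -e 's/March/03/')

# Zero-pad single-digit days, as the script does
if [ "$DAY" -lt 10 ]; then
  DAY=0$DAY
fi

DATE_FORMATTED="$YEAR-$MONTH-$DAY-07:00"
echo "$DATE_FORMATTED"   # prints "2010-03-05-07:00"
```

The resulting string is what the script passes to the Posterous API as the post date, so each imported image keeps its original TwitPic timestamp.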
This software is licensed under the CC-GNU GPL version 2.0 or later.
PS: If you use my code, I'd appreciate a comment letting me know, along with any feedback you may have, especially if it's not working right for you, but also just to say thanks.
For convenience, you can download this script from my server.