Note: This post (and the script it contains) has been updated as of December 14, 2010. (v1.4.0) The script can also be downloaded from my server here.
Also, Posterous has done a lot of work on solving this problem since I wrote my script. You can see their latest solutions here.
Recently, I switched from TwitPic to Posterous as my method of posting phone pictures (and now video) to the Internet. I didn't want my data history split in two, though, so I wrote a script to download each of my TwitPic images with its associated text and date, and upload them to Posterous with the same information.
Initially, I wanted to make one long post with all of the images and their text below each one. However, the Posterous API provides no way to refer to a specific image from the body text, so I went with individual posts instead.
Along the way, I became familiar with yet another Linux command: curl.
I love that Posterous has an API that (once you figure out curl) is pretty easy to use. TwitPic, on the other hand, has absolutely zero support for exporting anything. Their lack of user focus and outdated feature set were a driving force in my switching. The only reason I hadn't already switched to img.ly was a bug that prevents images sent from my phone from being posted, since my phone sends them without a file extension. I worked with their tech support for a while, but they never fixed it. I got a new phone, but it was also a Samsung, and it did the same thing with images. Oh, well. Posterous is better.
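For the curious, the whole integration rests on two curl features: HTTP Basic authentication (-u user:password) and multipart form fields (-F name=value, where @file attaches a file). Here is a rough sketch of the shape of the two Posterous API calls the script makes. The commands are echoed rather than executed, since running them for real would require a live account, and the user, password, Site ID, and filename below are all placeholders:

```shell
# Shape of the two Posterous API calls used by the script below.
# Echoed, not run: the real calls need live credentials, and all the
# values here (user, password, site_id, filename) are placeholders.
echo curl -u "user:password" "http://posterous.com/api/getsites"
echo curl -u "user:password" "http://posterous.com/api/newpost" \
    -F "site_id=12345" \
    -F "title=My picture" \
    -F "media=@picture.jpg"
```

In the script itself, each response is also saved to a file with -o so it can be grepped for the Site ID afterward.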
Anyway, here is the script:
First run it with just the first two arguments, and it will download all of your TwitPic data, including thumbnail images. Once you're satisfied, supply your Posterous User ID, Password, and Site ID. (If you don't know your Site ID, run the script with your Posterous User ID and Password but no Site ID, and it will query your Posterous site info, as long as your credentials are valid.)
Note: if you want to run this from Windows, install Cygwin (with, at a minimum, curl, wget, and sed) and run the script from there.
./twitpic-to-posterous.sh [twitpic-id] [working-dir] [posterous-id] [posterous-password] [posterous-site-id] [skip-number]
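One gotcha: the script checks that the working directory exists and exits otherwise, so create it before the first run. A hypothetical staged run might look like the comments below (the TwitPic name, directory, credentials, and Site ID are all placeholders):

```shell
# The working directory must already exist; the script exits with
# "You must supply a WORKING_DIR." if it is missing.
mkdir -p ./twitpic-backup

# Stage 1: download-only pass (TwitPic name and directory, no credentials):
#   ./twitpic-to-posterous.sh burndive ./twitpic-backup
# Stage 2: add credentials but omit the Site ID to have the script list yours:
#   ./twitpic-to-posterous.sh burndive ./twitpic-backup me@example.com secret
# Stage 3: full upload; after hitting the 50-post daily API limit, resume
# the next day by passing skip-number=50:
#   ./twitpic-to-posterous.sh burndive ./twitpic-backup me@example.com secret 12345 50
```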
#!/bin/sh
# Copyright 2010 Tim "burndive" of http://burndive.blogspot.com/ and http://tuxbox.blogspot.com/
# This software is licensed under the Creative Commons GNU GPL version 2.0 or later.
# License information: http://creativecommons.org/licenses/GPL/2.0/
# This script was obtained from here:
# http://tuxbox.blogspot.com/2010/03/twitpic-to-posterous-export-script.html
RUN_DATE=`date +%F--%H-%M-%S`
SCRIPT_VERSION_STRING="v1.4.0"
TP_NAME=$1
WORKING_DIR=$2
P_ID=$3
P_PW=$4
P_SITE_ID=$5
UPLOAD_SKIP=$6
# Comma separated list of tags to apply to your posts
P_TAGS="twitpic"
# Whether or not to auto-post from Posterous
P_AUTOPOST=0
# Whether or not the Posterous posts are marked private
P_PRIVATE=0
# This is the default limit of the number of posts that can be uploaded per day
P_API_LIMIT=50
DOWNLOAD_FULL=1
DOWNLOAD_SCALED=0
DOWNLOAD_THUMB=0
PREFIX=twitpic-$TP_NAME
HTML_OUT=$PREFIX-all-$RUN_DATE.html
UPLOAD_OUT=posterous-upload-$P_SITE_ID-$RUN_DATE.xml
if [ -z "$TP_NAME" ]; then
echo "You must supply a TP_NAME."
exit
fi
if [ ! -d "$WORKING_DIR" ]; then
echo "You must supply a WORKING_DIR."
exit
fi
if [ -z "$UPLOAD_SKIP" ]; then
UPLOAD_SKIP=0
fi
UPLOAD_SKIP_DIGITS=`echo $UPLOAD_SKIP | sed -e 's/[^0-9]//g'`
if [ "$UPLOAD_SKIP" != "$UPLOAD_SKIP_DIGITS" ]; then
echo "Invalid UPLOAD_SKIP: $UPLOAD_SKIP"
exit
fi
cd "$WORKING_DIR"
if [ -f "$HTML_OUT" ]; then
rm -v $HTML_OUT
fi
# If Posterous username and password were supplied, but not site ID, query the server and exit.
P_SITE_INFO_FILE=posterous-$P_SITE_ID.out
if [ ! -z "$P_ID" ] && [ ! -z "$P_PW" ] && [ -z "$P_SITE_ID" ]; then
echo "Getting Posterous account info..."
curl -u "$P_ID:$P_PW" "http://posterous.com/api/getsites" -o $P_SITE_INFO_FILE
SITE_ID_RET=`grep "<id>$P_SITE_ID</id>" $P_SITE_INFO_FILE`
if [ -z "$SITE_ID_RET" ]; then
echo "Please supply your Posterous Site ID as the fifth argument."
echo "Here is the response from the Posterous server. If you entered correct credentials, you should see your Site ID(s):"
cat $P_SITE_INFO_FILE | tee -a $UPLOAD_OUT
exit
fi
fi
# Confirm that we have a valid Posterous Site ID
if [ ! -z "$P_SITE_ID" ]; then
echo "Getting Posterous account info..."
curl -u "$P_ID:$P_PW" "http://posterous.com/api/getsites" -o $P_SITE_INFO_FILE
SITE_ID_RET=`grep "<id>$P_SITE_ID</id>" $P_SITE_INFO_FILE`
if [ -z "$SITE_ID_RET" ]; then
echo "Make sure that you have supplied a valid Posterous Site ID as the fifth parameter. If you don't know your Site ID, leave it out, and this script will query the server."
echo "Here is the response from the Posterous server. If you entered correct credentials, you should see your site ID(s):"
cat $P_SITE_INFO_FILE | tee -a $UPLOAD_OUT
exit
fi
fi
# Define the log file up front so the page-download loop below can log errors
LOG_FILE=$PREFIX-log-$RUN_DATE.txt
MORE=1
PAGE=1
while [ $MORE -ne 0 ]; do
echo PAGE: $PAGE
FILENAME=$PREFIX-page-$PAGE.html
if [ ! -s $FILENAME ]; then
wget http://twitpic.com/photos/${TP_NAME}?page=$PAGE -O $FILENAME
if [ ! -s "$FILENAME" ]; then
echo "ERROR: could not get $FILENAME" | tee -a $LOG_FILE
sleep 5
fi
fi
if [ -z "`grep "More photos >" $FILENAME`" ]; then
MORE=0
else
PAGE=`expr $PAGE + 1`
fi
done
ALL_IDS=`cat $PREFIX-page-* | grep -Eo "<a href=\"/[a-zA-Z0-9]+\">" | grep -Eo "/[a-zA-Z0-9]+" | grep -Eo "[a-zA-Z0-9]+" | sort -r | xargs`
# For Testing
#ALL_IDS="1kdjc"
COUNT=0
LOG_FILE=$PREFIX-log-$RUN_DATE.txt
echo $ALL_IDS | tee -a $LOG_FILE
for ID in $ALL_IDS; do
COUNT=`expr $COUNT + 1`
echo $ID: $COUNT | tee -a $LOG_FILE
echo "Processing $ID..."
FULL_HTML=$PREFIX-$ID-full.html
if [ ! -s "$FULL_HTML" ]; then
wget http://twitpic.com/$ID/full -O $FULL_HTML
if [ ! -s "$FULL_HTML" ]; then
echo "ERROR: could not get FULL_HTML for $ID" | tee -a $LOG_FILE
sleep 5
fi
fi
# Extract the caption from the image's alt text, decoding the HTML
# entities TwitPic uses for quote characters
TEXT=`grep "<img src=" $FULL_HTML | tail -n1 | grep -oE "alt=\"[^\"]*\"" | sed \
-e 's/^alt="//'\
-e 's/"$//'\
-e "s/&#039;/'/g"\
-e 's/&quot;/"/g'\
`
if [ "$TEXT" = "" ]; then
TEXT="Untitled"
fi
echo "TEXT: $TEXT" | tee -a $LOG_FILE
# Recognize hashtags and username references in the tweet
TEXT_RICH=`echo "$TEXT" | sed \
-e 's/\B\@\([0-9A-Za-z_]\+\)/\@<a href="http:\/\/twitter.com\/\1">\1<\/a>/g' \
-e 's/\#\([0-9A-Za-z_-]*[A-Za-z_-]\+[0-9A-Za-z_-]*\)/<a href="http:\/\/twitter.com\/search\?q\=%23\1">\#\1<\/a>/g' \
`
echo "TEXT_RICH: $TEXT_RICH" | tee -a $LOG_FILE
# Convert hashtags into post tags
P_TAGS_POST=$P_TAGS`echo "$TEXT" | sed \
-e 's/\#\([^A-Za-z_-]\)*\B//g' \
-e 's/^[^\#]*$//g' \
-e 's/[^\#]*\(\#\([0-9A-Za-z_-]*[A-Za-z_-]\+[0-9A-Za-z_-]*\)\)[^\#]*\(\#[0-9]*\B\)*/,\2/g' \
`
# Uncomment if you don't want hashtags converted into post tags
#P_TAGS_POST=$P_TAGS
# Add custom tags from a file (optional). The file is formatted like this:
# ,tag1,tag2,tag3
TAGS_FILE=$PREFIX-$ID-tags-extra.txt
if [ -s "$TAGS_FILE" ]; then
P_TAGS_POST=$P_TAGS_POST`cat $TAGS_FILE`
fi
echo "P_TAGS_POST: $P_TAGS_POST" | tee -a $LOG_FILE
TEXT_FILE=$PREFIX-$ID-text.txt
if [ ! -s $TEXT_FILE ]; then
echo "$TEXT" > $TEXT_FILE
fi
FULL_URL=`grep "<img src=" $FULL_HTML | grep -Eo "src=\"[^\"]*\"" | grep -Eo "http://[^\"]*"`
echo "FULL_URL: $FULL_URL" | tee -a $LOG_FILE
SCALED_HTML=$PREFIX-$ID-scaled.html
if [ ! -s "$SCALED_HTML" ]; then
wget http://twitpic.com/$ID -O $SCALED_HTML
if [ ! -s "$SCALED_HTML" ]; then
echo "ERROR: could not get SCALED_HTML for $ID" | tee -a $LOG_FILE
sleep 5
fi
fi
SCALED_URL=`grep "id=\"photo-display\"" $SCALED_HTML | grep -Eo "http://[^\"]*" | head -n1`
echo "SCALED_URL: $SCALED_URL" | tee -a $LOG_FILE
POST_DATE=`grep -Eo "Posted on [a-zA-Z0-9 ,]*" $SCALED_HTML | sed -e 's/Posted on //'`
echo "POST_DATE: $POST_DATE" | tee -a $LOG_FILE
THUMB_URL=`cat $PREFIX-page-* | grep -E "<a href=\"/$ID\">" | grep -Eo "src=\"[^\"]*\"" | head -n1 | sed -e 's/src=\"//' -e 's/\"$//'`
echo "THUMB_URL: $THUMB_URL" | tee -a $LOG_FILE
EXT=`echo "$FULL_URL" | grep -Eo "[a-zA-Z0-9]+\.[a-zA-Z0-9]+\?" | head -n1 | grep -Eo "\.[a-zA-Z0-9]+"`
if [ -z "$EXT" ]; then
EXT=`echo "$FULL_URL" | grep -Eo "\.[a-zA-Z0-9]+$"`
fi
echo "EXT: $EXT"
if [ "$DOWNLOAD_FULL" -eq 1 ]; then
FULL_FILE="$PREFIX-$ID-full$EXT"
if [ ! -s $FULL_FILE ]; then
wget "$FULL_URL" -O $FULL_FILE
if [ ! -s "$FULL_FILE" ]; then
echo "ERROR: could not get FULL_URL for $ID: $FULL_URL" | tee -a $LOG_FILE
sleep 5
fi
fi
fi
if [ "$DOWNLOAD_SCALED" -eq 1 ]; then
SCALED_FILE=$PREFIX-$ID-scaled$EXT
if [ ! -s $SCALED_FILE ]; then
wget "$SCALED_URL" -O $SCALED_FILE
if [ ! -s "$SCALED_FILE" ]; then
echo "ERROR: could not get SCALED_URL for $ID: $SCALED_URL" | tee -a $LOG_FILE
sleep 5
fi
fi
fi
if [ "$DOWNLOAD_THUMB" -eq 1 ]; then
THUMB_FILE=$PREFIX-$ID-thumb$EXT
if [ ! -s $THUMB_FILE ]; then
wget "$THUMB_URL" -O $THUMB_FILE
if [ ! -s "$THUMB_FILE" ]; then
echo "ERROR: could not get THUMB_URL for $ID: $THUMB_URL" | tee -a $LOG_FILE
sleep 5
fi
fi
fi
BODY_TEXT="$TEXT_RICH <p>[<a href=http://twitpic.com/$ID>Twitpic</a>]</p>"
# Format the post date correctly
YEAR=`echo "$POST_DATE" | sed -e 's/[A-Z][a-z]* [0-9]*, //'`
DAY=`echo "$POST_DATE" | sed -e 's/[A-Z][a-z]* //' -e 's/, [0-9]*//'`
MONTH=`echo "$POST_DATE" | sed -e 's/ [0-9]*, [0-9]*//' | sed \
-e 's/January/01/' \
-e 's/February/02/' \
-e 's/March/03/' \
-e 's/April/04/' \
-e 's/May/05/' \
-e 's/June/06/' \
-e 's/July/07/' \
-e 's/August/08/' \
-e 's/September/09/' \
-e 's/October/10/' \
-e 's/November/11/' \
-e 's/December/12/' \
`
# Adjust the time to local midnight when west of GMT
HOURS_LOC=`date | grep -Eo " [0-9]{2}:" | sed -e 's/://' -e 's/ //'`
HOURS_UTC=`date -u | grep -Eo " [0-9]{2}:" | sed -e 's/://' -e 's/ //'`
HOURS_OFF=`expr $HOURS_UTC - $HOURS_LOC + 7`
echo "HOURS_LOC: $HOURS_LOC"
echo "HOURS_UTC: $HOURS_UTC"
echo "HOURS_OFF: $HOURS_OFF"
if [ "$HOURS_OFF" -lt 0 ]; then
# We're east of GMT, do not adjust
HOURS_OFF=0
fi
if [ "$HOURS_OFF" -lt 10 ]; then
HOURS_OFF=0$HOURS_OFF
fi
if [ "$DAY" != "" ] && [ "$DAY" -lt 10 ]; then
DAY=0$DAY
fi
DATE_FORMATTED="$YEAR-$MONTH-$DAY-$HOURS_OFF:00"
echo "DATE_FORMATTED: $DATE_FORMATTED" | tee -a $LOG_FILE
echo "<p><img src='$FULL_FILE' alt='$TEXT' title='$TEXT' /></p>" >> $HTML_OUT
echo "$BODY_TEXT" >> $HTML_OUT
echo " Post date: $DATE_FORMATTED; Count: $COUNT" >> $HTML_OUT
# Upload this Twitpic data to Posterous
if [ ! -z "$P_SITE_ID" ]; then
# First make sure we're under the API upload limit
if [ "$COUNT" -le "$UPLOAD_SKIP" ]; then
echo Skipping upload...
continue
fi
if [ "$COUNT" -gt "`expr $UPLOAD_SKIP + $P_API_LIMIT`" ]; then
echo "Skipping upload due to daily Posterous API upload limit of $P_API_LIMIT."
echo "To resume uploading where we left off today, supply UPLOAD_SKIP parameter of `expr $UPLOAD_SKIP + $P_API_LIMIT`."
continue
fi
P_OUT_FILE="posterous-$P_SITE_ID-$ID.out"
if [ -s "$P_OUT_FILE" ]; then
rm "$P_OUT_FILE"
fi
echo "Uploading Twitpic image..."
curl -u "$P_ID:$P_PW" "http://posterous.com/api/newpost" -o "$P_OUT_FILE" \
-F "site_id=$P_SITE_ID" \
-F "title=$TEXT" \
-F "autopost=$P_AUTOPOST" \
-F "private=$P_PRIVATE" \
-F "date=$DATE_FORMATTED" \
-F "tags=$P_TAGS_POST" \
-F "source=burndive's Twitpic-to-Posterous script $SCRIPT_VERSION_STRING" \
-F "sourceLink=http://tuxbox.blogspot.com/2010/03/twitpic-to-posterous-export-script.html" \
-F "body=$BODY_TEXT" \
-F "media=@$FULL_FILE"
cat $P_OUT_FILE | tee -a $UPLOAD_OUT
fi
done
echo Done.
This software is licensed under the CC-GNU GPL version 2.0 or later.
PS: If you use my code, I'd appreciate a comment letting me know, along with any feedback you have, especially if it isn't working right for you, but also just to say thanks.
For convenience, you can download this script from my server.
