Showing posts with label athena-link. Show all posts

Wednesday, December 08, 2010

Twitpic-to-Posterous Script: Another Update

A while ago I wrote a script to import my Twitpic photo posts to Posterous and posted it on this blog.

Even though Posterous now supplies their own working transfer tool, it has its limitations.  One person who tried that tool was unsatisfied, and tried my script.  He really liked the results, but he noticed some drawbacks to my script as well.  Here's what's new in my script v1.3.1:
  • New feature by request - #hashtags and @username mentions are now linked to the appropriate Twitter page in the body of the Posterous post.
  • Fix - issue where Twitpic now truncates the tweet text in the HTML title.  Switched to using the image alt text from the full page.  
  • Fix - Twitpic started escaping single and double quotes in the tweet text, which were showing up uninterpreted in the Posterous titles.  The script now handles them correctly.
  • Other changes
    • Only download the Full images by default (Scaled and thumbnails can be enabled by setting flags.)
    • Print an error message and pause for 5 seconds if a download fails (Twitpic was being unreliable during my testing.)
    • Other miscellaneous fixes and tweaks
Special thanks to @RyanMeray!
  • Update (v1.3.2): better regular expressions for @username and #hashtag formats. 
  • Update (v1.3.4): now optionally adds hashtags as post tags.
You can get the latest version here.
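
For anyone curious how the @username and #hashtag linking works in principle, here is a minimal sketch. It is not the script's own code: the `linkify` function name is made up for illustration, and it assumes a `sed` that supports `-E` (both GNU and BSD sed do). Purely numeric tags such as `#1` are deliberately left alone.

```shell
#!/bin/sh
# Hypothetical helper (not taken from the script): wrap @mentions and
# #hashtags in links to the matching Twitter pages.
linkify() {
  printf '%s\n' "$1" | sed -E \
    -e 's~(^|[^0-9A-Za-z_])@([0-9A-Za-z_]+)~\1@<a href="http://twitter.com/\2">\2</a>~g' \
    -e 's~#([0-9A-Za-z_]*[A-Za-z_][0-9A-Za-z_]*)~<a href="http://twitter.com/search?q=%23\1">#\1</a>~g'
}

# The mention and the word hashtag become links; "#1" stays plain text.
linkify 'Lunch with @RyanMeray #food #1'
```

The first expression only matches an `@` that starts the text or follows a non-word character, so something like an e-mail address is not turned into a mention.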

    Wednesday, March 17, 2010

    TwitPic to Posterous Export Script

    Note: This post (and the script it contains) has been updated as of December 14, 2010.  (v1.4.0) The script can also be downloaded from my server here.
    Also, Posterous has done a lot of work on solving this problem since I wrote my script.   You can see their latest solutions here.

    Recently, I switched from TwitPic to Posterous as my method of posting phone pictures (and now video) to the Internet.  But since I switched, I didn't want to have my data history split in two, so I decided to write a script to download each of my TwitPic images with their associated text and date, and upload them to Posterous with the same information.

    Initially, I wanted to make one long post with all of the images and their text below.  However, with the Posterous API, it isn't possible to refer to a specific image in your body text, so individual posts were the way I went.

    Along the way, I became familiar with yet another Linux command: curl.

    I love that Posterous has an API that (once you figure out curl) is pretty easy to use.  TwitPic, on the other hand, has absolutely zero support for exporting anything.  The fact that they're so non-user-centric and out-dated was a driving force in my switching.  The only reason I hadn't switched to img.ly already was because img.ly has a bug that prevents images sent from my phone from being posted, since my phone sends them without a file extension.  I worked with their tech support for a while, but they didn't fix it.  I got a new phone, but it was also a Samsung, and it did the same thing with images.  Oh, well.  Posterous is better.

    Anyway, here is the script:

    First run it with just the first two arguments, and it will download all of your TwitPic data.  (As of v1.3.1, only the full-size images are downloaded by default; scaled images and thumbnails can be enabled with the DOWNLOAD_SCALED and DOWNLOAD_THUMB flags at the top of the script.)  Once you're satisfied, supply your Posterous User ID, Password, and Site ID.  (If you don't know your Site ID, run the script with your Posterous User ID, Password, and no Site ID, and it will query your Posterous site info as long as your Posterous credentials are valid.)
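
If you would rather pick the Site ID out of a saved server response yourself, something along these lines should work. This is only a sketch: `list_site_ids` is a made-up helper, the sample XML is invented for the demonstration, and it assumes each `<id>` element sits on its own line (the script itself only greps for `<id>...</id>`, so the real response layout may differ).

```shell
#!/bin/sh
# Hypothetical helper: print every numeric <id> value found in a saved
# response file, one per line. Assumes one <id> element per line.
list_site_ids() {
  sed -n 's~.*<id>\([0-9][0-9]*\)</id>.*~\1~p' "$1"
}

# Made-up sample response, used only to exercise the helper.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<rsp stat="ok">
  <site>
    <id>12345</id>
    <name>example</name>
  </site>
</rsp>
EOF
list_site_ids "$sample"
rm -f "$sample"
```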

    Note: if you want to run this from Windows, you should install Cygwin (with, at a minimum, curl, wget, and sed) and run it from there.
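
A quick way to confirm the tools are installed before running anything is a check like the one below. It is a sketch rather than part of the script; `missing_tools` is a hypothetical name, and the tool list is simply the external commands the script visibly calls (curl, sed, wget, and grep, among other standard utilities).

```shell
#!/bin/sh
# Hypothetical helper: print the name of every listed tool that is not
# found on the PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing=$(missing_tools curl sed wget grep)
if [ -n "$missing" ]; then
  # Left unquoted on purpose so the names print on one line.
  echo "Missing tools:" $missing
else
  echo "All required tools found."
fi
```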

    ./twitpic-to-posterous.sh [twitpic-id] [working-dir] [posterous-id] [posterous-password] [posterous-site-id] [skip-number]
    #!/bin/sh
    
    # Copyright 2010 Tim "burndive" of http://burndive.blogspot.com/ and http://tuxbox.blogspot.com/
    # This software is licensed under the Creative Commons GNU GPL version 2.0 or later.
    # License information: http://creativecommons.org/licenses/GPL/2.0/
    
    # This script was obtained from here:
    # http://tuxbox.blogspot.com/2010/03/twitpic-to-posterous-export-script.html
    
    # %M is minutes (%m would repeat the month)
    RUN_DATE=`date +%F--%H-%M-%S`
    SCRIPT_VERSION_STRING="v1.4.0"
    
    TP_NAME=$1
    WORKING_DIR=$2
    P_ID=$3
    P_PW=$4
    P_SITE_ID=$5
    UPLOAD_SKIP=$6
    
    # Comma separated list of tags to apply to your posts
    P_TAGS="twitpic"
    # Whether or not to auto-post from Posterous
    P_AUTOPOST=0
    # Whether or not the Posterous posts are marked private
    P_PRIVATE=0
    
    # This is the default limit of the number of posts that can be uploaded per day
    P_API_LIMIT=50
    
    DOWNLOAD_FULL=1
    DOWNLOAD_SCALED=0
    DOWNLOAD_THUMB=0
    PREFIX=twitpic-$TP_NAME
    HTML_OUT=$PREFIX-all-$RUN_DATE.html
    UPLOAD_OUT=posterous-upload-$P_SITE_ID-$RUN_DATE.xml
    
    if [ -z "$TP_NAME" ]; then
      echo "You must supply a TP_NAME."
      exit
    fi
    if [ ! -d "$WORKING_DIR" ]; then
      echo "You must supply a WORKING_DIR."
      exit
    fi
    if [ -z "$UPLOAD_SKIP" ]; then
      UPLOAD_SKIP=0
    fi
    UPLOAD_SKIP_DIGITS=`echo $UPLOAD_SKIP | sed -e 's/[^0-9]//g'`
    if [ "$UPLOAD_SKIP" != "$UPLOAD_SKIP_DIGITS" ]; then
      echo "Invalid UPLOAD_SKIP: $UPLOAD_SKIP"
      exit
    fi
    
    # Quote the path so directories with spaces work, and stop if cd fails
    cd "$WORKING_DIR" || exit 1
    
    if [ -f "$HTML_OUT" ]; then
      rm -v $HTML_OUT
    fi
    
    # If Posterous username and password were supplied, but not site ID, query the server and exit.
    P_SITE_INFO_FILE=posterous-$P_SITE_ID.out
    if [ ! -z "$P_ID" ] && [ ! -z "$P_PW" ] && [ -z "$P_SITE_ID" ]; then
      echo "Getting Posterous account info..."
      curl -u "$P_ID:$P_PW" "http://posterous.com/api/getsites" -o $P_SITE_INFO_FILE
      SITE_ID_RET=`grep "<id>$P_SITE_ID</id>" $P_SITE_INFO_FILE`
      if [ -z "$SITE_ID_RET" ]; then
        echo "Please supply your Posterous Site ID as the fifth argument."
        echo "Here is the response from the Posterous server.  If you entered correct credentials, you should see your Site ID(s):"
        cat $P_SITE_INFO_FILE | tee -a $UPLOAD_OUT
        exit
      fi
    fi
    
    # Confirm that we have a valid Posterous Site ID
    if [ ! -z "$P_SITE_ID" ]; then
      echo "Getting Posterous account info..."
      curl -u "$P_ID:$P_PW" "http://posterous.com/api/getsites" -o $P_SITE_INFO_FILE
      SITE_ID_RET=`grep "<id>$P_SITE_ID</id>" $P_SITE_INFO_FILE`
      if [ -z "$SITE_ID_RET" ]; then
        echo "Make sure that you have supplied a valid Posterous Site ID as the fifth parameter.  If you don't know your Site ID, leave it out, and this script will query the server."
        echo "Here is the response from the Posterous server.  If you entered correct credentials, you should see your site ID(s):"
        cat $P_SITE_INFO_FILE | tee -a $UPLOAD_OUT
        exit
      fi
    fi
    
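    # Define the log file before the first download so page errors are logged
    LOG_FILE=$PREFIX-log-$RUN_DATE.txt
    MORE=1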
    PAGE=1
    while [ $MORE -ne 0 ]; do
      echo PAGE: $PAGE
      FILENAME=$PREFIX-page-$PAGE.html
      if [ ! -s $FILENAME ]; then
        wget "http://twitpic.com/photos/${TP_NAME}?page=$PAGE" -O "$FILENAME"
        if [ ! -s "$FILENAME" ]; then
          echo "ERROR: could not get $FILENAME" | tee -a $LOG_FILE
          sleep 5
        fi
      fi
      if [ -z "`grep "More photos &gt;" $FILENAME`" ]; then
        MORE=0
      else
        PAGE=`expr $PAGE + 1`
      fi
    done
    
    ALL_IDS=`cat $PREFIX-page-* | grep -Eo "<a href=\"/[a-zA-Z0-9]+\">" | grep -Eo "/[a-zA-Z0-9]+" | grep -Eo "[a-zA-Z0-9]+" | sort -r | xargs`
    
    # For Testing
    #ALL_IDS="1kdjc"
    
    COUNT=0
    LOG_FILE=$PREFIX-log-$RUN_DATE.txt
    
    echo $ALL_IDS | tee -a $LOG_FILE
    
    for ID in $ALL_IDS; do
      COUNT=`expr $COUNT + 1`
      echo $ID: $COUNT | tee -a $LOG_FILE
    
      echo "Processing $ID..."
      FULL_HTML=$PREFIX-$ID-full.html
      if [ ! -s "$FULL_HTML" ]; then
        wget http://twitpic.com/$ID/full -O $FULL_HTML
        if [ ! -s "$FULL_HTML" ]; then
          echo "ERROR: could not get FULL_HTML for $ID" | tee -a $LOG_FILE
          sleep 5
        fi
      fi
      TEXT=`grep "<img src=" $FULL_HTML | tail -n1 | grep -oE "alt=\"[^\"]*\"" | sed \
            -e 's/^alt="//'\
            -e 's/"$//'\
            -e "s/&#039;/'/g"\
            -e 's/&quot;/"/g'\
            `
      if [ "$TEXT" = "" ]; then
        TEXT="Untitled"
      fi
      echo "TEXT: $TEXT" | tee -a $LOG_FILE
      # Recognize hashtags and username references in the tweet
      TEXT_RICH=`echo "$TEXT" | sed \
            -e 's/\B\@\([0-9A-Za-z_]\+\)/\@<a href="http:\/\/twitter.com\/\1">\1<\/a>/g' \
            -e 's/\#\([0-9A-Za-z_-]*[A-Za-z_-]\+[0-9A-Za-z_-]*\)/<a href="http:\/\/twitter.com\/search\?q\=%23\1">\#\1<\/a>/g' \
            `
      echo "TEXT_RICH: $TEXT_RICH" | tee -a $LOG_FILE
    
      # Convert hashtags into post tags
      P_TAGS_POST=$P_TAGS`echo "$TEXT" | sed \
            -e 's/\#\([^A-Za-z_-]\)*\B//g' \
            -e 's/^[^\#]*$//g' \
            -e 's/[^\#]*\(\#\([0-9A-Za-z_-]*[A-Za-z_-]\+[0-9A-Za-z_-]*\)\)[^\#]*\(\#[0-9]*\B\)*/,\2/g' \
            `
      # Uncomment if you don't want hashtags converted into post tags
      #P_TAGS_POST=$P_TAGS
    
      # Add custom tags from a file (optional).  The file is formatted like this:
      # ,tag1,tag2,tag3
      TAGS_FILE=$PREFIX-$ID-tags-extra.txt
      if [ -s "$TAGS_FILE" ]; then
        P_TAGS_POST=$P_TAGS_POST`cat $TAGS_FILE`
      fi
      echo "P_TAGS_POST: $P_TAGS_POST" | tee -a $LOG_FILE
    
      TEXT_FILE=$PREFIX-$ID-text.txt
      if [ ! -s $TEXT_FILE ]; then
        echo "$TEXT" > $TEXT_FILE
      fi
      FULL_URL=`grep "<img src=" $FULL_HTML | grep -Eo "src=\"[^\"]*\"" | grep -Eo "http://[^\"]*"`
      echo "FULL_URL: $FULL_URL" | tee -a $LOG_FILE
    
      SCALED_HTML=$PREFIX-$ID-scaled.html
      if [ ! -s "$SCALED_HTML" ]; then
        wget http://twitpic.com/$ID -O $SCALED_HTML
        if [ ! -s "$SCALED_HTML" ]; then
          echo "ERROR: could not get SCALED_HTML for $ID" | tee -a $LOG_FILE
          sleep 5
        fi
      fi
      SCALED_URL=`grep "id=\"photo-display\"" $SCALED_HTML | grep -Eo "http://[^\"]*" | head -n1`
      echo "SCALED_URL: $SCALED_URL" | tee -a $LOG_FILE
      POST_DATE=`grep -Eo "Posted on [a-zA-Z0-9 ,]*" $SCALED_HTML | sed -e 's/Posted on //'`
      echo "POST_DATE: $POST_DATE" | tee -a $LOG_FILE
    
      THUMB_URL=`cat $PREFIX-page-* | grep -E "<a href=\"/$ID\">" | grep -Eo "src=\"[^\"]*\"" | head -n1 | sed -e 's/src=\"//' -e 's/\"$//'`
      echo "THUMB_URL: $THUMB_URL" | tee -a $LOG_FILE
    
      EXT=`echo "$FULL_URL" | grep -Eo "[a-zA-Z0-9]+\.[a-zA-Z0-9]+\?" | head -n1 | grep -Eo "\.[a-zA-Z0-9]+"`
      if [ -z "$EXT" ]; then
        EXT=`echo "$FULL_URL" | grep -Eo "\.[a-zA-Z0-9]+$"`
      fi
      echo "EXT: $EXT"
      if [ "$DOWNLOAD_FULL" -eq 1 ]; then
        FULL_FILE="$PREFIX-$ID-full$EXT"
        if [ ! -s $FULL_FILE ]; then
          wget "$FULL_URL" -O $FULL_FILE
          if [ ! -s "$FULL_FILE" ]; then
            echo "ERROR: could not get FULL_URL for $ID: $FULL_URL" | tee -a $LOG_FILE
            sleep 5
          fi
        fi
      fi
      if [ "$DOWNLOAD_SCALED" -eq 1 ]; then
        SCALED_FILE=$PREFIX-$ID-scaled$EXT
        if [ ! -s $SCALED_FILE ]; then
          wget "$SCALED_URL" -O $SCALED_FILE
          if [ ! -s "$SCALED_FILE" ]; then
            echo "ERROR: could not get SCALED_URL for $ID: $SCALED_URL" | tee -a $LOG_FILE
            sleep 5
          fi
        fi
      fi
      if [ "$DOWNLOAD_THUMB" -eq 1 ]; then
        THUMB_FILE=$PREFIX-$ID-thumb$EXT
        if [ ! -s $THUMB_FILE ]; then
          wget "$THUMB_URL" -O $THUMB_FILE
          if [ ! -s "$THUMB_FILE" ]; then
            echo "ERROR: could not get THUMB_URL for $ID: $THUMB_URL" | tee -a $LOG_FILE
            sleep 5
          fi
        fi
      fi
    
      BODY_TEXT="$TEXT_RICH <p>[<a href=http://twitpic.com/$ID>Twitpic</a>]</p>"
    
      # Format the post date correctly
      YEAR=`echo "$POST_DATE" | sed -e 's/[A-Z][a-z]* [0-9]*, //'`
      DAY=`echo "$POST_DATE" | sed -e 's/[A-Z][a-z]* //' -e 's/, [0-9]*//'`
      MONTH=`echo "$POST_DATE" | sed -e 's/ [0-9]*, [0-9]*//' | sed \
        -e 's/January/01/' \
        -e 's/February/02/' \
        -e 's/March/03/' \
        -e 's/April/04/' \
        -e 's/May/05/' \
        -e 's/June/06/' \
        -e 's/July/07/' \
        -e 's/August/08/' \
        -e 's/September/09/' \
        -e 's/October/10/' \
        -e 's/November/11/' \
        -e 's/December/12/' \
        `
      # Adjust the time to local midnight when west of GMT
      HOURS_LOC=`date | grep -Eo " [0-9]{2}:" | sed -e 's/://' -e 's/ //'`
      HOURS_UTC=`date -u | grep -Eo " [0-9]{2}:" | sed -e 's/://' -e 's/ //'`
      HOURS_OFF=`expr $HOURS_UTC - $HOURS_LOC + 7`
      echo "HOURS_LOC: $HOURS_LOC"
      echo "HOURS_UTC: $HOURS_UTC"
      echo "HOURS_OFF: $HOURS_OFF"
      if [ "$HOURS_OFF" -lt 0 ]; then
        # We're east of GMT, do not adjust
        HOURS_OFF=0
      fi
      if [ "$HOURS_OFF" -lt 10 ]; then
        HOURS_OFF=0$HOURS_OFF
      fi
      if [ "$DAY" != "" ] && [ "$DAY" -lt 10 ]; then
        DAY=0$DAY
      fi
      DATE_FORMATTED="$YEAR-$MONTH-$DAY-$HOURS_OFF:00"
      echo "DATE_FORMATTED: $DATE_FORMATTED" | tee -a $LOG_FILE
    
      echo "<p><img src='$FULL_FILE' alt='$TEXT' title='$TEXT' /></p>" >> $HTML_OUT
      echo "$BODY_TEXT" >> $HTML_OUT
      echo "  Post date: $DATE_FORMATTED; Count: $COUNT" >> $HTML_OUT
    
      # Upload this Twitpic data to Posterous
      if [ ! -z "$P_SITE_ID" ]; then
    
        # First make sure we're under the API upload limit
        if [ "$COUNT" -le "$UPLOAD_SKIP" ]; then
          echo Skipping upload...
          continue
        fi
        if [ "$COUNT" -gt "`expr $UPLOAD_SKIP + $P_API_LIMIT`" ]; then
          echo "Skipping upload due to daily Posterous API upload limit of $P_API_LIMIT."
          echo "To resume uploading where we left off today, supply UPLOAD_SKIP parameter of `expr $UPLOAD_SKIP + $P_API_LIMIT`."
          continue
        fi
    
        P_OUT_FILE="posterous-$P_SITE_ID-$ID.out"
        if [ -s "$P_OUT_FILE" ]; then
          rm "$P_OUT_FILE"
        fi
        echo "Uploading Twitpic image..."
        curl -u "$P_ID:$P_PW" "http://posterous.com/api/newpost" -o "$P_OUT_FILE" \
          -F "site_id=$P_SITE_ID" \
          -F "title=$TEXT" \
          -F "autopost=$P_AUTOPOST" \
          -F "private=$P_PRIVATE" \
          -F "date=$DATE_FORMATTED" \
          -F "tags=$P_TAGS_POST" \
          -F "source=burndive's Twitpic-to-Posterous script $SCRIPT_VERSION_STRING" \
          -F "sourceLink=http://tuxbox.blogspot.com/2010/03/twitpic-to-posterous-export-script.html" \
          -F "body=$BODY_TEXT" \
          -F "media=@$FULL_FILE"
        cat $P_OUT_FILE  | tee -a $UPLOAD_OUT
      fi
    done
    echo Done.
    CC-GNU GPL
    This software is licensed under the CC-GNU GPL version 2.0 or later.
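
The sixth argument, skip-number, works together with the script's P_API_LIMIT of 50: each run uploads at most that many posts, starting after the number you skip. To see which skip-number each day's run would need, here is a small sketch. `skip_values` is a made-up helper, not part of the script, and it assumes the limit is a positive whole number.

```shell
#!/bin/sh
# Hypothetical helper: print the skip-number for each run needed to upload
# TOTAL photos when at most LIMIT posts go up per run.
# LIMIT must be a positive integer, otherwise the loop never ends.
skip_values() {
  total=$1
  limit=$2
  skip=0
  while [ "$skip" -lt "$total" ]; do
    echo "$skip"
    skip=$((skip + limit))
  done
}

# 120 photos at the script's default limit of 50 posts per day.
skip_values 120 50
```

With 120 photos this prints 0, 50, and 100, one per line: run the script with no skip-number on the first day, then with 50 and 100 on the following days.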

    PS: If you use my code, I appreciate comments to let me know, and any feedback you may have, especially if it's not working right for you, but also just to say thanks.

    For convenience, you can download this script from my server.

    Friday, June 05, 2009

    Qwest Woes

    I just got off the phone with Qwest tech support. Apparently, the "modem" that I bought isn't just a modem; it's also a router with NAT and a firewall, so none of the port forwarding I had configured in my router was working. First, I had to set the Qwest modem/router/thing (an Actiontec M1000) to "bridge mode", so that it would turn off its NAT and firewall and just give me a connection to the Qwest network, and then I had to enter my Qwest account PPPoE credentials into my Linksys router (with, to complicate things, Tomato firmware). But now it works as it should, so my HTTP server should be up and running on the 'Net.

    Thursday, May 08, 2008

    Baidu MP3

    Lately, it seems that Baidu MP3 has indexed my server, and is pointing Chinese searchers to various Bible talks and sermons which I have hosted there. I recently (well, it was probably late last year) relaxed restrictions on my robots.txt file to allow most of the content (not the pictures) to be indexed by search engines. Since then, I've been hit from all over the world, but this is pretty recent. What's different is the volume and frequency of the hits. Of course, they're welcome to the content (as long as my server isn't getting hammered). I'm a bit curious as to what's being done with them: do people listen to them in order to learn English? Are they really trying to find music and stumbled on my files by mistake? Are they interested in biblical teaching? Maybe someday I'll find out.

    Wednesday, April 16, 2008

    More DNS Woes

    So what happens when your DNS changes more often than your IP address? The whole point of DNS is so that your IP address can change, and you don't have to update your links (also, there's something about being human-friendly, but who needs that?). Sadly in my case, athena has had a string of DNS subdomains that haven't lasted quite so long as I had hoped. My problem is that I link to pages from my blog (mostly containing pictures, audio, and PDF documents) that I would like to be accessible on a permanent basis. Recently, I lost athena.sexypenguins.com, and so I've moved to athena.goddns.net. At some point, I'm just going to have to purchase my own domain. Thus far I've resisted out of (mostly) momentum, but now that I've registered my first domain and found that it's not so bad to be parked, I'm more inclined to plunk down the money. The question is, what should I choose? In the meantime, I've decided to create an "athena-link" label on my blogs, that will at least keep track of which posts have links to athena, so that I can update them whenever I have to change the DNS, which it appears will happen at least one more time. For now (and for the first time in a while), all of my links are up-to-date. Another problem with my current setup is that FreeDNS's policies dictate that I need to have Google's ability to access my server manually enabled every time that I switch to a new domain. This wouldn't be that big of a deal, except that I've started hosting feeds on my site, and Google Reader uses Google's DNS. Also, Google just bought out FeedBurner, and one of my feeds uses that as a proxy.
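
Updating the links themselves is a plain search and replace on the hostname. A minimal sketch, assuming the post text is available as a stream of text and that hostnames contain only letters, digits, dots, and hyphens (`relink` is a hypothetical helper name):

```shell
#!/bin/sh
# Hypothetical helper: rewrite links from an old hostname to a new one.
# Reads text on stdin and writes the updated text to stdout.
# Assumes hostnames contain only letters, digits, dots, and hyphens.
relink() {
  # Escape the dots so they match literally instead of "any character".
  old=$(printf '%s' "$1" | sed 's/\./\\./g')
  sed "s/$old/$2/g"
}

printf '%s\n' '<a href="http://athena.sexypenguins.com/pics/">pictures</a>' |
  relink athena.sexypenguins.com athena.goddns.net
```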