Wednesday, November 28, 2012

Humble Bundle Data - Android 4

I did it again.  This time, the bundle was a games bundle (The Humble Bundle for Android 4), which is Humble's normal fare.  Also this time, I was able to collect data from the very beginning.  There was a period of 7 hours where my server got turned off and no data was collected, but that time slot was in the middle of the data collection with no major events occurring near it.  I patched it up with a little bit of linear interpolation.  It shows up on the Marginal Average Price as a plateau, but is otherwise unremarkable.
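For anyone curious what that patch amounts to, here is a minimal sketch of linear interpolation across a gap.  It is an illustration only, not the spreadsheet formulas I actually used; the function name `fill_gap` and the sample numbers are made up for the example.

```python
def fill_gap(t0, v0, t1, v1, step):
    """Linearly interpolate the readings strictly between (t0, v0) and (t1, v1)."""
    filled = []
    t = t0 + step
    while t < t1:
        # Fraction of the way through the gap at time t.
        frac = (t - t0) / (t1 - t0)
        filled.append((t, v0 + frac * (v1 - v0)))
        t += step
    return filled

# Hypothetical readings: minutes since the outage began and total purchases
# on either side of a one-hour gap, sampled every 15 minutes.
gap = fill_gap(0, 1000, 60, 1400, 15)
print(gap)
```

Because both running totals are filled in along straight lines, every interpolated interval ends up with the same revenue-to-purchases ratio, which is why the gap shows up as a plateau on the Marginal Average Price graph.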

One thing about this graph that I was not able to capture last time is that the average price was actually highest at the beginning, before sinking to a low and then slowly rising.  The "event" in the middle is when games from the previous Android bundle were added as an additional bonus.  Interestingly, I was able to purchase at the point of lowest average price, though I did beat it by rounding my purchase price up to a nice round number, thereby contributing to the bounce-back of the average price from its initial fall.

The initial fall is probably due to a bunch of people paying $0 or $1 immediately, just to get the basic games and/or Steam keys.

The ramp-up of initial purchases is quite high, as you can see.

In this case at least, the initial "bump" was much more significant than the subsequent "blip" produced by adding more bonus content.  A lot of people already have the previous bundle games, so this is somewhat expected.

Here again is the raw data, for those who might be interested:

Thursday, November 01, 2012

Humble Bundle Data - Results

In my previous post, I said I was going to finish collecting data for the rest of the Humble eBook Bundle and post the results here.

The results are in.  This was probably the most successful Humble Bundle to date, based on the ending average price.  Let's look at the data for the Average Price over Time.

Average Price over Time
You may notice that bump in the middle of the graph.  That is the point in time when several PDF comic books were added to the Humble eBook Bundle.  This had the effect of pushing a lot of fence-sitters over the edge to purchase the bundle, as well as increasing the margin by which purchasers were willing to "beat" the average price.

Total Purchases over Time
You may notice that the Total Purchases graph has some missing data at the beginning.  This is because initially I only collected the Average Price (and I was over an hour late in starting that collection).  The bundle started at 10:00 AM PDT; I started recording the average price at 11:30 AM, and I started collecting the total number of purchases at 5:30 PM. 

Total Revenue over Time
With those two numbers, I was able to calculate the total revenue collected.  Later, I added direct collection of this figure.
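The calculation is just the average price multiplied by the number of purchases.  A minimal sketch, with a made-up function name and made-up numbers rather than real readings from the bundle:

```python
def total_revenue(average_price, total_purchases):
    """Estimate total revenue from the two figures shown on the bundle page."""
    return round(average_price * total_purchases, 2)

# Hypothetical reading: an average price of $14.25 across 80,000 purchases.
revenue = total_revenue(14.25, 80000)
print(revenue)
```

Assuming the displayed average is rounded to the nearest cent, an estimate like this can be off by up to half a cent per purchase, which is one reason to prefer the directly collected figure once it is available.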

As you can see from the graph, at no point did the average price go down much at all (there were a few times it went down a penny or two).  This answers the question I was initially asking, at least for this bundle:  should I wait for a lower price?  The answer is of course emphatically no.  If I was going to beat the average price, the time to do it was as early as possible.

I do seem to recall bundle average prices going down in past bundles, but this may have been due to abuse by people pumping the system for free and/or very cheap Steam keys.  That practice seems to have been cracked down upon with CAPTCHAs (remember: only use your scripting powers for good), and the momentum of the price and interest in the bundle seems to have been maintained by the addition of the bonus content.  I would expect similar measures in the future.

As it turned out, when the extra bonus content was added, it was also added to the account of everyone who had previously purchased the bundle, whether they beat the average or not, so I ended up with all but the initial two bonus books.

These marginal rates were all calculated from the previous values.  It would be interesting to have better data at the beginning of the data set.  I'm curious to know how the profile of the initial wave compares to the second bump.  You can't really tell with the first seven hours missing, unfortunately.  My suspicion is that the secondary bump was sharper than the initial wave, mostly because I believe that the secondary wave was largely fence-sitters who had not bought because they thought the price was too high for the content offered.  When more content was offered (and considering the content), they immediately jumped on it.
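To make "calculated from the previous values" concrete: each marginal figure is the difference between one reading and the reading before it, and the marginal average price is the revenue difference divided by the purchase difference.  Here is a rough sketch of that idea.  The function name and the sample readings are invented for illustration, and this is not the spreadsheet I actually used.

```python
def marginal_rates(readings):
    """Turn running totals into per-interval figures.

    readings: list of (total_purchases, total_revenue) sampled at a fixed
    interval, e.g. every 15 minutes.
    Returns one (purchases, revenue, average_price) tuple per interval.
    """
    rows = []
    for (p0, r0), (p1, r1) in zip(readings, readings[1:]):
        dp = p1 - p0
        dr = r1 - r0
        # The average price is undefined when nothing was sold in the
        # interval (or when the site's total went down).
        avg = dr / dp if dp > 0 else None
        rows.append((dp, dr, avg))
    return rows

# Hypothetical running totals at four consecutive readings.
readings = [(1000, 12000.0), (1100, 13500.0), (1100, 13500.0), (1150, 14100.0)]
rates = marginal_rates(readings)
for row in rates:
    print(row)
```

The guard on `dp > 0` matters in practice, since an interval with no new purchases would otherwise cause a division by zero.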

Marginal Purchases (every 15 minutes)

Marginal Revenue (every 15 minutes)

Marginal Average Price (every 15 minutes)
It should be noted that the website data is far from perfect.  The totals sometimes went down from one reading to the next, and after the bundle had ended, the numbers were still in flux for several hours, but at the scale of these graphs, the fluctuations are insignificant.

If you would like to look at my raw data, I will provide it for download in its unprocessed CSV format generated by my script, as well as the Excel spreadsheet that I used to calculate the missing values and create these fancy graphs.

Here is the data:
Here is the final form of the script I used to create this data: