Justin Duke

Mining Twitter to discover how bad I am at Threes

Threes is a wonderful game and also horribly addictive. If you haven’t yet had the misfortune of getting ensnared by its cartoonish eldritch tentacles, I recommend it wholeheartedly. While the game continues on its inexorable march to rob the productivity of nerdy types everywhere, you can tell that it’s already inspired a particular brand of developer fervor.

Personally, I was interested in discovering the general distribution of scores in the game: having broken the five-digit mark yesterday, I was curious to see how I stacked up against the rest of the world. 1 Knowing that the game had a “tweet your score” function that, judging by my Twitter feed, a surprising amount of the world uses, I decided to fire up Sublime Text and get to work.

The vast majority of tweets were composed using the default template, which looks something like this:

I just scored 3,384 in @ThreesGame! http://threesgame.com pic.twitter.com/bq3lqq1nIT

As a result, I could just search for “I just scored” plus the specific handle, which made grabbing the data relatively easy 2.

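import re
import time

from TwitterAPI import TwitterAPI, TwitterRestPager

api = TwitterAPI(consumer_key, consumer_secret, access_token_key, access_token_secret)
tweets = TwitterRestPager(api, 'search/tweets', {'q':'@ThreesGame'})
while True:
    try:
        for tweet in tweets.get_iterator():
            text = tweet.get('text')

            # Skip anything that isn't the default "tweet your score" template.
            if not text or "I just scored" not in text:
                continue

            # The first run of digits and commas is the score, e.g. "3,384".
            score = re.findall(r"[\d\,]+", text)[0]
            score = score.replace(",", "")

            username = tweet.get('user').get('screen_name')
            timestamp = tweet.get('created_at')
            tweet_id = str(tweet.get('id'))

            # One CSV row per tweet, written to stdout.
            print(",".join((score, username, timestamp, tweet_id)))
        break  # the pager ran out of results
    except Exception:
        # The handler is missing from the post as published; pausing and
        # retrying is an assumption, not necessarily what the original did.
        time.sleep(60)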
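The parsing step can be tried on its own, without touching the API. Here's a minimal sketch, assuming every tweet of interest follows the default template. `parse_score` is a hypothetical helper name, not part of the script above, and it anchors the match on the phrase "I just scored" so a stray number earlier in the tweet isn't mistaken for the score.

```python
import re

# Hypothetical helper: pulls the score out of a default-template tweet.
SCORE_PATTERN = re.compile(r"I just scored ([\d,]+)")

def parse_score(text):
    """Return the score as an int, or None if the text doesn't match the template."""
    match = SCORE_PATTERN.search(text or "")
    if match is None:
        return None
    digits = match.group(1).replace(",", "")
    return int(digits) if digits else None

sample = "I just scored 3,384 in @ThreesGame! http://threesgame.com pic.twitter.com/bq3lqq1nIT"
print(parse_score(sample))
```

Returning `None` for anything off-template keeps the filtering and the parsing in one place, at the cost of silently dropping tweets that were edited by hand.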

And, armed with the csv data, we could load it into pandas and get some quick summary statistics 3:

import pandas as pd
tweets = pd.read_csv('threes.csv')
print(tweets['score'].describe())

count      7331.000000
mean       7039.220161
std        9898.609367
min           0.000000
25%        2325.000000
50%        3435.000000
75%        9112.500000
max      236484.000000

Not bad! Looks like I placed in the top quartile 4, though as with all games there are a few outliers who are crazy good. I’m looking at you, @natenewbies – unfortunately for my ego, that 236484 score is all too real.
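To put a number on "top quartile", here's a rough sketch of a percentile-rank calculation using only the standard library. The `percentile_rank` helper and the sample scores are made up for illustration; the real figure would come from running it over the full score column.

```python
from bisect import bisect_left

def percentile_rank(scores, my_score):
    """Percentage of scores strictly below my_score."""
    ordered = sorted(scores)
    below = bisect_left(ordered, my_score)
    return 100.0 * below / len(ordered)

# Made-up sample, not the real data set.
sample_scores = [900, 2325, 3435, 3500, 9112, 12000, 15000, 236484]
print(percentile_rank(sample_scores, 10000))
```

Anything above 75 puts a score in the top quartile.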

Still, what’s the point of a huge array if we don’t graph it, right? First, we can bin them with sizes of 500 and see some cool stuff:

import numpy as np
import vincent

# Bin edges every 500 points from 0 to 99,500; scores past the last edge are left out.
score_distribution = np.histogram(list(tweets['score']), bins=range(0, 100000, 500))
# Index each count by the lower edge of its bin.
score_distribution = pd.DataFrame(score_distribution[0], index=score_distribution[1][:-1])
area = vincent.Area(score_distribution)
area.axis_titles(x='Score (grouped by 500)', y='Frequency')
area.to_json('threes_area.json', html_out=True, html_path='threes_area.html')

It’s entertaining to see the distribution of score packets. Your score in Threes is roughly exponential based on the tiles themselves – getting a 6 only gives you 9 points, whereas a 3072 nets you a whopping 177,147. We can jack up the bin size from 500 points to 2000 points, to make things a little more pleasant to look at:
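Charts aside, the rebinning is a one-argument change. Here's a sketch of fixed-width binning with just the standard library; it's an illustrative stand-in rather than the code behind the charts, `bin_scores` is a hypothetical helper, and the sample scores are invented.

```python
from collections import Counter

def bin_scores(scores, bin_width=500):
    """Count scores per fixed-width bin, keyed by each bin's lower edge."""
    counts = Counter((int(score) // bin_width) * bin_width for score in scores)
    return dict(sorted(counts.items()))

# Invented sample scores, purely for illustration.
sample_scores = [120, 480, 510, 999, 1500, 3384]
print(bin_scores(sample_scores))
print(bin_scores(sample_scores, bin_width=2000))
```

Wider bins smooth out the noise between neighbouring buckets, which is why the coarser chart is easier on the eyes.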

This is illustrated a bit more clearly with a cumulative frequency chart 5:

# Sort once so each threshold only has to look at scores not yet counted.
all_scores = sorted(tweets['score'])
thresholds = range(0, int(max(map(float, all_scores))), 50)
cumulative_frequency = []
counted = 0
for threshold in thresholds:
    # Advance past every score below this threshold.
    while counted < len(all_scores) and all_scores[counted] < threshold:
        counted += 1
    cumulative_frequency.append(counted)

# Convert raw counts into percentages of all tweets.
cumulative_frequency = [float(count) / len(all_scores) * 100 for count in cumulative_frequency]

line = vincent.Line(cumulative_frequency, thresholds)
line.axis_titles(x='Score', y='Percentage')
line.to_json('threes_line.json', html_out=True, html_path='threes_line.html')

Note the jumps at certain thresholds, corresponding with hitting each tile and collecting the fat exponential score bonus. The curves are a bit prettier at full fidelity – I had to sample the distribution at every 50 points because JavaScript doesn’t like rendering graphs with a hundred thousand points – but you still get the general sense of things.
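The size of those per-tile bonuses can be written down directly. This is a sketch, assuming the commonly described rule that a tile worth 3·2^k scores 3^(k+1) points (which matches the 6 and 3072 figures quoted earlier) and that 1s and 2s score nothing. `tile_points` is a name made up here.

```python
def tile_points(tile):
    """Points awarded for a single Threes tile, under the assumed 3**(k+1) rule."""
    if tile < 3:
        return 0  # assumption: 1 and 2 tiles contribute nothing
    k = 0
    value = 3
    # Double from 3 until we reach (or overshoot) the tile's face value.
    while value < tile:
        value *= 2
        k += 1
    if value != tile:
        raise ValueError("not a valid Threes tile: %r" % tile)
    return 3 ** (k + 1)

for tile in (6, 192, 384, 3072):
    print(tile, tile_points(tile))
```

Each new tile triples the points of the one before it, so a board's score is dominated by its biggest tile, which would explain the step-like look of the curve.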

So, in conclusion: if you’re in the trenches of being stuck on 192s and 384s like me, you have quite a ways to go. If you’re spawning Triferati like there’s no tomorrow, then you can truly consider yourself a member of the Threes 1%. And if you’re interested in checking out the data yourself, you can download it here.


  1. Spoiler alert – pretty poorly.
  2. Yeah, theoretically someone could tweet the exact same content with a bogus score and throw everything off, but who cares?
  3. If you’re following along at home: I added headers to the csv file before reading it because I was too lazy to do it programmatically.
  4. Which definitely makes up for my complete inability to play chess, right?
  5. thresholds is possibly the most unnecessarily complicated line of Python I’ve ever written. So compact, though!