Mining Twitter to discover how bad I am at Threes
Threes is a wonderful game and also horribly addictive. If you haven’t yet had the misfortune of getting ensnared by its cartoonish eldritch tentacles, I recommend it wholeheartedly. While the game continues on its inexorable march to rob the productivity of nerdy types everywhere, you can tell that it’s already inspired a particular brand of developer fervor:
- On GitHub, developers are working on open source AIs
- Over at TouchArcade, people are hammering out proofs about the minimum possible number of tiles
- And, the true mark of success, Threes has already spawned a Chinese knockoff
Personally, I was interested in discovering the general distribution of scores in the game: having broken the five-digit mark yesterday, I was curious to see how I stacked up against the rest of the world. [1] Knowing that the game has a “tweet your score” function that, judging by my Twitter feed, a surprising number of people use, I decided to fire up Sublime Text and get to work.
The vast majority of tweets were composed using the default template, which looks something like this:
I just scored 3,384 in @ThreesGame! http://threesgame.com pic.twitter.com/bq3lqq1nIT
As a result, I could just search for “I just scored” plus the game’s handle, and grabbing the data would be relatively easy. [2]
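import re
import time

from TwitterAPI import TwitterAPI, TwitterRestPager

api = TwitterAPI(consumer_key, consumer_secret, access_token_key, access_token_secret)
tweets = TwitterRestPager(api, 'search/tweets', {'q': '@ThreesGame'})
while True:
    try:
        for tweet in tweets.get_iterator():
            text = tweet.get('text')
            if not text or "I just scored" not in text:
                continue
            # Take the first run of digits and commas, e.g. "3,384" -> "3384".
            score = re.findall(r"[\d\,]+", text)[0]
            score = score.replace(",", "")
            username = tweet.get('user').get('screen_name')
            timestamp = tweet.get('created_at')
            tweet_id = str(tweet.get('id'))
            # One csv row per tweet: score,username,timestamp,tweet_id
            print(",".join((score, username, timestamp, tweet_id)))
    except Exception:
        # The original handler isn't shown; pausing and retrying is an assumption.
        time.sleep(60)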
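One caveat with grabbing the first run of digits: a retweet or reply can put other numbers (a handle like `@player2014`, say) ahead of the score. Here is a minimal sketch of a stricter parser that anchors on the default template instead. The names `SCORE_PATTERN` and `parse_score` are made up for illustration, and the second sample tweet is invented; this isn't what the scraper above actually ran.

```python
import re

# Anchored on the default template: "I just scored 3,384 in @ThreesGame! ..."
SCORE_PATTERN = re.compile(r"I just scored ([\d,]+) in @ThreesGame")


def parse_score(text):
    """Return the score as an int, or None if the text doesn't follow the template."""
    match = SCORE_PATTERN.search(text)
    if match is None:
        return None
    digits = match.group(1).replace(",", "")
    return int(digits) if digits else None


sample = "I just scored 3,384 in @ThreesGame! http://threesgame.com pic.twitter.com/bq3lqq1nIT"
print(parse_score(sample))
# Invented retweet: the handle's digits come first, but the anchor skips them.
print(parse_score("RT @player2014: I just scored 768 in @ThreesGame!"))
print(parse_score("Loving @ThreesGame so far"))
```

Returning `None` for anything off-template also makes it easy to count how many tweets were skipped.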
And, armed with the csv data, we can load it into pandas and get some quick summary statistics [3]:
import pandas as pd
tweets = pd.read_csv('threes.csv')
print(tweets['score'].describe())
mean 7039.220161
std 9898.609367
min 0.000000
25% 2325.000000
50% 3435.000000
75% 9112.500000
max 236484.000000
Not bad! Looks like I placed in the top quartile [4], though as with all games there are a few outliers who are crazy good. I’m looking at you, @natenewbies – unfortunately for my ego, that 236,484 score is all too real.
Still, what’s the point of a huge array if we don’t graph it, right? First, we can bin the scores into buckets of 500 and see some cool stuff:
import numpy as np
import vincent

# Bin edges at 0, 500, ..., 99500; scores past the last edge are left out of the histogram.
score_distribution = np.histogram(list(tweets['score']), bins=np.arange(0, 100000, 500))
# Index each count by the lower edge of its bin.
score_distribution = pd.DataFrame(score_distribution[0], index=score_distribution[1][:-1])
area = vincent.Area(score_distribution)
area.axis_titles(x='Score (grouped by 500)', y='Frequency')
area.to_json('threes_area.json', html_out=True, html_path='threes_area.html')
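If you don’t have numpy handy, fixed-width binning is easy to sketch with the standard library. This is a rough equivalent rather than a drop-in replacement: `bin_scores` is a hypothetical helper, the example scores are invented, and unlike the histogram above it doesn’t cap the range.

```python
from collections import Counter


def bin_scores(scores, width=500):
    """Count scores per bin; each key is the lower edge of a [edge, edge + width) bin."""
    counts = Counter((score // width) * width for score in scores)
    return dict(sorted(counts.items()))


# Invented values for illustration, not taken from the real dataset.
example_scores = [0, 120, 499, 500, 3384, 3435, 9112]
print(bin_scores(example_scores))
print(bin_scores(example_scores, width=2000))
```

Changing `width` is all it takes to go from fine-grained bins to chunkier ones.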
It’s entertaining to see how the scores clump together. Your score in Threes grows roughly exponentially with the tiles on the board – getting a 6 only gives you 9 points, whereas a 3072 nets you a whopping 177,147. We can jack up the bin size from 500 points to 2000 points to make things a little more pleasant to look at:
This is illustrated a bit more clearly with a cumulative frequency chart [5]:
cumulative_frequency = [0]
all_scores = list(tweets['score'])
thresholds = range(0, int(max(map(float, all_scores))), 50)
for threshold in thresholds:
    # Loop over a copy: removing items from the list being iterated skips elements.
    for score in all_scores[:]:
        if score < threshold:
            cumulative_frequency[-1] += 1
            all_scores.remove(score)
    # Carry the running total forward to the next threshold.
    cumulative_frequency.append(cumulative_frequency[-1])
# Convert raw counts into percentages of all tweets.
total = len(tweets['score'])
cumulative_frequency = [float(x) / total * 100 for x in cumulative_frequency]
line = vincent.Line(cumulative_frequency, thresholds)
line.axis_titles(x='Score', y='Percentage')
line.to_json('threes_line.json', html_out=True, html_path='threes_line.html')
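The nested loop gets slow as the dataset grows, since every threshold rescans the remaining scores. A sketch of a faster route to the same kind of numbers, assuming the scores fit in memory: sort once, then binary-search each threshold. The helper name `cumulative_percentages` and the example scores are invented for illustration.

```python
from bisect import bisect_left


def cumulative_percentages(scores, step=50):
    """Percentage of scores strictly below each threshold 0, step, 2 * step, ..."""
    ordered = sorted(scores)
    thresholds = range(0, int(max(ordered)), step)
    total = len(ordered)
    # bisect_left returns how many sorted values are strictly less than the threshold.
    return [(t, 100.0 * bisect_left(ordered, t) / total) for t in thresholds]


# Invented values for illustration, not taken from the real dataset.
example_scores = [30, 60, 60, 140]
for threshold, percentage in cumulative_percentages(example_scores):
    print(threshold, percentage)
```

Sorting costs O(n log n) once, and each threshold lookup is then O(log n).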
Note the jumps at certain thresholds, corresponding with hitting each tile and collecting the fat exponential score bonus. The curves are a bit prettier at full fidelity – I had to sample the distribution at every 50 points because JavaScript doesn’t like rendering graphs with a hundred thousand points – but you still get the general sense of things.
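To see where those jumps should land, here is a small sketch of the per-tile bonus. It assumes the commonly cited scoring rule that a tile worth 3 × 2^k earns 3^(k+1) points, and that 1s and 2s earn nothing. That rule is an assumption on my part, though it does agree with the 6 → 9 and 3072 → 177,147 figures quoted earlier. `tile_points` is a hypothetical helper.

```python
def tile_points(tile):
    """Points for a single tile, assuming a tile of 3 * 2**k is worth 3 ** (k + 1)."""
    ratio, remainder = divmod(tile, 3)
    # Anything that isn't 3 times a power of two (including 1 and 2) scores nothing here.
    if remainder or ratio < 1 or ratio & (ratio - 1):
        return 0
    k = ratio.bit_length() - 1
    return 3 ** (k + 1)


tile = 3
while tile <= 3072:
    print(tile, tile_points(tile))
    tile *= 2
```

Each doubling of the tile triples its bonus, which is why a single big tile can outweigh everything else on the board.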
So, in conclusion: if you’re in the trenches of being stuck on 192s and 384s like me, you have quite a ways to go. If you’re spawning Triferati like there’s no tomorrow, then you can truly consider yourself a member of the Threes 1%. And if you’re interested in checking out the data yourself, you can download it here.
[1] Spoiler alert – pretty poorly.
[2] Yeah, theoretically someone could tweet the exact same content with a bogus score and throw everything off, but who cares?
[3] If you’re following along at home: I added headers to the csv file before reading it because I was too lazy to do it programmatically.
[4] Which definitely makes up for my complete inability to play chess, right?
[5] thresholds is possibly the most unnecessarily complicated line of Python I’ve ever written. So compact, though!