Yesterday, the New England Patriots posted a 43-22 victory over the Indianapolis Colts, which is the first time that final score has ever occurred in NFL history.
This got me thinking about the distribution of various final point differentials over the years -- and, since Pro Football Reference has an 'export as CSV' option, I decided to spend an evening tinkering around with the data in pandas, matplotlib, and the newly discovered vincent.
I'll give you the good stuff up front, then explain how I did it:
So how did I do it?
Parsing the data was relatively easy:
import pandas as pd

# source: http://www.pro-football-reference.com/boxscores/game_scores.cgi
SOURCE_FILE = "./nflscores.csv"
data = pd.read_csv(SOURCE_FILE, header=0)
Though I had to get rid of annoying interstitial headers that were causing pandas to interpret the columns as text:
header_rows = data.apply(lambda row: row['Rk'] == 'Rk', axis=1)
data = data[~header_rows]
data[['PtDif', 'Count']] = data[['PtDif', 'Count']].astype('int')
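The same cleanup can also be written as a plain boolean mask, without `apply`. This is only a minimal sketch on a made-up sample (the rows in `SAMPLE_CSV` are invented for illustration, not real data from the site), and it assumes the repeated header rows carry the literal text `Rk` in the `Rk` column, as above:

```python
import io

import pandas as pd

# Invented sample mimicking the export: the header row repeats mid-file.
SAMPLE_CSV = """Rk,PtDif,Count
1,3,10
2,7,8
Rk,PtDif,Count
3,3,5
"""

data = pd.read_csv(io.StringIO(SAMPLE_CSV), header=0)

# Keep only the rows whose 'Rk' cell is not the repeated header text.
data = data[data['Rk'] != 'Rk']

# With the stray headers gone, the text columns can be cast to integers.
data = data.astype({'PtDif': 'int', 'Count': 'int'})
```

Comparing the column directly does the same job as the row-wise lambda, just in one vectorized step.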
Since the data source included the differential as a column, it was easy to group all of the final scores by it. Then I had to fill in zeroes for any differential that never occurred:
score_differentials = data.groupby('PtDif')['Count'].sum()
populate_histogram = lambda diff: score_differentials[diff] if diff in score_differentials else 0
histogram = [populate_histogram(i) for i in range(74)]
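For what it's worth, pandas can probably do the zero-filling on its own via `Series.reindex`. A small sketch with invented counts (the numbers below are placeholders, not real NFL data):

```python
import pandas as pd

# Invented counts keyed by point differential; 1, 2, and 5 never occur.
score_differentials = pd.Series({0: 2, 3: 15, 4: 6}, name='Count')

# reindex() aligns the Series to every differential in the range and
# fills the ones that are missing with 0.
histogram = score_differentials.reindex(range(6), fill_value=0).tolist()

print(histogram)  # [2, 0, 0, 15, 6, 0]
```

With the real data, `range(6)` would become `range(74)` to match the list comprehension above.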
Lastly, we create the bar graph itself (it's called `line` here because I originally had it as a line graph, because I am a dummy):
line = vincent.Bar(histogram)
line.axis_titles(x='Point differential', y='Games')
line.height = 300
line.width = 900
ax = vincent.AxisProperties(labels=vincent.PropertySet(angle=vincent.ValueRef(value=90)))
line.axes.properties = ax
Still, this is literally converting two-dimensional data into one-dimensional data. After puttering around for a little bit, I decided it would be interesting to convert this into a heatmap, with the x-axis representing points scored by the winning team and the y-axis representing points scored by the losers. Since vincent doesn't seem to have heatmap functionality, I turned back to matplotlib:
data[['PtsW', 'PtsL']] = data[['PtsW', 'PtsL']].astype('int')
pivoted_data = data.pivot(index='PtsW', columns='PtsL', values='Count')
# This is devastatingly ugly code.
populate_heatmap = lambda x, y: pivoted_data[x][y] if x in pivoted_data and y in pivoted_data[x] else 0
heatmap_data = pd.DataFrame([[populate_heatmap(x, y) for y in range(73)] for x in range(73)])
plt.pcolor(heatmap_data, cmap=plt.cm.Blues, alpha=0.8, vmin=0, vmax=50)
Some points here:
- There's gotta be an easier way to fill in the dataframe than what I did, but I was feeling lazy: I'd imagine something using `np.zeros` would do the trick.
- If, like me, you are not readily equipped with encyclopedic knowledge of the `pcolor()` method, which does the vast majority of the work here: you specify the color palette via `cmap`, and the range of values it maps via `vmin` and `vmax`. (You'll notice that I top off at 50, which means that a final score with 52 occurrences will appear the same as one with 520.)
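Here is roughly what the `np.zeros` idea might look like. It's a sketch under a few assumptions: the three rows in `scores` are invented, `SIZE` is a hypothetical grid size chosen for the toy data, and each (winner, loser) pair appears only once, which is what lets a single indexed assignment fill the grid:

```python
import numpy as np
import pandas as pd

# Invented rows: winning score, losing score, and number of occurrences.
scores = pd.DataFrame({
    'PtsW': [3, 7, 7],
    'PtsL': [0, 3, 6],
    'Count': [4, 9, 520],
})

SIZE = 8  # hypothetical; must exceed the largest score in the data

# Start from an all-zero grid, then write each count into its cell.
# Rows are the losing score and columns the winning score, so pcolor()
# would put winners on the x-axis and losers on the y-axis.
grid = np.zeros((SIZE, SIZE), dtype=int)
grid[scores['PtsL'].to_numpy(), scores['PtsW'].to_numpy()] = scores['Count'].to_numpy()

# Capping the values at 50 imitates the effect vmax=50 has on the colors:
# anything past the cap becomes indistinguishable from the cap itself.
clipped = np.clip(grid, 0, 50)
```

Any score combination that never shows up in the data simply keeps its initial zero, so there's no need for the membership checks in the lambda.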
Some immediate observations:
- matplotlib is incredibly ugly.
- There are some pretty cool patterns, namely the absence of certain scores like 8, 11, and 18 -- which would involve some really strange play-calling.
- We get a rather neat triangular formation from the simple reality that the losing team never scores more points than the winning team (and thus there are no points above the line `y = x`).
We can try to recreate it in Google Charts to make it a little prettier:
Anyway, that's all I have -- if you found it interesting, either from a programming or a football perspective, please share it! I've uploaded the entire script (warts and all) to GitHub so feel free to play around with it. If you have any questions or ideas for the data, definitely let me know either via email or in the comments.