Visualizing NFL point differentials
Yesterday, the New England Patriots beat the Indianapolis Colts 43-22, a final score that had never before occurred in NFL history.
This got me thinking about the distribution of final point differentials over the years – and, since Pro Football Reference has an ‘export as CSV’ option, I decided to spend an evening tinkering with the data in pandas, matplotlib, and vincent, a library I had just discovered.
I’ll give you the good stuff up front, then explain how I did it:
So how did I do it?
Parsing the data was relatively easy:
import pandas as pd
# source: http://www.pro-football-reference.com/boxscores/game_scores.cgi
SOURCE_FILE = "./nflscores.csv"
data = pd.read_csv(SOURCE_FILE, header=0)
I did have to get rid of the annoying header rows repeated throughout the export, which were causing pandas to interpret the numeric columns as text:
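# A repeated header row has the literal text 'Rk' in its Rk column
header_rows = data['Rk'] == 'Rk'
# .copy() so the type conversion below modifies a real frame, not a view
data = data[~header_rows].copy()
data[['PtDif', 'Count']] = data[['PtDif', 'Count']].astype('int')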
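Another way to strip those rows, sketched below under a couple of assumptions: the sample CSV is made up (only the column names come from the code above), and any row that fails numeric conversion is treated as a stray header. pd.to_numeric with errors='coerce' turns such rows into NaN, so they can be dropped and the rest converted to integers in one pass.

```python
import io

import pandas as pd

# Made-up sample in the shape the code above expects; note the repeated header row.
SAMPLE_CSV = """Rk,PtsW,PtsL,PtDif,Count
1,20,17,3,250
2,27,24,3,180
Rk,PtsW,PtsL,PtDif,Count
3,43,22,21,1
"""

NUMERIC_COLUMNS = ['PtsW', 'PtsL', 'PtDif', 'Count']


def load_scores(csv_text):
    """Parse the export, dropping any row whose numeric columns aren't numbers."""
    data = pd.read_csv(io.StringIO(csv_text), header=0)
    # Repeated header rows fail conversion and become NaN.
    numeric = data[NUMERIC_COLUMNS].apply(pd.to_numeric, errors='coerce')
    keep = numeric.notna().all(axis=1)
    data = data[keep].copy()
    data[NUMERIC_COLUMNS] = numeric[keep].astype(int)
    return data


scores = load_scores(SAMPLE_CSV)
print(len(scores))  # 3
```

This also catches any other non-numeric junk in those columns, not just rows whose Rk column reads 'Rk'.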
Since the data source included the differential as a column, it was easy to group all of the final scores. Then, we had to fill in zeroes for any possible differential that never occurred:
# Total number of games for each point differential (sum only the Count column)
score_differentials = data.groupby('PtDif')['Count'].sum()
# Differentials from 0 to 73 that never occurred get a count of 0
histogram = [score_differentials.get(diff, 0) for diff in range(74)]
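pandas can also fill in the missing differentials directly with Series.reindex and fill_value=0. The sketch below uses a few made-up rows with the export's column names, and adds a quick sanity check (an assumption worth verifying on the real data) that PtDif really is PtsW minus PtsL.

```python
import pandas as pd

# Made-up rows using the same column names as the export.
data = pd.DataFrame({
    'PtsW':  [20, 27, 43, 10],
    'PtsL':  [17, 24, 22, 10],
    'PtDif': [3, 3, 21, 0],
    'Count': [250, 180, 1, 12],
})

# Sanity check: the differential column should equal winner minus loser.
assert (data['PtDif'] == data['PtsW'] - data['PtsL']).all()

# Total games per differential, then a 0 for every differential 0..73 never seen.
score_differentials = data.groupby('PtDif')['Count'].sum()
histogram = score_differentials.reindex(range(74), fill_value=0)

print(histogram[3], histogram[5], len(histogram))  # 430 0 74
```

The result is still a Series indexed by differential, so it can be passed to a plotting library as-is or turned into a list with .tolist().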
Lastly, we create the bar graph itself (the variable is called line because I originally drew it as a line graph; I am a dummy):
import vincent
line = vincent.Bar(histogram)
line.axis_titles(x='Point differential', y='Games')
line.height = 300
line.width = 900
# Rotate the tick labels on the first (x) axis by 90 degrees so the 74 labels don't overlap
ax = vincent.AxisProperties(labels=vincent.PropertySet(angle=vincent.ValueRef(value=90)))
line.axes[0].properties = ax
Still, this collapses two-dimensional data (the winning and losing scores) into a single dimension (their difference). After puttering around for a bit, I decided it would be interesting to turn this into a heatmap, with the x-axis representing points scored by the winning team and the y-axis representing points scored by the losing team. Since vincent doesn’t seem to have heatmap functionality, I turned back to matplotlib:
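import matplotlib.pyplot as plt
data[['PtsW', 'PtsL']] = data[['PtsW', 'PtsL']].astype('int')
# Rows are winning scores, columns are losing scores; pairs that never occurred come out as NaN, so fill them with 0
pivoted_data = data.pivot(index='PtsW', columns='PtsL', values='Count').fillna(0)
# This is devastatingly ugly code.
populate_heatmap = lambda loser, winner: pivoted_data[loser][winner] if loser in pivoted_data and winner in pivoted_data[loser] else 0
# Each row is a losing score and each column a winning score, so pcolor puts winners on x and losers on y
# range(74) so the 73-point scores at the top end are included
heatmap_data = pd.DataFrame([[populate_heatmap(loser, winner) for winner in range(74)] for loser in range(74)])
# Colors saturate at 50 games
plt.pcolor(heatmap_data, cmap=plt.cm.Blues, alpha=0.8, vmin=0, vmax=50)
plt.show()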
Some points here:
- There’s gotta be an easier way to fill in the dataframe than what I did, but I was feeling lazy; I’d imagine something using np.zeros would do the trick.
- If, like me, you aren’t readily equipped with encyclopedic knowledge of pcolor(), the method that does the vast majority of the work here, the key arguments are simple: cmap sets the color palette, and vmin and vmax set the range of values mapped onto it. (You’ll notice that I cap it at 50, which means that a final score with 52 occurrences will appear the same as one with 520.)
The result:
Some immediate observations:
- matplotlib is incredibly ugly.
- There are some pretty cool patterns, namely the absence of certain scores like 8, 11, and 18 – which would involve some really strange play-calling.
- We get a rather neat triangular formation from the simple fact that the losing team never scores more points than the winning team (so there are no points above the line y = x).
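That triangle doubles as a sanity check on the data. In a grid where row = losing score and column = winning score, a game in which the loser outscored the winner would land strictly below the array’s diagonal (row > column), which is the region above y = x once pcolor draws row 0 at the bottom. The small grid below is hypothetical; the helper name is mine.

```python
import numpy as np


def games_above_diagonal(grid):
    """Count games where the losing score (row) exceeds the winning score (column)."""
    # np.tril with k=-1 keeps only cells strictly below the main diagonal.
    return int(np.tril(grid, k=-1).sum())


# Hypothetical 4x4 grid: row = losing score, column = winning score.
# Ties sit on the diagonal itself and are allowed.
grid = np.array([
    [3, 0, 5, 2],
    [0, 1, 0, 4],
    [0, 0, 2, 0],
    [0, 0, 0, 1],
])

print(games_above_diagonal(grid))  # 0
```

Running the same check on the real grid should also return 0; anything else would point at a parsing problem.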
We can try to recreate it in Google Charts to make it a little prettier:
Anyway, that’s all I have – if you found it interesting, either from a programming or a football perspective, please share it! I’ve uploaded the entire script (warts and all) to GitHub, so feel free to play around with it. If you have any questions or ideas for the data, definitely let me know either via email or in the comments.