I spend a likely unhealthy amount of time on Pitchfork: it’s where I get my music news, and I can usually rely on its reviews to decide whether an album is worth a listen. Still, the site often comes under fire for being, among other things, self-serious and overly critical. The allegations are that albums are graded on too harsh a scale and that reviews are motivated by commercial reasons as much as musical ones.

So, naturally, I downloaded all of them.

I decided to load the thing into Python (using the wonderful pandas library) and poke around.

import pandas as pd
review_data = pd.read_csv('./pitchfork_review_data.csv', parse_dates=[DATE_INDEX])

My first question was about the score distribution: Pitchfork grades on a 0.0 to 10.0 scale, so one would expect the average to be 5.0, right?

Well, let’s take a look:

review_data['score'].describe()
count 14919.000000
mean 6.969562
std 1.356199
min 0.000000
25% 6.400000
50% 7.200000
75% 7.800000
max 10.000000

Across all 14,919 reviews, the average is 6.97. Talk about grading on a curve. Additionally, half of all reviews fall between 6.4 and 7.8, a strikingly narrow window considering the outrage that greets reviews with scores below 5.0 and the ‘king-making’ power of a Best New Music accolade (generally given to albums that score 8.2 or higher).
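To make the "half of all reviews" reading concrete, here is a minimal sketch of how the middle-half window can be pulled out of a score column. The scores are made-up values rather than the Pitchfork data, and `iqr_window` is a hypothetical helper name, not something from the original analysis.

```python
import pandas as pd

def iqr_window(scores):
    """Return (q1, q3, share of values inside [q1, q3])."""
    q1 = scores.quantile(0.25)
    q3 = scores.quantile(0.75)
    # between() is inclusive on both ends by default
    inside = scores.between(q1, q3).mean()
    return float(q1), float(q3), float(inside)

# Made-up scores for illustration only (not the real review data).
toy_scores = pd.Series([2.0, 5.5, 6.4, 6.8, 7.2, 7.5, 7.8, 8.3, 9.0])
q1, q3, share = iqr_window(toy_scores)
print(f"middle half spans {q1:.1f} to {q3:.1f}; share inside: {share:.0%}")
```

With a small sample and inclusive endpoints, the share inside the window is only roughly one half; on thousands of reviews it settles closer to 50%.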

Actually, speaking of Best New Music, let’s take a look at that.

review_data[review_data.accolade == ' Best New Music '].describe()
count 500.000000
mean 8.619400
std 0.328602
min 7.800000
25% 8.400000
50% 8.500000
75% 8.800000
max 10.000000
# head() gives us the five lowest-scoring reviews
review_data[review_data.accolade == ' Best New Music '].sort_values('score').head()
# tail() gives us the five highest
review_data[review_data.accolade == ' Best New Music '].sort_values('score').tail()
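As an aside, pandas also offers `nsmallest` and `nlargest`, which return the extremes directly without sorting the whole frame. The sketch below uses a toy frame: the `score` and `accolade` columns mirror the ones used above, while the `album` column and all of the rows are assumptions for illustration.

```python
import pandas as pd

# Toy rows for illustration; 'album' is an assumed column name.
toy = pd.DataFrame({
    "album": ["A", "B", "C", "D", "E", "F"],
    "score": [8.4, 7.8, 9.6, 8.8, 10.0, 6.1],
    "accolade": [" Best New Music "] * 5 + [""],
})

bnm = toy[toy.accolade == " Best New Music "]
lowest = bnm.nsmallest(2, "score")   # ascending: lowest score first
highest = bnm.nlargest(2, "score")   # descending: highest score first
print(lowest[["album", "score"]])
print(highest[["album", "score"]])
```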

Looks like the lowest score given to a BNM is 7.8 (given to !!!’s Me and Giuliani Down by the Schoolyard, a groan-inducing name if I’ve ever heard one). At the other end, the three highest scores handed down to new music are 9.6, 9.7, and a controversial 10.0, to The Fiery Furnaces, Arcade Fire, and Kanye West respectively.

Back to the overall score distribution, though: percentile data gives us only one perspective on the data. Graphing the rounded scores yields some interesting results:

import matplotlib.pyplot as pyplt

As expected, there’s a clustering of reviews in the 6-8 range, with a long tail approaching 0 and a steep drop-off toward 10. But if we increase the granularity:

pyplt.hist(review_data['score'], bins=20)

pyplt.hist(review_data['score'], bins=50)

We get a much more interesting perspective. In particular, Pitchfork loves their 7.5s and 8.2s. Also revealing is the relative frequency of perfect scores. They are mainly reserved for Beatles and jazz reissues; one can imagine the backlash if a reviewer deemed Kind of Blue less than perfect.
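The same comparison can be made without a plot by counting scores at two granularities. This is only a sketch on made-up numbers (not the review data); it shows how favorite scores that vanish into one whole-number bucket stand out once each distinct score is counted.

```python
import pandas as pd

# Made-up scores for illustration only.
toy_scores = pd.Series([7.5, 7.5, 8.2, 8.2, 8.2, 6.9, 7.0, 7.5, 8.2, 10.0])

# Coarse view: one bucket per whole number, like a low-bin histogram.
coarse = toy_scores.round().value_counts().sort_index()

# Fine view: one bucket per distinct score, like a high-bin histogram.
fine = toy_scores.round(1).value_counts()
most_common = fine.idxmax()

print(coarse)
print(fine)
print("most common score:", most_common)
```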


Another charge often leveled at Pitchfork is that their standards have slipped as they’ve gained a larger audience. We can try simply plotting the review scores against their publication dates, but it’s not much help:

daily_data = review_data.groupby("publish_date")['score'].mean()

There’s too much noise to get a good impression of any overall trend. It looks like things oscillate around the 7.0 mark, but plotting the mean review score for each month should give a clearer picture.

monthly_data = daily_data.resample('M').mean()  # newer pandas versions spell the month-end alias 'ME'

Quite a bit clearer: we can attribute the early flux to the fact that in its first few years, Pitchfork was only publishing one or two reviews per week, as opposed to five a day. It looks like averages were relatively steady, with a slight dip from 2007 to 2010, but we can run a regression to make sure:

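import numpy as np
# pd.ols was removed in pandas 0.20, so fit the same straight line with np.polyfit
total_points = len(monthly_data)
x, y = np.arange(total_points), monthly_data.values
valid = ~np.isnan(y)  # months without any reviews come out of resample as NaN
slope, intercept = np.polyfit(x[valid], y[valid], 1)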

Wow: with an RMSE of 0.6757 (not great, but not awful), we get a line with an intercept of 6.977 and a slope of 0.000037. In other words, barely any change at all.
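For readers who want to reproduce this kind of fit, here is a minimal, self-contained sketch using NumPy. The monthly means are synthetic (flat at 7.0 plus seeded noise), `fit_trend` is a hypothetical helper, and the RMSE here is the plain root-mean-square of the residuals, which may differ slightly from the figure a regression package reports.

```python
import numpy as np

def fit_trend(values):
    """Fit a least-squares line through (0, v0), (1, v1), ...

    Returns (slope, intercept, rmse).
    """
    y = np.asarray(values, dtype=float)
    x = np.arange(len(y))
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    return float(slope), float(intercept), rmse

# Synthetic monthly means for illustration only (not the real data).
rng = np.random.default_rng(0)
monthly_means = 7.0 + rng.normal(0.0, 0.5, size=180)

slope, intercept, rmse = fit_trend(monthly_means)
# Total movement of the fitted line from the first month to the last.
total_drift = slope * (len(monthly_means) - 1)
print(f"slope={slope:.6f} intercept={intercept:.3f} rmse={rmse:.3f} drift={total_drift:.3f}")
```

Since the slope is measured per month, the total drift is the slope times the number of months. Taking 200 months as a round illustrative figure, the slope quoted above works out to 200 × 0.000037 ≈ 0.007 points.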


Lastly, let’s take a look at the reviewers themselves: it’s not exactly out of the realm of possibility that certain critics are sticklers and others are more generous. (I mean, anyone who gave Merriweather Post Pavilion a 9.6 can’t have the highest standards, right?)

reviewer_data = review_data.groupby('reviewer')['score']
aggregated_reviewers = reviewer_data.mean()

Skipping over group reviews, here are the authors at either extreme:

reviewer           average score
Bob O. McMillan    3.5
Alan Smithee       4.0
Adam Ohler         4.2
Carl Wilson        8.5
Philip Welsh       8.6
Drew Daniel        8.6
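One caveat with per-reviewer averages is that someone with only a review or two can land at an extreme by chance. The sketch below shows one way to guard against that by aggregating the count alongside the mean. The rows are made up, and `min_reviews` is a hypothetical threshold, not something used in the numbers above.

```python
import pandas as pd

# Made-up reviews for illustration only.
toy = pd.DataFrame({
    "reviewer": ["Ann", "Ann", "Ann", "Ben", "Cy", "Cy", "Cy"],
    "score": [7.0, 8.0, 7.5, 2.0, 6.0, 6.5, 7.0],
})

# Mean score and number of reviews per reviewer.
stats = toy.groupby("reviewer")["score"].agg(["mean", "count"])

min_reviews = 3  # hypothetical cutoff; tune to taste
ranked = stats[stats["count"] >= min_reviews].sort_values("mean")
print(ranked)
```

Here the single harsh review from "Ben" is dropped, so the ranking only compares reviewers with enough reviews to make an average meaningful.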

That’s all I’ve got for now. I hope you found it interesting, from either a programming perspective or a musical one! Feel free to download the CSV and play around with it yourself. If there are any questions you’d like me to answer (or suggestions for further analysis), please let me know via email or comment.
