# Analysing Pitchfork using Pandas

I spend a likely-unhealthy amount of time on Pitchfork: it’s where I get my music news, and I can usually rely on their reviews to decide whether or not an album is worth a listen. Still, they often come under fire as being, amongst other things, self-serious and overly critical: allegations have been made that their albums are graded on too harsh a scale, and that their reviews are motivated by commercial concerns as much as musical ones.

So, naturally, I downloaded all of them.

I decided to load the thing into Python (using the wonderful pandas library) and poke around.

```
import pandas as pd

# the publish date is the second-to-last column; parse it as a datetime
DATE_INDEX = -2
review_data = pd.read_csv('./pitchfork_review_data.csv', parse_dates=[DATE_INDEX])
```

The immediate curiosity for me was the score distribution: Pitchfork grades on a 0.0 to 10.0 scale, so one would expect the average to sit around 5.0, right?

Well, let’s take a look:

```
review_data['score'].describe()
```

statistic | score |
---|---|
count | 14919.000000 |
mean | 6.969562 |
std | 1.356199 |
min | 0.000000 |
25% | 6.400000 |
50% | 7.200000 |
75% | 7.800000 |
max | 10.000000 |

**Out of all 14,919 reviews, the average score is 6.97: talk about grading on a curve.** Additionally, half of all reviews fall between a 6.4 and a 7.8, a remarkably narrow window considering the general sense of outrage directed at reviews that score below a 5.0, and the ‘king-making’ power of a Best New Music accolade (generally given to albums that score an 8.2 or higher).
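If you’re curious just how rare the extremes actually are, something like the following will tell you (it just computes the fraction of reviews on either side of those informal lines):

```
# fraction of reviews scoring below 5.0
print((review_data['score'] < 5.0).mean())

# fraction at or above the unofficial 8.2 Best New Music line
print((review_data['score'] >= 8.2).mean())
```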

Actually, speaking of Best New Music, let’s take a look at that.

```
# note: the accolade field is padded with spaces in the source data
review_data[review_data.accolade == ' Best New Music '].describe()
```

statistic | score |
---|---|
count | 500.000000 |
mean | 8.619400 |
std | 0.328602 |
min | 7.800000 |
25% | 8.400000 |
50% | 8.500000 |
75% | 8.800000 |
max | 10.000000 |

```
# head() will give us the five lowest-scoring reviews
review_data[review_data.accolade == ' Best New Music '].sort_values('score').head()

# tail() will give us the five highest
review_data[review_data.accolade == ' Best New Music '].sort_values('score').tail()
```

Looks like the lowest score given to a BNM is 7.8 (given to !!!’s *Me and Giuliani Down by the Schoolyard*, a groan-inducing name if I’ve ever heard one). Conversely, the three highest scores handed down to new music are 9.6, 9.7, and a controversial 10.0, given to *The Fiery Furnaces*, *Arcade Fire*, and *Kanye West* respectively.

Back to the overall score distribution, though: percentile data only gives us one perspective on the data. Graphing the rounded scores yields some interesting results:

```
import matplotlib.pyplot as pyplt

# default histogram: ten bins across the 0.0 to 10.0 range
pyplt.hist(review_data['score'])
pyplt.show()
```

As expected, there’s a clustering of reviews in the 6 to 8 range, with a long tail approaching 0 and a steep drop-off towards 10. But if we increase the granularity:

```
pyplt.hist(review_data['score'], bins=20)
pyplt.show()
```

```
pyplt.hist(review_data['score'], bins=50)
pyplt.show()
```

We get a much more interesting perspective. In particular, Pitchfork loves their 7.5s and 8.2s. Also revealing is where the perfect scores go: they’re mainly reserved for Beatles and jazz reissues, and one can imagine the backlash if a reviewer deemed *Kind of Blue* anything less than perfect.
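If you’d rather not eyeball the histogram, a quick tally of the most frequently awarded scores makes the favourites explicit:

```
# the ten most common scores, most frequent first
review_data['score'].value_counts().head(10)
```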

---

Another charge often levied at *Pitchfork* is that their standards have diminished as they’ve gained a larger viewership. We can try simply plotting the reviews against their publish date, but it’s not much help:

```
# mean review score for each publish date
daily_data = review_data.groupby("publish_date")['score'].mean()
daily_data.plot()
pyplt.show()
```

There’s too much noise to get a good impression of any overall trends: it looks like things oscillate around the 7.0 mark, but that’s about all we can say. Plotting the mean review score of each month should give a clearer picture.

```
# resample the daily means into monthly means
monthly_data = daily_data.resample('M').mean()
monthly_data.plot()
pyplt.show()
```

Quite a bit clearer: we can attribute the early flux to the fact that in Pitchfork’s first few years, they were only publishing one or two reviews per week, as opposed to five a day. It looks like averages have been relatively steady, with a slight dip from 2007 to 2010, but we can run a regression to make sure:

```
monthly_data.plot()
monthly_frame = monthly_data.reset_index()
total_points = len(monthly_data)
# regress the monthly mean score against a simple time index
# (pd.ols comes from older pandas releases and has since been removed)
model = pd.ols(y=monthly_frame['score'], x=pd.DataFrame(range(0, total_points)), intercept=True)
```

Wow: with an RMSE of 0.6757 (not great, but not awful), we get a line with an intercept of **6.977** and a slope of **0.000037**. As in, barely any change at all.
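If you’re playing along with a modern version of pandas, `pd.ols` is long gone; a minimal substitute using numpy’s `polyfit` over the same monthly series looks something like this (a sketch, not the original analysis):

```
import numpy as np

# drop any empty months, then fit score ~ month index
y = monthly_data.dropna().values
x = np.arange(len(y))

# polyfit with degree 1 returns the slope first, then the intercept
slope, intercept = np.polyfit(x, y, 1)

# root-mean-square error of the fitted line, computed by hand
rmse = np.sqrt(np.mean((np.polyval([slope, intercept], x) - y) ** 2))
```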

---

Lastly, let’s take a look at the reviewers themselves: it’s not exactly out of the realm of possibility that certain critics are sticklers and others are more generous (I mean, anyone who gave *Merriweather Post Pavilion* a 9.6 can’t have the highest standards, right?).

```
# mean score per reviewer, sorted from harshest to most generous
reviewer_data = review_data.groupby('reviewer')['score']
aggregated_reviewers = reviewer_data.mean()
aggregated_reviewers.sort_values()
```

Skipping over group reviews, here are the authors at either extreme:

reviewer | average score |
---|---|
Bob O. McMillan | 3.5 |
Alan Smithee | 4.0 |
Adam Ohler | 4.2 |
… | … |
Carl Wilson | 8.5 |
Philip Welsh | 8.6 |
Drew Daniel | 8.6 |
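One caveat before reading too much into the extremes: an author with only a handful of reviews can land at the top or bottom by chance. A quick sketch for filtering them out (the 25-review cutoff is an arbitrary choice of mine, nothing official):

```
# mean and review count per author, then keep only reasonably prolific ones
reviewer_stats = review_data.groupby('reviewer')['score'].agg(['mean', 'count'])
prolific = reviewer_stats[reviewer_stats['count'] >= 25]  # arbitrary cutoff
prolific.sort_values('mean')
```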

That’s all I’ve got for now — I hope you found it interesting, either from a programming perspective or a musical one! Feel free to download the csv and play around with it yourself — if there are any questions you’d like me to answer (or suggestions for further analysis), please let me know via email or comment.

**You should follow me on Twitter.**
