
Continuing my analysis of the MCFC Analytics project data for the 2012 EPL season, in this post I try to answer a question much debated among fans: does the number of points a team gains by the end of the season, and therefore its position in the final points table, accurately reflect its performance on the pitch?

The data

My first task was to select appropriate metrics to measure a team's performance. As the idea for this analysis was prompted by a post by the always excellent Dan Barnett, I decided to use the table of performance parameters (Data) referenced in his blog. I did not use all of them, though: I left some out, reasoning (perhaps wrongly) that they were not fit for my purpose. My final selection of metrics is shown in the table below (Fig. 1).

Fig. 1: Team data

The method of analysis

I had already decided to use Cluster Analysis to answer this vexed question, and I think many data analysts would agree that it is the most appropriate method.

There are many ways to do Cluster Analysis and perhaps I should go into some detail about what I did. But I won't, for a number of reasons. I shall only say that I used a few of them, to check that my results were consistent. I am willing, however, to discuss my choices with anyone who takes the trouble to ask. So, let's get straight to the analysis.
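The post deliberately does not say which clustering methods were used. Purely as an illustration, the sketch below shows one common choice, agglomerative hierarchical clustering with average linkage on Euclidean distances, which is the kind of procedure that produces a dendrogram like the one in Fig. 3. The function names, the stopping parameter `k` and the toy numbers are hypothetical; they are not the author's code or data.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors of metric values."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(rows, labels, k):
    """Agglomerative clustering with average linkage, stopped at k clusters.

    rows: one list of (normalised) metric values per team
    labels: team names, in the same order as rows
    k: number of clusters at which to stop merging (hypothetical choice)
    Returns the clusters as sorted lists of team names.
    """
    clusters = [[i] for i in range(len(rows))]  # every team starts on its own
    while len(clusters) > k:
        best = None
        for (i, a), (j, b) in combinations(enumerate(clusters), 2):
            # Average linkage: mean distance over all pairs drawn from the two clusters.
            d = sum(euclidean(rows[p], rows[q]) for p in a for q in b) / (len(a) * len(b))
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]  # j > i, so position i is unaffected
    return [sorted(labels[p] for p in c) for c in clusters]


# Hypothetical, already-normalised values for five made-up teams (two metrics each).
rows = [[2.0, 1.5], [1.9, 1.4], [0.1, 0.0], [-0.1, 0.2], [-1.8, -1.6]]
labels = ["Team A", "Team B", "Team C", "Team D", "Team E"]
print(average_linkage(rows, labels, k=3))
# [['Team A', 'Team B'], ['Team C', 'Team D'], ['Team E']]
```

In practice one would more likely call `scipy.cluster.hierarchy.linkage(..., method='average')` and cut the tree with `fcluster`; the hand-written loop just makes the merging rule explicit.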

A heat map representation of the data table above is shown in Fig. 2. Values have been normalised by subtracting the average of each column from every value in that column, so that zero corresponds to the league average. Values well above the average are shown in shades of red, those near the average in black, and those below it (negative values) in shades of green.
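The normalisation described above is plain column-mean centring. A minimal sketch, assuming the metrics are held as a list of rows (one per team); the function name and figures are made up. Note that it only centres each column and does not rescale by the standard deviation, a common extra step when metrics are on very different scales (the post does not say whether this was done).

```python
def centre_columns(table):
    """Subtract each column's average from every value in that column.

    table: list of rows, one per team, each a list of metric values.
    Returns a new table in which every column averages zero: values above the
    league average become positive, values below it negative.
    """
    n_rows = len(table)
    means = [sum(col) / n_rows for col in zip(*table)]  # one average per metric
    return [[value - mean for value, mean in zip(row, means)] for row in table]


# Hypothetical figures for three teams: goals scored, goals conceded.
table = [[60, 10], [40, 20], [50, 30]]
print(centre_columns(table))
# [[10.0, -10.0], [-10.0, 0.0], [0.0, 10.0]]
```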

So good performance values for the top teams (goals, shots, etc.) appear in bright red (hot), while those of the bottom teams are mainly in shades of green. The reverse is true for bad performance indicators, such as goals, shots, etc. conceded.
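To make the colour convention concrete, here is a sketch of a green-black-red diverging scale for centred values. The linear shading, the function name and the saturation threshold `limit` are assumptions; the post only states that high values are red, near-average values black and negative values green.

```python
def heat_colour(value, limit):
    """Map a centred value to an (R, G, B) triple on a green-black-red scale.

    value: a column-centred metric value (0 = league average)
    limit: hypothetical absolute value at which the colour saturates
    """
    t = max(-1.0, min(1.0, value / limit))  # clamp to [-1, 1]
    intensity = round(255 * abs(t))  # 0 = black, 255 = full colour
    if t >= 0:
        return (intensity, 0, 0)  # above average: shades of red
    return (0, intensity, 0)  # below average: shades of green


print(heat_colour(0, 10))    # (0, 0, 0): average, black
print(heat_colour(25, 10))   # (255, 0, 0): well above average, bright red
print(heat_colour(-10, 10))  # (0, 255, 0): well below average, bright green
```

A goals-conceded column is coloured by the same rule, which is why it shows red for the bottom teams: conceding more than average is a high value, even though it is a bad one.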

Fig. 2: 2012 table heat map (with colour scale)

The analysis results

Fig. 3 shows the results of the analysis. From left to right we have:

  1. The new ranking of the teams with respect to the performance parameters used
  2. A heat map showing the normalised values of these parameters
  3. A dendrogram showing how the teams have been clustered


Fig. 3: Team clusters heat map

The first thing we notice is that the performance ranking differs from the final points ranking: some teams have been promoted and others demoted, which was to be expected. We can also see that the top four teams and the bottom three keep their points-table positions. So, where it matters most, the points system appears to be a fair method of classification: these are precisely the positions that decide which teams are rewarded with a place in the main European competitions and which are punished with relegation.
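The fairness argument rests on a simple check: do the top four and the bottom three occupy exactly the same places in both rankings? A minimal sketch, with a hypothetical function name and made-up team lists (letters stand in for clubs):

```python
def ends_agree(points_order, perf_order, top=4, bottom=3):
    """True if the top and bottom teams hold the same positions in both rankings.

    points_order: team names in final points-table order (1st first)
    perf_order: the same teams in performance-rank order
    top, bottom: places compared at each end (4 and 3, as discussed in the post)
    """
    same_top = points_order[:top] == perf_order[:top]
    same_bottom = points_order[-bottom:] == perf_order[-bottom:]
    return same_top and same_bottom


points = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]
perf = ["A", "B", "C", "D", "F", "G", "E", "H", "I", "J"]  # only the middle reshuffled
print(ends_agree(points, perf))  # True
```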

We can also notice the following:

  1. Man City and Man Utd have very similar performances, form a cluster of their own, and keep the same ranks as in the final EPL table: 1st and 2nd.
  2. They are followed by a cluster of four teams: Arsenal, Spurs, Chelsea and Liverpool. Here we see the first major changes from the points ranking, as Chelsea and Liverpool take the places of Newcastle and Everton respectively, who drop to the next, weaker-performing cluster.
  3. For Newcastle, this downgrading confirms other analyses and comments in various posts, not least the one by Dan Barnett already mentioned. In terms of points, Newcastle overperformed: the Toon's performance on the pitch does not tally with their final league position.
  4. Stoke and Aston Villa have very similar performances and make up a separate cluster, with Aston Villa also being the biggest gainer, up 4 positions in this analysis.
  5. In the next cluster we find WBA, down four positions from its points-table rank.
  6. The last cluster, that of the worst performers, contains four teams. It includes all the relegated teams, but also Norwich, who make the biggest drop of all (-5 positions), from twelfth to seventeenth.
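The position changes quoted above (+4, -4, -5) are simply differences between the two ranks. A small sketch with hypothetical function names; only Norwich's figures (12th on points, 17th on performance) come from the post, "Team X" is invented, and the convention that a positive number means a gain is an assumption chosen to match the signs used in the text.

```python
def rank_changes(points_rank, perf_rank):
    """Positions gained (+) or lost (-) moving from the points rank to the performance rank.

    points_rank, perf_rank: dicts mapping team name to position (1 = top)
    """
    return {team: points_rank[team] - perf_rank[team] for team in points_rank}


def biggest_movers(changes):
    """Return the (team, change) pairs of the biggest gainer and the biggest loser."""
    gainer = max(changes.items(), key=lambda item: item[1])
    loser = min(changes.items(), key=lambda item: item[1])
    return gainer, loser


# Norwich's ranks are taken from the post; "Team X" is a made-up example.
points_rank = {"Norwich": 12, "Team X": 16}
perf_rank = {"Norwich": 17, "Team X": 12}
changes = rank_changes(points_rank, perf_rank)
print(changes)                  # {'Norwich': -5, 'Team X': 4}
print(biggest_movers(changes))  # (('Team X', 4), ('Norwich', -5))
```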

A summary comparison of the two classifications, by points (Table Rnk) and by performance (Perf. rnk), is given in the table below (Fig. 4):


Fig. 4: Team rank
Concluding remarks

As expected, the points table does not accurately reflect the performance of ALL the teams during the season. It does, however, appear to reflect the performance of the best and worst performing teams (as measured by the metrics used in this analysis). And since the points table rewards or punishes only these teams, its outcome can be said to be fair. Some managers, however, especially those who have been criticised or even sacked for ‘bad performance’, may feel vindicated by this analysis. Others, of course, might question the results.

Cluster analysis is probably the best analytical method for answering the question posed at the start. Unlike methods that classify teams with respect to a single metric, it can classify them with respect to many at once, which is what is needed to answer the question properly. I believe this is the first time the method has been used for this purpose, and I look forward to applying it to this year's results at the end of the current season.

I suspect many will not be entirely happy with the metrics I have used. As I said, I took the lazy option and used those provided by someone else, perhaps a more knowledgeable analyst. I would, however, consider performing further analysis by adding or removing any suggested metric (given a reasoned argument for it), as long as I am provided with the relevant data.