More Damned Statistics

Studies have shown that accurate numbers aren’t any more useful than the ones you make up!


For a while as a young boy I collected London bus numbers; not route numbers but the fleet number that was painted next to the driver’s cab. It was the budget version of train spotting because you didn’t need to buy a platform ticket. I would carefully write the numbers down in a small notebook but I didn’t stick at it very long before realising the whole exercise was a complete waste of time.

Nowadays any millennial geek fascinated by collecting and recording pointless information can ‘monetise’ their proclivity through gainful employment with an organisation such as Opta, the sports data specialists. Football, like most sports, is now awash with data that provides a minute-by-minute analysis of every action and incident, so that at any time we can know how far Mark Noble has run today. My assertion, however, is that while the resulting statistics might be interesting, they are nothing more than that: there is no cause and effect between the data presented and the actual outcome of a game, i.e. the stats are basically meaningless. I have written about this previously and undertook to keep a watchful eye as the season progresses to see if I could be proved wrong.

For the purposes of my study I am using the data presented on the Whoscored website, which, despite my scepticism over the usefulness of the stats, is an excellent resource. The Whoscored data is, I understand, sourced from Opta and fed in real time to a large number of media companies. For each game, the website provides a match report showing summary details for possession, passes completed, shots on goal, aerial duels won, tackles made and dribbles won. I am making an assumption here that, having selected these categories, the folks at Whoscored consider them to be the most pertinent to the outcome of a game.

Of the 30 Premier League matches played to date, 22 have had a positive outcome (with 8 drawn games). In these 22, the winning team more often had the advantage in possession, passes completed, shots and dribbles won, while the losing side more often came out on top for aerial duels won and tackles made. In only 1 of the 30 games (Burnley v Swansea) did the winning side dominate every category, while there was also 1 game (Palace v WBA) where the losing side was on top across the board.
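For anyone wanting to repeat the count, the tally described above can be sketched in a few lines of Python. This is only a rough illustration of the bookkeeping, not how Whoscored or Opta do anything: the function name `tally_winner_leads`, the dictionary layout and the three sample matches are all invented for the example, and the figures are made up rather than taken from real games.

```python
from collections import Counter

# The six categories shown in the match report summary.
CATEGORIES = ["possession", "passes", "shots", "aerials", "tackles", "dribbles"]


def tally_winner_leads(matches):
    """Count, per category, how often the winning side had the higher figure.

    Each match is a dict of (home, away) pairs. Drawn games are skipped
    because there is no winner to compare against.
    """
    leads = Counter()
    decided = 0
    for match in matches:
        home_goals, away_goals = match["goals"]
        if home_goals == away_goals:
            continue  # a draw: nothing to tally
        decided += 1
        winner = 0 if home_goals > away_goals else 1
        loser = 1 - winner
        for category in CATEGORIES:
            if match[category][winner] > match[category][loser]:
                leads[category] += 1
    return decided, leads


# Hypothetical figures, purely to show the shape of the data.
sample = [
    {"goals": (2, 0), "possession": (55, 45), "passes": (400, 320),
     "shots": (14, 8), "aerials": (10, 15), "tackles": (12, 18), "dribbles": (9, 6)},
    {"goals": (1, 1), "possession": (50, 50), "passes": (350, 350),
     "shots": (10, 10), "aerials": (12, 12), "tackles": (15, 15), "dribbles": (7, 7)},
    {"goals": (0, 1), "possession": (60, 40), "passes": (450, 300),
     "shots": (10, 12), "aerials": (12, 20), "tackles": (15, 22), "dribbles": (8, 5)},
]

decided, leads = tally_winner_leads(sample)
for category in CATEGORIES:
    print(f"{category}: winner led in {leads[category]} of {decided} decided games")
```

Counting who led is all this does; it says nothing about whether leading a category causes a win, which is the whole point of my scepticism.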

So are there any conclusions that we can make? Should managers tell their players that losing aerial duels and tackles is the best way to win the game? Or is it obvious that more shots on goal increase the chance of winning? Or that if you are forced to defend it is likely that you will need to make more tackles?

There was a school of thought last year that conceding possession bore some relation to winning the game, probably because it was a prevalent feature of Leicester’s season (and our own to some extent). This has not been reflected in the games so far this season. I am still not convinced, though, as to how possession is actually measured; the only time I have seen it explained (a few years back) it was suggested that possession is, in fact, derived from passes completed. That in all 30 games the team with most possession also completed most passes may confirm this.
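If possession really is derived from passes completed, the sum would be something like the sketch below. I stress this is my assumption of the simplest possible version (each side’s share of the total completed passes); the function name `possession_from_passes` is my own invention and I have seen no published formula to confirm it.

```python
def possession_from_passes(home_passes, away_passes):
    """Estimate possession percentages as each side's share of completed passes.

    ASSUMPTION: this pass-share formula is a guess at how a pass-based
    possession figure could be calculated, not a documented method.
    """
    total = home_passes + away_passes
    if total == 0:
        return 50.0, 50.0  # no passes at all: split it evenly
    home_share = 100.0 * home_passes / total
    return home_share, 100.0 - home_share


# Hypothetical example: 300 completed passes against 100.
home, away = possession_from_passes(300, 100)
print(f"Possession: {home:.0f}% v {away:.0f}%")
```

Under this version the side completing more passes always shows more possession, which would explain why the two figures agreed in every one of the 30 games.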

Maybe the only purpose of the stats is the fun of collecting them, in a similar vein to the bus numbers, and I am over-thinking them. But I don’t believe that is how they are used by TV producers and pundits, who present them as if they define the game. For now the case remains unproven as far as I am concerned, but I will keep on tracking developments.

Lies, Damn Lies and Football Statistics

Football statistics, what do they mean and how do West Ham fare?

One of the growth industries of modern football is the statistic and every game seemingly now has an army of people studying play on computer monitors so that every pass, tackle and duel can be recorded and fed into a database for subsequent analysis and debate.

Playfair Football Annual

As a young boy I was regularly given Playfair Cricket and Football Annuals as presents, and they became required bedtime reading to the accompaniment of Radio Luxembourg, where Horace Batchelor urged listeners to subscribe to his patented method of winning the football pools.

While the Cricket Annual was packed with player stats of runs scored, wickets taken, catches, stumpings and averages, the most that you got for football players of the time was appearances and goals scored. Even Horace Batchelor had no inside statistical knowledge to support his “Famous Infra Draw Method”; his approach was to pool resources and create a huge permutation to improve the chances of picking out the drawn games.

Fast forward to today and the internet is awash with football stats and there are companies and websites that are completely devoted to their collection and analysis. The range of stats now includes number of shots, passes, tackles, fouls, aerial duels, short passes, long passes, dribbles, interceptions and distance run.

The problem that I have is that, while these stats may be interesting, is there any causal relationship between the information collected and the outcome of the match? Looking at the cricket stats, I think it is clear that scoring runs and taking wickets are quite fundamental to winning a game, but how important is, say, aerial duels won to the outcome of a football match?

The Whoscored website is a great resource for the stats aficionado and they live by their claim to be “Revolutionising Football Statistics”. So it was interesting to look at how Leicester had fared last season from a stats perspective as they ran out comfortable Premier League champions by 10 points.

The stand-out for me from Leicester’s season is that they were ranked 18th for Possession and 19th for Pass Success Rate. We should not be surprised that these two metrics are closely correlated, because I have read that Opta use Pass Success Rate as a proxy for Possession; they don’t actually record who is in possession at any one time! Where Leicester did well was for Interceptions, number of Tackles and Aerial Duels won. For Aerial Duels they were just behind Aston Villa, so we can see that winning them didn’t do Villa much good.

From all of these stats, Whoscored derive an overall rating (although I couldn’t find any details as to how this is calculated). The top 6 clubs based on the rating (in order) were Arsenal, Leicester, Tottenham, Manchester City, West Ham and Southampton with Manchester United in distant 10th place. So I guess you could say there is some correlation if the rating is directly related to the attributes measured.

For the TV viewer it is Possession that is the most frequently presented statistic, and this seems odd when, at least based on last season, it bears no relation to the probable outcome. It may give the disgruntled losing manager something to hide behind, yet the only truly meaningful statistic is goals scored.

WhoScored.com

The statistical summary of West Ham’s last season also shows that we were one of the poorer teams as far as Possession and Pass Success (12th and 13th respectively) are concerned. We performed quite well for Total Shots and Shots On Target but our main claim to fame was being one of the most Fouled sides in the league.

So that was last season. For this one we start with a fresh notebook and pencil and will provide regular updates on how the wonderful world of statistics is affecting West Ham’s season.