Soccer Math and You: The Introduction of Player Templates

Hello all you boys and girls. I’d like to take you to the soccer math world.

ACF Fiorentina v AC Milan - Serie A Photo by Gabriele Maltinti/Getty Images

If you get the reference you win one gold star.

During the last month I have been compiling data and making player charts with data from Wyscout. These charts display data that I deem relevant for each player category. For example, I chose to look at goals saved above average for goalies instead of save percentage because I want to understand how many expected goals are being saved. Today’s article will introduce you to the data, what they measure and why they are important. The individual players I picked for this article are random.

These stats are from play time during the 2019/20 soccer season and show how the players in the top five European leagues (France, Italy, England, Spain, and Germany) compare against each other. To qualify, a player must have played at least 600 minutes this season. I use percentiles because they make it easy to compare players to one another: I do not have to tell you how each raw number ranks, I can just show you. Percentiles rank a player within a data set. For example, expected goals per 90 comes out as a decimal, so the difference between 0.4 and 0.7 may seem small, but it is actually fairly sizable: the 66th percentile versus the 96th percentile (or 81st best versus 10th best out of 240 center forwards). So that attacker with 0.7 expected goals per 90 is in the 96th percentile, which means he is better than 95.59% of all center forwards in that individual category. The raw number would not give you that level of comparison.
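A minimal sketch of the percentile idea, assuming a percentile is simply the share of qualifying players with a strictly lower value. The article does not spell out the exact formula, and the player values below are made up for illustration; only the 600-minute cutoff comes from the text.

```python
MIN_MINUTES = 600  # qualification threshold stated in the article


def percentile_rank(value, population):
    """Percent of the population with a strictly lower value (assumed definition)."""
    if not population:
        raise ValueError("population must not be empty")
    below = sum(1 for v in population if v < value)
    return 100.0 * below / len(population)


# Hypothetical (minutes, xG per 90) pairs for center forwards; not real data.
players = [
    (450, 0.95),  # dropped: fewer than 600 minutes
    (900, 0.12), (1200, 0.21), (1500, 0.25), (700, 0.31), (2000, 0.38),
    (1800, 0.40), (1100, 0.47), (2500, 0.55), (1600, 0.62), (2200, 0.70),
]

# Keep only players who meet the minutes threshold.
qualifying = [xg for minutes, xg in players if minutes >= MIN_MINUTES]

print(percentile_rank(0.70, qualifying))  # 90.0
print(percentile_rank(0.40, qualifying))  # 50.0
```

With only ten qualifying players the jump from 0.40 to 0.70 is already 40 percentile points, which is the same effect the raw decimals hide.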

The first template is goaltenders, and my heavens wouldn’t you know, it’s mean Dean Henderson from Sheffield United F.C. (I am sure he is not mean and is actually a swell lad, but like, rhyming is fun, right?). So here we go.

Goalies:

I chose xG against as the first stat so we know how many expected goals Henderson has faced. The chart does not mean that Henderson has faced 27.83 xG against (he has actually faced a little more than 29). It means that the xG against he has faced is in the 27.83 percentile compared to other goalies. Goals saved above average measures how many goals a goalie has saved or allowed relative to the expected goals he has faced. Dean Henderson has faced 29.38 xG against and has only allowed 22 goals. So he has saved 7.38 goals above average, which places Henderson in the 95.65 percentile. Average in this stat would mean that a goalie has allowed the same number of actual goals as expected goals. With Henderson as the example, if he had allowed 29 actual goals then he would be considered an average goalie. He would have allowed what is expected of him. This is the most important stat for goalies because it says how many goals a player is saving his team per season. This is the true purpose of a goalie, so being better at this stat is vital.
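The Henderson arithmetic can be written out as a short sketch. The function name is hypothetical; the two inputs are the figures quoted above.

```python
def goals_saved_above_average(xg_against, goals_allowed):
    """Expected goals faced minus goals actually allowed.

    Positive means the goalie allowed fewer goals than expected.
    """
    return xg_against - goals_allowed


# Figures quoted in the article for Dean Henderson.
henderson = goals_saved_above_average(xg_against=29.38, goals_allowed=22)
print(round(henderson, 2))  # 7.38

# A goalie who allows exactly what is expected comes out at zero (average).
print(goals_saved_above_average(xg_against=29.0, goals_allowed=29))  # 0.0
```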

The last four stats are more contextual for who the player is or what he faces. I use the next four stats to speak about what the individual shot he faces is like, how good he is at claiming crosses, how well he passes, and how often he removes the ball from his own box. xG/Shot against tells us how dangerous the individual shots a goalie faces are. Henderson does not face particularly dangerous shots, but he still saves them and at a fantastic rate. Claims per 90 is how often he collects the ball in the air. Henderson does not pass well at all, and this is slightly worrying, but I do not like tactical plans that require the goalie to be a part of build up play. Exits per 90 is how often he clears the ball from his box per 90 minutes. That is a wrap on goalies, on to center backs.

Center Backs:

The template for center backs has the caveat that their percentiles are dependent on the system that they play in. Here I have highlighted two different center backs in two different systems. The Atlético Madrid center backs are not as involved as the center backs in the TSG Hoffenheim system. So, Stefan Savić does not look amazing in this chart because he is required, by his system, to be more conservative and less active in all other parts of play. The midfield for Atlético does a lot of the dueling, and the center backs mop up the issues that get past their midfield line of four. Savić does not need to take part in all facets of the game, but this does not mean he is bad. On the other hand, Stefan Posch is involved in all facets of play and rates highly on defensive duels per 90. He is this active because his coach, Alfred Schreuder, asks the center backs to play an integral role in build up, which also means they can be exposed to counter attacks. What I am trying to say is that judging center backs only on these templates has a flaw. Understanding how these players exist within their teams is important, and these templates can help show where these players may excel.

Duels here are any form of challenge on the ball, whether that be an attempted interception, challenge, or defensive header. An aerial duel is the same idea, just in the air. Tackles are sliding tackles exclusively, not standing tackles. The other two important definitions are deep completed passes and progressive passes. Deep completed passes are passes deep into the opposition’s territory, think the last twenty yards or the box to the goal line. Progressive passes are any pass that covers more than twenty yards in a positive direction. Lateral passes need not apply here. Center backs are tricky to understand, and these templates only shed some light. These charts are imperfect; however, they are slightly better than nothing.
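The two passing definitions can be sketched as simple checks. This assumes a pitch coordinate measured in yards from a team's own goal line and a pitch length of 115 yards; both are illustrative assumptions, as are the class and function names. The article does not say whether a progressive pass must be completed, so this sketch ignores completion for that check.

```python
from dataclasses import dataclass

PITCH_LENGTH_YARDS = 115.0  # assumed for illustration; real pitches vary


@dataclass
class Pass:
    start_x: float   # yards from own goal line where the pass starts
    end_x: float     # yards from own goal line where the pass ends
    completed: bool


def is_progressive_pass(p):
    """More than twenty yards gained toward the opposition goal."""
    return (p.end_x - p.start_x) > 20.0


def is_deep_completed_pass(p):
    """Completed pass ending in the last twenty yards of the pitch."""
    return p.completed and p.end_x >= PITCH_LENGTH_YARDS - 20.0


long_ball = Pass(start_x=30.0, end_x=60.0, completed=True)
square_ball = Pass(start_x=50.0, end_x=50.0, completed=True)
cutback = Pass(start_x=90.0, end_x=100.0, completed=True)

print(is_progressive_pass(long_ball), is_deep_completed_pass(long_ball))      # True False
print(is_progressive_pass(square_ball), is_deep_completed_pass(square_ball))  # False False
print(is_progressive_pass(cutback), is_deep_completed_pass(cutback))          # False True
```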

Full Backs:

So, this is the same exact issue as center backs, but the attacking numbers actually matter. The attacking numbers matter more now because of how fullbacks are deployed in today’s game. Trent Alexander-Arnold is the best attacking full back on the planet, so that is cool. Trent rates very low on passing accuracy (19.44 percentile, he completes 73.58% of his passes), but no fullback provides more expected assists. The defensive numbers, like with center backs, still have faults because systems still matter a lot, but Trent is not particularly good at the old defense. He is a full-blown attacking full back, unlike a player like Aaron Wan-Bissaka, who is more of a defensive full back, or Ricardo Pereira, who does about everything well.

The only new stat here is progressive runs, which, like progressive passes, have to do with distance covered, this time with the ball at your feet. A progressive run is a run that covers thirty yards if it starts and ends within your defensive half. A run can also be progressive if it covers ten yards and starts and ends in the opposition half. Dribbles are any time a player runs at an opposing defender and attempts to beat that player during a take on. Dribbles are cool and lead to flashy highlight packages but are not crucial. However, I like fun and dribbles are fun.
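A rough sketch of the two progressive run rules, assuming positions are measured in yards from a player's own goal line on a 115-yard pitch (an illustrative assumption, along with the function name). The article only covers runs that start and end in the same half, so the sketch returns None for a run that crosses the halfway line rather than guess a threshold.

```python
PITCH_LENGTH_YARDS = 115.0            # assumed for illustration
HALFWAY = PITCH_LENGTH_YARDS / 2.0    # 57.5 yards from own goal line


def is_progressive_run(start_x, end_x):
    """Return True/False for the two cases in the article, None otherwise."""
    gained = end_x - start_x
    starts_own_half = start_x < HALFWAY
    ends_own_half = end_x < HALFWAY

    if starts_own_half and ends_own_half:
        return gained >= 30.0   # thirty yards within the defensive half
    if not starts_own_half and not ends_own_half:
        return gained >= 10.0   # ten yards within the opposition half
    return None                 # crosses halfway: not defined in the article


print(is_progressive_run(10.0, 45.0))  # True
print(is_progressive_run(10.0, 30.0))  # False
print(is_progressive_run(70.0, 82.0))  # True
print(is_progressive_run(50.0, 70.0))  # None
```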

Central Midfielders:

I am going to start this off by saying that Marco Verratti is the best midfielder on the planet. The little man does everything, and as an Azzurri fan I appreciate him so much. While Verratti is not in the 100th percentile in everything, he is near the best in about every category. Verratti is in the mold of a guy who can play as a true box to box midfielder. Players in this mold work in both a midfield two and a midfield three. So, if a team plays a 4-4-2 or a 4-3-3, Verratti would be comfortable in both. Tactically, I would try to stay away from him if Milan ever were to face PSG in a game. He is just simply that good.

For math stuff, Fwd passes completed denotes how many passes he completes that go in a positive direction. I generally do not care as much about lateral passes (they are still important but less important than forward passes) like I have said. Cmplt pass to penalty area means completed passes to the penalty area per ninety minutes. This tracks how many successful passes you have to the box per match. The rest we have generally covered.
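Most of the numbers on these templates are "per 90" rates. A minimal sketch of that scaling, with a hypothetical function name and made-up totals:

```python
def per_90(total, minutes_played):
    """Scale a season total to a rate per ninety minutes played."""
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")
    return total * 90 / minutes_played


# Hypothetical midfielder: 24 completed passes to the penalty area
# across 1080 minutes, which is twelve full matches' worth of time.
print(per_90(24, 1080))  # 2.0
```

Scaling by minutes rather than appearances keeps a player who is subbed on for ten minutes from dragging down his own averages.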

Central Attacking Midfielders:

The boy wonder, Kai Havertz. So, Kai had a down year at Bayer Leverkusen, which is unfortunate because I highly rate this player. These are still solid results but none of them necessarily pop off the template. Kai has done better in 2019/20 with his expected assist numbers than 2018/19, but the expected goals numbers took a sizable hit, compared to 2018/19. However, I continue to be impressed with him and think he would function well at whatever team he goes to next. In my fantasy world he stays at Bayer Leverkusen and they win the league, but I do not make the rules.

Math time. Attacking midfielders and wingers are compared against each other because the two positions do similar things. xG means expected goals. Non-penalty goals is how many goals the player scores that are not penalties. xG/shot is how dangerous his individual shots are. Shots become more dangerous the closer you get to the box. Def Duels is now important to talk about again. With center backs, full backs, and midfielders, the success of a duel matters; with attacking players, however, just entering duels is important. When an attacking midfielder, winger, or center forward enters a duel it is about the team’s press. Their success in the duel is less important to me, so I do not look at how many successful duels they have, just how many they attempt.

Wingers:

I chose Adama Traoré to show you how I would describe a player based on his template. Adama Traoré is not my favorite player. His expected goals are low, he does not score much, but he crosses and dribbles well. As a winger, you would expect more involvement within the box. Traoré does not shoot often, when he shoots it is dangerous, but he generates very few expected goals. Traoré creates a good number of expected assists, but he is wasteful in dangerous areas. When he is dangerous in attacks he is most likely crossing. Traoré dribbles a lot, but that is not particularly crucial to me, because passes break lines of defense faster and progress play more quickly. Lastly, he does not duel a lot. Some of this is the Wolves system that does not press aggressively, but he is low even among the attackers on that team. These numbers show to me a player who should play as a wing back on a team finishing mid-table in the top five leagues. I would not, however, say that he should get praise for being one of the best players in a major league. He simply is not.

Center Forwards:

Karim Benzema gets a bad rap, but he is legitimately good and is asked to do a lot in Zinedine Zidane’s new system. He is still a premier striker, and much better than Olivier Giroud (this is directed at Didier Deschamps). The involvement is fantastic, and he still takes dangerous shots. He does not press a lot, but he is thirty-two, he can take a breather. Benzema links play well and is a quality passer into the penalty box. He reinvented his game substantially this season and I applaud that.

We have one new number here, touches in box per 90, which tracks how many touches a center forward has in the box. This template screams attacking involvement in the box, and I care about this a lot. If a forward is getting a high number of touches in the box, that shows that he is playing in dangerous areas. Playing in dangerous areas typically correlates to more dangerous shots or chances, hence the touches in the box metric. Aerial duels, unlike with center backs, can be weighted by how often a player is attacking during set pieces. Benzema, to say the least, is not an aerial threat.

Teams:

Teams get templates too!

I picked Getafe because they press like crazy people. PPDA measures passes per defensive action. This is how many passes a team allows before they enter a duel. Essentially, Getafe allow the fewest passes before a duel within the top five leagues. They press like crazy. The duel does not need to be successful, it just needs to happen. xG differential is a measure of how many expected goals a team wins by per 90 minutes. This statistic can show by how much teams are winning individual games. The higher they rank on the percentile, the more convincingly the team wins each individual game. This can be seen with Getafe because they do not create a lot of xG for but limit a lot of xG against (the high number is good here). When Getafe shoots, it is very dangerous, but when they have shots against, which is rare, it is also very dangerous (low number is bad here).
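Both team numbers are simple ratios, sketched below with hypothetical function names and made-up season totals. Published PPDA figures are often restricted to part of the pitch; the article does not say whether that applies here, so this sketch counts everything.

```python
def ppda(opponent_passes, defensive_actions):
    """Passes allowed per defensive action; lower means a more intense press."""
    if defensive_actions <= 0:
        raise ValueError("defensive_actions must be positive")
    return opponent_passes / defensive_actions


def xg_differential_per_90(xg_for, xg_against, minutes_played):
    """Expected goals margin (for minus against) per ninety minutes."""
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")
    return (xg_for - xg_against) * 90 / minutes_played


# Hypothetical season totals; not Getafe's real numbers.
print(ppda(opponent_passes=2400, defensive_actions=300))                       # 8.0
print(xg_differential_per_90(xg_for=30.0, xg_against=24.0, minutes_played=2700))  # 0.2
```

A pressing team shows up as a low PPDA value, since it lets the opponent string together fewer passes before a duel happens.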

These are all of the templates I will be using, and I hope this is a good primer. I will use many of these extensively and I will link back to this article often (especially early on). I have two more years’ worth of data that I use when rating players. Adding this data into these charts becomes difficult if a player plays in a top five league in 2017/18, then leaves in 2018/19, and then returns in 2019/20. Accurately weighing this is tricky and may not be entirely useful. Kai Havertz is a perfect example of why the extra context matters, because he was fantastic in 2018/19. He was in the 91st percentile in xG per 90 in 2018/19 (compared to the 54th in 2019/20). I will add this kind of context for the players I use so that you all are not left in the lurch (not left in the dust).

I hope everyone is healthy and safe. I have said this to some friends, on Twitter (follow me here), and to the writers, but if you want me to do deep dives on teams, tactics, or coaches, I can. Right now, I have a lot of free time and I like doing research (I am a history major). I will be doing some big AC Milan stuff in my next article, so get excited. Maybe not that excited, but like pretty excited. Actually, get incredibly excited, I am.

As you may have noticed, there are no Serie A players here. This will change soon enough...