Identifying the elements of a soccer match’s performance with machine learning
Pappalardo Luca;
2026
Abstract
This study uses high-quality data from soccer matches to better understand the elements that contribute to a team's success. We first carried out a classical analysis of extensive soccer logs from major European leagues and international tournaments. We define a team's technical performance as a vector of features that includes goalkeeping, interceptions, tackles, dribbles, and more. The objective of the paper is to classify the state of a match with one of three labels: win, defeat, or draw. We do not predict future outcomes; instead, we focus on the characteristics of the data itself to identify patterns and trends within a match. Accordingly, we infer the match result only after collecting the feature data for the entire match duration, which makes our scenario a classification problem. We compare four models (SVM, Logit, XGB, and MLP), and the MLP outperforms the others. Moreover, as a novel approach, we analyze the logs by considering matches truncated at increasing durations, where the durations form an arithmetic progression with a common difference of 5 minutes. This yields a dynamic approach that labels the match outcome every 5 minutes, using an MLP to track the accuracy of the classified state over time. The findings reveal improved detection of draws and show that model accuracy is higher in the early stages and decreases as the match progresses. In both approaches, explainable AI techniques identify the key predictive features, offering insights into how technical features influence success dynamically throughout a match.
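As a rough illustration of the dynamic setting described in the abstract, the sketch below builds the 5-minute cutoffs and, for each cutoff, the feature counts and the match state seen from one team's side. The event format `(minute, team, kind)`, the function names, and the toy data are assumptions made for this example; they are not taken from the paper, and no classifier is trained here.

```python
from collections import Counter

def cutoffs(match_length=90, step=5):
    """Durations 5, 10, ..., match_length (an arithmetic progression)."""
    return list(range(step, match_length + 1, step))

def features_until(events, team, cutoff):
    """Count each event kind credited to `team` up to `cutoff` minutes."""
    return Counter(kind for minute, t, kind in events
                   if t == team and minute <= cutoff)

def state_at(events, team, cutoff):
    """Label the match state for `team` at `cutoff`: win, defeat, or draw."""
    goals_for = sum(1 for minute, t, kind in events
                    if kind == "goal" and t == team and minute <= cutoff)
    goals_against = sum(1 for minute, t, kind in events
                        if kind == "goal" and t != team and minute <= cutoff)
    if goals_for > goals_against:
        return "win"
    if goals_for < goals_against:
        return "defeat"
    return "draw"

# Hypothetical toy log: (minute, team, event kind).
events = [
    (3, "A", "tackle"), (12, "A", "dribble"), (17, "B", "goal"),
    (40, "A", "intercept"), (58, "A", "goal"), (77, "A", "goal"),
]

# One (features, label) pair per cutoff; a model such as an MLP
# could then be fitted and scored separately at each cutoff.
samples = [(c, features_until(events, "A", c), state_at(events, "A", c))
           for c in cutoffs()]
```

Each element of `samples` pairs a truncated feature count with the state at that minute, which is one plausible reading of "labels the match outcome every 5 minutes".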


