
Collaborating Authors

Stando, Adrian


Glocal Explanations of Expected Goal Models in Soccer

arXiv.org Machine Learning

In soccer, it is not uncommon for one team to dominate a match, creating many chances to score but failing to convert them, while the opposing team turns one of its few chances into a goal and wins the match. For this reason, traditional end-of-match statistics are often criticized: the number of shots, the ball possession percentage, and the number of shots inside the opponent's penalty area do not always accurately reflect the outcome of the match. Rapid technological advances in data collection, storage, and analysis have had a revolutionary impact on soccer analytics over the last decade. Thanks to these advances, soccer data is collected in two main forms: event data, which records ball-related events such as shots, passes, tackles, and dribbles together with where on the pitch they occurred, and tracking data, which records the positions of the players and the ball throughout play. This technological revolution has made it possible to propose a large number of key performance indicators that measure different aspects of the game, such as pass evaluation, quantification of controlled space, shot evaluation, and goal-scoring opportunities using possession values.
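The two data forms described above can be pictured as two record types. The following Python sketch is illustrative only: the type names, field names, toy values, and pitch coordinates are hypothetical assumptions, not the schema of any particular data provider or of the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical (x, y) pitch coordinates; units and origin are assumptions.
Point = Tuple[float, float]


@dataclass
class Event:
    """Event data: one ball-related action and where on the pitch it occurred."""
    event_type: str  # e.g. "shot", "pass", "tackle", "dribble"
    location: Point


@dataclass
class TrackingFrame:
    """Tracking data: positions of the ball and every player at one instant."""
    timestamp: float  # seconds since kick-off (assumed unit)
    ball: Point
    players: Dict[str, Point] = field(default_factory=dict)


# Toy values for illustration only.
events: List[Event] = [
    Event("pass", (30.0, 40.0)),
    Event("dribble", (60.0, 35.0)),
    Event("shot", (95.0, 34.0)),
]
frame = TrackingFrame(
    timestamp=12.0,
    ball=(95.2, 34.1),
    players={"A": (70.0, 40.0), "B": (95.0, 34.0)},
)

# Event data answers "what happened and where"; tracking data answers
# "where was everyone at this moment".
shots = [e for e in events if e.event_type == "shot"]
print(len(shots), len(frame.players))
```

Event records are sparse (one per action), while tracking frames are dense snapshots taken continuously during play, which is why the two forms support different kinds of indicators.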


The Effect of Balancing Methods on Model Behavior in Imbalanced Classification Problems

arXiv.org Artificial Intelligence

Imbalanced data poses a significant challenge in classification, as model performance suffers from insufficient learning from the minority classes. Balancing methods are often used to address this problem; however, such techniques can lead to problems such as overfitting or loss of information. This study addresses a more challenging aspect of balancing methods: their impact on model behavior. To capture these changes, Explainable Artificial Intelligence tools are used to compare models trained on datasets before and after balancing. In addition to the variable importance method, this study uses the partial dependence profile and accumulated local effects techniques. Real and simulated datasets are tested, and an open-source Python package, edgaro, is developed to facilitate this analysis. The results show significant changes in model behavior due to balancing methods, which can lead to models biased toward a balanced distribution. These findings confirm that the analysis of balancing methods should go beyond model performance comparisons to achieve more reliable machine learning models. Therefore, we propose a new method, the performance gain plot, for an informed data balancing strategy: it supports an optimal selection of the balancing method by analyzing the change in model behavior against the performance gain.
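The before/after comparison described above can be illustrated with a minimal, dependency-free sketch. It does not use edgaro or the models from the study; as assumptions, it simulates a small imbalanced dataset, balances it with plain random oversampling, fits a hand-rolled logistic regression before and after balancing, and compares the partial dependence profiles of one feature. All function names are hypothetical, and the summary value `shift` (mean absolute gap between the two profiles) is an illustrative measure of behavior change, not the measure used in the paper.

```python
import math
import random


def make_imbalanced_data(n=400, minority_rate=0.1, seed=0):
    """Simulate a two-feature binary task in which label 1 is rare.

    Rows with label 1 are shifted upward in both features.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < minority_rate else 0
        shift = 1.5 if label else 0.0
        X.append([rng.gauss(shift, 1.0), rng.gauss(shift, 1.0)])
        y.append(label)
    return X, y


def random_oversample(X, y, seed=0):
    """Duplicate random minority rows (label 1 assumed) until classes are equal."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    deficit = (len(y) - len(minority)) - len(minority)
    extra = [rng.choice(minority) for _ in range(deficit)]
    return X + [X[i] for i in extra], y + [y[i] for i in extra]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit logistic regression by batch gradient descent; return a predict function."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return lambda x: sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def partial_dependence(model, X, feature, grid):
    """PD(v): average prediction over all rows of X with `feature` fixed to v."""
    profile = []
    for v in grid:
        total = 0.0
        for row in X:
            modified = list(row)
            modified[feature] = v
            total += model(modified)
        profile.append(total / len(X))
    return profile


X, y = make_imbalanced_data()
X_bal, y_bal = random_oversample(X, y)
model_before = fit_logistic(X, y)
model_after = fit_logistic(X_bal, y_bal)

grid = [-2.0 + 0.5 * k for k in range(11)]  # feature-0 values from -2.0 to 3.0
# Both profiles are averaged over the same original rows, so any gap
# comes from the change in the model, not from the duplicated rows.
pd_before = partial_dependence(model_before, X, 0, grid)
pd_after = partial_dependence(model_after, X, 0, grid)
shift = sum(abs(p - q) for p, q in zip(pd_before, pd_after)) / len(grid)
print(f"mean absolute PD shift on feature 0: {shift:.3f}")
```

Evaluating both profiles on the same original rows isolates the change in the model itself. In practice, a library routine such as scikit-learn's `sklearn.inspection.partial_dependence` would replace the hand-written loop, and accumulated local effects would be computed in the same before/after fashion.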