Goto

Collaborating Authors

 Schauberger, Gunther


Modeling and Prediction of the UEFA EURO 2024 via Combined Statistical Learning Approaches

arXiv.org Artificial Intelligence

In this work, three fundamentally different machine learning models are combined to create a new, joint model for forecasting the UEFA EURO 2024. Therefore, a generalized linear model, a random forest model, and a extreme gradient boosting model are used to predict the number of goals a team scores in a match. The three models are trained on the match results of the UEFA EUROs 2004-2020, with additional covariates characterizing the teams for each tournament as well as three enhanced variables derived from different ranking methods for football teams. The first enhanced variable is based on historic match data from national teams, the second is based on the bookmakers' tournament winning odds of all participating teams, and the third is based on historic match data of individual players both for club and international matches, resulting in player ratings. Then, based on current covariate information of the participating teams, the final trained model is used to predict the UEFA EURO 2024. For this purpose, the tournament is simulated 100.000 times, based on the estimated expected number of goals for all possible matches, from which probabilities across the different tournament stages are derived. Our combined model identifies France as the clear favourite with a winning probability of 19.2%, followed by England (16.7%) and host Germany (13.7%).


Deducing neighborhoods of classes from a fitted model

arXiv.org Machine Learning

In todays world the request for very complex models for huge data sets is rising steadily. The problem with these models is that by raising the complexity of the models, it gets much harder to interpret them. The growing field of \emph{interpretable machine learning} tries to make up for the lack of interpretability in these complex (or even blackbox-)models by using specific techniques that can help to understand those models better. In this article a new kind of interpretable machine learning method is presented, which can help to understand the partitioning of the feature space into predicted classes in a classification model using quantile shifts. To illustrate in which situations this quantile shift method (QSM) could become beneficial, it is applied to a theoretical medical example and a real data example. Basically, real data points (or specific points of interest) are used and the changes of the prediction after slightly raising or decreasing specific features are observed. By comparing the predictions before and after the manipulations, under certain conditions the observed changes in the predictions can be interpreted as neighborhoods of the classes with regard to the manipulated features. Chordgraphs are used to visualize the observed changes.


Hybrid Machine Learning Forecasts for the FIFA Women's World Cup 2019

arXiv.org Machine Learning

In this work, we combine two different ranking methods together with several other predictors in a joint random forest approach for the scores of soccer matches. The first ranking method is based on the bookmaker consensus, the second ranking method estimates adequate ability parameters that reflect the current strength of the teams best. The proposed combined approach is then applied to the data from the two previous FIFA Women's World Cups 2011 and 2015. Finally, based on the resulting estimates, the FIFA Women's World Cup 2019 is simulated repeatedly and winning probabilities are obtained for all teams. The model clearly favors the defending champion USA before the host France.