Ley, Christophe
ELBOing Stein: Variational Bayes with Stein Mixture Inference
Rønning, Ola, Nalisnick, Eric, Ley, Christophe, Smyth, Padhraic, Hamelryck, Thomas
Stein variational gradient descent (SVGD) [Liu and Wang, 2016] performs approximate Bayesian inference by representing the posterior with a set of particles. However, SVGD suffers from variance collapse, i.e. poor predictions due to underestimating uncertainty [Ba et al., 2021], even for moderately-dimensional models such as small Bayesian neural networks (BNNs). To address this issue, we generalize SVGD by letting each particle parameterize a component distribution in a mixture model. Our method, Stein Mixture Inference (SMI), optimizes a lower bound to the evidence (ELBO) and introduces user-specified guides parameterized by particles. SMI extends the Nonlinear SVGD framework [Wang and Liu, 2019] to the case of variational Bayes. SMI effectively avoids variance collapse, judging by a previously described test developed for this purpose, and performs well on standard data sets. In addition, SMI requires considerably fewer particles than SVGD to accurately estimate uncertainty for small BNNs. The synergistic combination of NSVGD, ELBO optimization and user-specified guides establishes a promising approach towards variational Bayesian inference in the case of tall and wide data.
Modeling and Prediction of the UEFA EURO 2024 via Combined Statistical Learning Approaches
Groll, Andreas, Hvattum, Lars M., Ley, Christophe, Sternemann, Jonas, Schauberger, Gunther, Zeileis, Achim
In this work, three fundamentally different machine learning models are combined to create a new, joint model for forecasting the UEFA EURO 2024. Therefore, a generalized linear model, a random forest model, and a extreme gradient boosting model are used to predict the number of goals a team scores in a match. The three models are trained on the match results of the UEFA EUROs 2004-2020, with additional covariates characterizing the teams for each tournament as well as three enhanced variables derived from different ranking methods for football teams. The first enhanced variable is based on historic match data from national teams, the second is based on the bookmakers' tournament winning odds of all participating teams, and the third is based on historic match data of individual players both for club and international matches, resulting in player ratings. Then, based on current covariate information of the participating teams, the final trained model is used to predict the UEFA EURO 2024. For this purpose, the tournament is simulated 100.000 times, based on the estimated expected number of goals for all possible matches, from which probabilities across the different tournament stages are derived. Our combined model identifies France as the clear favourite with a winning probability of 19.2%, followed by England (16.7%) and host Germany (13.7%).
Prediction of Handball Matches with Statistically Enhanced Learning via Estimated Team Strengths
Felice, Florian, Ley, Christophe
We propose a Statistically Enhanced Learning (aka. SEL) model to predict handball games. Our Machine Learning model augmented with SEL features outperforms state-of-the-art models with an accuracy beyond 80%. In this work, we show how we construct the data set to train Machine Learning models on past female club matches. We then compare different models and evaluate them to assess their performance capabilities. Finally, explainability methods allow us to change the scope of our tool from a purely predictive solution to a highly insightful analytical tool. This can become a valuable asset for handball teams' coaches providing valuable statistical and predictive insights to prepare future competitions.
Hybrid Machine Learning Forecasts for the FIFA Women's World Cup 2019
Groll, Andreas, Ley, Christophe, Schauberger, Gunther, Van Eetvelde, Hans, Zeileis, Achim
In this work, we combine two different ranking methods together with several other predictors in a joint random forest approach for the scores of soccer matches. The first ranking method is based on the bookmaker consensus, the second ranking method estimates adequate ability parameters that reflect the current strength of the teams best. The proposed combined approach is then applied to the data from the two previous FIFA Women's World Cups 2011 and 2015. Finally, based on the resulting estimates, the FIFA Women's World Cup 2019 is simulated repeatedly and winning probabilities are obtained for all teams. The model clearly favors the defending champion USA before the host France.