Over the past two decades, Machine Learning (ML) techniques have been increasingly utilized for the purpose of predicting outcomes in sport. In this paper, we provide a review of studies that have used ML for predicting results in team sport, covering studies from 1996 to 2019. We sought to answer five key research questions while extensively surveying papers in this field. This paper offers insights into which ML algorithms have tended to be used in this field, as well as those that are beginning to emerge with successful outcomes. Our research highlights defining characteristics of successful studies and identifies robust strategies for evaluating accuracy results in this application domain. Our study considers accuracies that have been achieved across different sports and explores the notion that outcomes of some team sports could be inherently more difficult to predict than others. Finally, our study uncovers common themes of future research directions across all surveyed papers, looking for gaps and opportunities, while proposing recommendations for future researchers in this domain.
Recently, data mining studies are being successfully conducted to estimate several parameters in a variety of domains. Data mining techniques have attracted the attention of the information industry and society as a whole, due to a large amount of data and the imminent need to turn it into useful knowledge. However, the effective use of data in some areas is still under development, as is the case in sports, which in recent years, has presented a slight growth; consequently, many sports organizations have begun to see that there is a wealth of unexplored knowledge in the data extracted by them. Therefore, this article presents a systematic review of sports data mining. Regarding years 2010 to 2018, 31 types of research were found in this topic. Based on these studies, we present the current panorama, themes, the database used, proposals, algorithms, and research opportunities. Our findings provide a better understanding of the sports data mining potentials, besides motivating the scientific community to explore this timely and interesting topic.
In this study we focus on the prediction of basketball games in the Euroleague competition using machine learning modelling. The prediction is a binary classification problem, predicting whether a match finishes 1 (home win) or 2 (away win). Data is collected from the Euroleague's official website for the seasons 2016-2017, 2017-2018 and 2018-2019, i.e. in the new format era. Features are extracted from matches' data and off-the-shelf supervised machine learning techniques are applied. We calibrate and validate our models. We find that simple machine learning models give accuracy not greater than 67% on the test set, worse than some sophisticated benchmark models. Additionally, the importance of this study lies in the "wisdom of the basketball crowd" and we demonstrate how the predicting power of a collective group of basketball enthusiasts can outperform machine learning models discussed in this study. We argue why the accuracy level of this group of "experts" should be set as the benchmark for future studies in the prediction of (European) basketball games using machine learning.
In this paper, we present a new application-focused benchmark dataset and results from a set of baseline Natural Language Processing and Machine Learning models for prediction of match outcomes for games of football (soccer). By doing so we give a baseline for the prediction accuracy that can be achieved exploiting both statistical match data and contextual articles from human sports journalists. Our dataset is focuses on a representative time-period over 6 seasons of the English Premier League, and includes newspaper match previews from The Guardian. The models presented in this paper achieve an accuracy of 63.18% showing a 6.9% boost on the traditional statistical methods.
Understanding the principles of real-world biological multi-agent behaviors is a current challenge in various scientific and engineering fields. The rules regarding the real-world biological multi-agent behaviors such as team sports are often largely unknown due to their inherently higher-order interactions, cognition, and body dynamics. Estimation of the rules from data, i.e., data-driven approaches such as machine learning, provides an effective way for the analysis of such behaviors. Although most data-driven models have non-linear structures and high prediction performances, it is sometimes hard to interpret them. This survey focuses on data-driven analysis for quantitative understanding of invasion team sports behaviors such as basketball and football, and introduces two main approaches for understanding such multi-agent behaviors: (1) extracting easily interpretable features or rules from data and (2) generating and controlling behaviors in visually-understandable ways. The first approach involves the visualization of learned representations and the extraction of mathematical structures behind the behaviors. The second approach can be used to test hypotheses by simulating and controlling future and counterfactual behaviors. Lastly, the potential practical applications of extracted rules, features, and generated behaviors are discussed. These approaches can contribute to a better understanding of multi-agent behaviors in the real world.