We extend the Bayesian skill rating system TrueSkill to infer entire time series of skills of players by smoothing through time instead of filtering. The skill of each participating player, say, every year is represented by a latent skill variable which is affected by the relevant game outcomes that year, and coupled with the skill variables of the previous and subsequent year. Inference in the resulting factor graph is carried out by approximate message passing (EP) along the time series of skills. As before the system tracks the uncertainty about player skills, explicitly models draws, can deal with any number of competing entities and can infer individual skills from team results. We extend the system to estimate player-specific draw margins. Basedon these models we present an analysis of the skill curves of important players in the history of chess over the past 150 years. Results include plots of players' lifetime skill development as well as the ability to compare the skills of different players across time. Our results indicate that a) the overall playing strength has increased over the past 150 years, and b) that modelling a player's ability to force a draw provides significantly better predictive power.
Fitzgerald, Tadhg (Insight Centre for Data Analytics and University College Cork) | Malitsky, Yuri (IBM TJ Watson Research Center) | O'Sullivan, Barry (Insight Centre for Data Analytics and University College Cork)
It is now readily accepted that automated algorithm configuration is a necessity for ensuring optimized performance of solvers on a particular problem domain. Even the best developers who have carefully designed their solver are not always able to manually find the best parameter settings for it. Yet, the opportunity for improving performance has been repeatedly demonstrated by configuration tools like ParamILS, SMAC, and GGA. However, all these techniques currently assume a static environment, where demonstrative instances are procured beforehand, potentially unlimited time is provided to adequately search the parameter space, and the solver would never need to be retrained. This is not always the case in practice. The ReACT system, proposed in 2014, demonstrated that a solver could be configured during runtime as new instances arrive in a steady stream. This paper further develops that approach and shows how a ranking scheme, like TrueSkill, can further improve the configurator's performance, making it able to quickly find good parameterizations without adding any overhead on the time needed to solve any new instance, and then continuously improve as new instances are evaluated. The enhancements to ReACT that we present enable us to even outperform existing static configurators like SMAC in a non-dynamic setting.
Throughout the history of games, representing the abilities of the various agents acting on behalf of the players has been a central concern. With increasingly sophisticated games emerging, these simulations have become more realistic, but the underlying mechanisms are still, to a large extent, of an ad hoc nature. This paper proposes using a logistic model from psychometrics as a unified mechanism for task resolution in simulation-oriented games.
The recent advances in computer-assisted learning systems and the availability of open educational resources today promise a pathway to providing cost-efficient, high-quality education to large masses of learners. One of the most ambitious use cases of computer-assisted learning is to build a lifelong learning recommendation system. Unlike short-term courses, lifelong learning presents unique challenges, requiring sophisticated recommendation models that account for a wide range of factors such as background knowledge of learners or novelty of the material while effectively maintaining knowledge states of masses of learners for significantly longer periods of time (ideally, a lifetime). This work presents the foundations towards building a dynamic, scalable and transparent recommendation system for education, modelling learner's knowledge from implicit data in the form of engagement with open educational resources. We i) use a text ontology based on Wikipedia to automatically extract knowledge components of educational resources and, ii) propose a set of online Bayesian strategies inspired by the well-known areas of item response theory and knowledge tracing. Our proposal, TrueLearn, focuses on recommendations for which the learner has enough background knowledge (so they are able to understand and learn from the material), and the material has enough novelty that would help the learner improve their knowledge about the subject and keep them engaged. We further construct a large open educational video lectures dataset and test the performance of the proposed algorithms, which show clear promise towards building an effective educational recommendation system.
OpenAI, null, :, null, Berner, Christopher, Brockman, Greg, Chan, Brooke, Cheung, Vicki, Dębiak, Przemysław, Dennison, Christy, Farhi, David, Fischer, Quirin, Hashme, Shariq, Hesse, Chris, Józefowicz, Rafal, Gray, Scott, Olsson, Catherine, Pachocki, Jakub, Petrov, Michael, Pinto, Henrique Pondé de Oliveira, Raiman, Jonathan, Salimans, Tim, Schlatter, Jeremy, Schneider, Jonas, Sidor, Szymon, Sutskever, Ilya, Tang, Jie, Wolski, Filip, Zhang, Susan
The long-term goal of artificial intelligence is to solve advanced real-world challenges. Games have served as stepping stones along this path for decades, from Backgammon (1992) to Chess (1997) to Atari (2013)[1-3]. In 2016, AlphaGo defeated the world champion at Go using deep reinforcement learning and Monte Carlo tree search. In recent years, reinforcement learning (RL) models have tackled tasks as varied as robotic manipulation, text summarization , and video games such as Starcraft and Minecraft. Relative to previous AI milestones like Chess or Go, complex video games start to capture the complexity and continuous nature of the real world. Dota 2 is a multiplayer real-time strategy game produced by Valve Corporation in 2013, which averaged between 500,000 and 1,000,000 concurrent players between 2013 and 2019. The game is actively played by full time professionals; the prize pool for the 2019 international championship exceeded $35 million (the largest of any esports game in the world)[9, 10]. The game presents challenges for reinforcement learning due to long time horizons, partial observability, and high dimensionality of observation and action spaces.