Refining Minimax Regret for Unsupervised Environment Design
Beukman, Michael, Coward, Samuel, Matthews, Michael, Fellows, Mattie, Jiang, Minqi, Dennis, Michael, Foerster, Jakob
In unsupervised environment design, reinforcement learning agents are trained on environment configurations (levels) generated by an adversary that maximises some objective. Regret is a commonly used objective that theoretically results in a minimax regret (MMR) policy with desirable robustness guarantees; in particular, the agent's maximum regret is bounded. However, once the agent reaches this regret bound on all levels, the adversary will only sample levels where regret cannot be further reduced. Although there are possible performance improvements to be made outside of these regret-maximising levels, learning stagnates. In this work, we introduce Bayesian level-perfect MMR (BLP), a refinement of the minimax regret objective that overcomes this limitation. We formally show that solving for this objective results in a subset of MMR policies, and that BLP policies act consistently with a Perfect Bayesian policy over all levels. We further introduce an algorithm, ReMiDi, that results in a BLP policy at convergence. We empirically demonstrate that training on levels from a minimax regret adversary causes learning to prematurely stagnate, but that ReMiDi continues learning.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Europe > Austria > Vienna (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (6 more...)
- Research Report (0.50)
- Instructional Material (0.46)
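The regret-maximising adversary described in the abstract above can be sketched in a few lines. The function names are illustrative, and the level-optimal returns are assumed to be known here (in practice, UED methods must estimate them); this is the generic regret-maximising adversary of UED, not the ReMiDi algorithm itself:

```python
def regret(optimal_return, agent_return):
    """Regret of the agent on one level: gap to the level-optimal policy."""
    return optimal_return - agent_return

def adversary_pick(levels, optimal_returns, agent_returns):
    """A regret-maximising adversary proposes the level where the
    agent's regret is currently largest."""
    regrets = [regret(o, a) for o, a in zip(optimal_returns, agent_returns)]
    best = max(range(len(regrets)), key=lambda i: regrets[i])
    return levels[best], regrets[best]

# Toy numbers: once the agent reaches the same regret bound on several
# levels, the adversary keeps sampling among them, and performance that
# could still improve on other levels (here maze_c) stagnates.
levels = ["maze_a", "maze_b", "maze_c"]
opt = [1.0, 1.0, 0.5]
agent = [0.7, 0.7, 0.5]
picked, r = adversary_pick(levels, opt, agent)
```

This makes the paper's failure mode concrete: maze_c has zero regret and is never proposed, even though nothing prevents the agent from improving elsewhere.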
Generalization through Diversity: Improving Unsupervised Environment Design
Li, Wenjun, Varakantham, Pradeep, Li, Dexun
Agent decision making using Reinforcement Learning (RL) heavily relies on either a model or simulator of the environment (e.g., moving in an 8x8 maze with three rooms, playing Chess on an 8x8 board). Due to this dependence, small changes in the environment (e.g., positions of obstacles in the maze, size of the board) can severely affect the effectiveness of the policy learned by the agent. To that end, existing work has proposed training RL agents on an adaptive curriculum of environments (generated automatically) to improve performance on out-of-distribution (OOD) test scenarios. Specifically, existing research has employed the potential for the agent to learn in an environment (captured using Generalized Advantage Estimation, GAE) as the key factor to select the next environment(s) to train the agent. However, such a mechanism can select similar environments (with a high potential to learn), thereby making agent training redundant on all but one of those environments. To address this, we provide a principled approach to adaptively identify diverse environments based on a novel distance measure relevant to environment design. We empirically demonstrate the versatility and effectiveness of our method in comparison to multiple leading approaches for unsupervised environment design on three distinct benchmark problems used in the literature.
- Education (0.96)
- Leisure & Entertainment > Games (0.66)
- Leisure & Entertainment > Sports > Motorsports (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
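The GAE score mentioned in the abstract above, which prior work uses as the "potential to learn" signal for selecting environments, is a standard computation. A minimal sketch of generic GAE (not this paper's selection mechanism or its diversity measure):

```python
def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory.
    `values` has one extra entry: the bootstrap value of the final state."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error at time t.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of TD errors (the GAE recursion).
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages

# Environments where the advantage magnitudes are large are the ones
# scored as having high "potential to learn".
adv = gae([1.0, 0.0, 1.0], [0.5, 0.5, 0.5, 0.0])
```

The paper's point is that ranking environments by this score alone can surface near-duplicates, which its diversity-aware distance measure is designed to avoid.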
CLUTR: Curriculum Learning via Unsupervised Task Representation Learning
Azad, Abdus Salam, Gur, Izzeddin, Emhoff, Jasper, Alexis, Nathaniel, Faust, Aleksandra, Abbeel, Pieter, Stoica, Ion
Reinforcement Learning (RL) algorithms are often known for sample inefficiency and difficult generalization. Recently, Unsupervised Environment Design (UED) emerged as a new paradigm for zero-shot generalization by simultaneously learning a task distribution and agent policies on the generated tasks. This is a non-stationary process in which the task distribution evolves along with agent policies, creating instability over time. While past works demonstrated the potential of such approaches, sampling effectively from the task space remains an open challenge, bottlenecking these approaches. To this end, we introduce CLUTR: a novel unsupervised curriculum learning algorithm.
- Education (1.00)
- Information Technology (0.93)
- Leisure & Entertainment > Sports (0.46)
Uncertain Time Series Classification With Shapelet Transform
Mbouopda, Michael Franklin, Nguifo, Engelbert Mephu
Time series classification is a task that aims at classifying chronological data. It is used in a diverse range of domains such as meteorology, medicine and physics. In the last decade, many algorithms have been built to perform this task with very appreciable accuracy. However, applications where time series carry uncertainty have been under-explored. Using uncertainty propagation techniques, we propose a new uncertain dissimilarity measure based on Euclidean distance. We then propose the uncertain shapelet transform algorithm for the classification of uncertain time series. The extensive experiments we conducted on state-of-the-art datasets show the effectiveness of our contribution. The source code of our contribution and the datasets we used are available in a public repository.
- North America > United States > New York > New York County > New York City (0.14)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Europe > France > Auvergne-Rhône-Alpes > Puy-de-Dôme > Clermont-Ferrand (0.04)
- Asia (0.04)
- Research Report (0.64)
- Workflow (0.46)
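A minimal sketch of how first-order uncertainty propagation can extend the Euclidean distance to values that carry uncertainties, as the abstract above describes. This follows the generic propagation rules for products and square roots; it is not necessarily the paper's exact dissimilarity measure, and the function name is illustrative:

```python
import math

def uncertain_euclidean(x, dx, y, dy):
    """Euclidean distance between two series whose values carry
    uncertainties dx, dy, propagated to first order.
    Returns (distance, distance_uncertainty)."""
    # Sum of squared differences.
    s = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    # First-order propagation through each (xi - yi)**2 term:
    # delta[(xi - yi)**2] = 2 * |xi - yi| * (dxi + dyi).
    ds = sum(2 * abs(xi - yi) * (dxi + dyi)
             for xi, dxi, yi, dyi in zip(x, dx, y, dy))
    d = math.sqrt(s)
    # Propagation through the square root: delta[sqrt(s)] = ds / (2 * sqrt(s)).
    dd = ds / (2 * d) if d > 0 else math.sqrt(ds)
    return d, dd

d, dd = uncertain_euclidean([1.0, 2.0], [0.1, 0.1], [2.0, 4.0], [0.1, 0.1])
```

The returned pair carries the uncertainty through to the dissimilarity itself, which is what allows a shapelet-transform classifier to take measurement uncertainty into account rather than discarding it.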