plr
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- North America > Puerto Rico > San Juan > San Juan (0.04)
- North America > Canada > British Columbia > Vancouver (0.04)
- Education (0.94)
- Leisure & Entertainment > Games (0.93)
- Leisure & Entertainment > Sports > Motorsports > Formula One (0.46)
Replay-Guided Adversarial Environment Design
Deep reinforcement learning (RL) agents may successfully generalize to new settings if trained on an appropriately diverse set of environment and task configurations. Unsupervised Environment Design (UED) is a promising self-supervised RL paradigm, wherein the free parameters of an underspecified environment are automatically adapted during training to the agent's capabilities, leading to the emergence of diverse training environments. Here, we cast Prioritized Level Replay (PLR), an empirically successful but theoretically unmotivated method that selectively samples randomly-generated training levels, as UED. We argue that by curating completely random levels, PLR, too, can generate novel and complex levels for effective training. This insight reveals a natural class of UED methods we call Dual Curriculum Design (DCD). Crucially, DCD includes both PLR and a popular UED algorithm, PAIRED, as special cases and inherits similar theoretical guarantees. This connection allows us to develop novel theory for PLR, providing a version with a robustness guarantee at Nash equilibria. Furthermore, our theory suggests a highly counterintuitive improvement to PLR: by stopping the agent from updating its policy on uncurated levels (training on less data), we can improve the convergence to Nash equilibria. Indeed, our experiments confirm that our new method, PLR$^{\perp}$, obtains better results on a suite of out-of-distribution, zero-shot transfer tasks, in addition to demonstrating that PLR$^{\perp}$ improves the performance of PAIRED, from which it inherited its theoretical framework.
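A minimal sketch of how PLR picks which stored level to replay, and of the PLR$^{\perp}$ rule of updating the policy only on curated levels. It assumes the rank-based score prioritization mixed with a staleness term, as PLR is commonly described. The function names and the defaults `beta=0.1` and `rho=0.1` are illustrative assumptions, not the authors' code. Per-level scores, such as a regret estimate, are taken as given.

```python
import numpy as np


def rank_prioritization(scores, beta):
    """P_S: weight each level by (1 / rank)^(1 / beta); rank 1 = highest score."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores)              # level indices, best score first
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    h = (1.0 / ranks) ** (1.0 / beta)
    return h / h.sum()


def staleness_distribution(last_sampled, current_episode):
    """P_C: favour levels that have not been replayed for a long time."""
    staleness = current_episode - np.asarray(last_sampled, dtype=float)
    total = staleness.sum()
    if total == 0:                           # all equally fresh: fall back to uniform
        return np.full(len(staleness), 1.0 / len(staleness))
    return staleness / total


def replay_distribution(scores, last_sampled, current_episode, beta=0.1, rho=0.1):
    """Mix score- and staleness-based priorities: (1 - rho) * P_S + rho * P_C."""
    p_s = rank_prioritization(scores, beta)
    p_c = staleness_distribution(last_sampled, current_episode)
    return (1.0 - rho) * p_s + rho * p_c


def plr_perp_decision(rng, replay_prob, num_seen_levels):
    """Return (source, update_policy) for the next episode under PLR^perp.

    Replayed (curated) levels receive a policy update. Freshly generated
    random levels are only rolled out so they can be scored for the buffer.
    """
    if num_seen_levels > 0 and rng.random() < replay_prob:
        return "replay", True
    return "new", False
```

For example, `rng.choice(len(p), p=p)` with `p = replay_distribution(...)` draws the index of the next level to replay. Keeping the staleness term ensures that low-scoring levels are still revisited occasionally, so their scores do not go stale.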
TRACED: Transition-aware Regret Approximation with Co-learnability for Environment Design
Cho, Geonwoo, Im, Jaegyun, Lee, Jihwan, Yi, Hojun, Kim, Sejin, Kim, Sundong
Generalizing deep reinforcement learning agents to unseen environments remains a significant challenge. One promising solution is Unsupervised Environment Design (UED), a co-evolutionary framework in which a teacher adaptively generates tasks with high learning potential, while a student learns a robust policy from this evolving curriculum. Existing UED methods typically measure learning potential via regret, the gap between optimal and current performance, approximated solely by value-function loss. Building on these approaches, we introduce the transition-prediction error as an additional term in our regret approximation. To capture how training on one task affects performance on others, we further propose a lightweight metric called Co-Learnability. By combining these two measures, we present Transition-aware Regret Approximation with Co-learnability for Environment Design (TRACED). Empirical evaluations show that TRACED produces curricula that improve zero-shot generalization over strong baselines across multiple benchmarks. Ablation studies confirm that the transition-prediction error drives rapid complexity ramp-up and that Co-Learnability delivers additional gains when paired with the transition-prediction error. These results demonstrate how refined regret approximation and explicit modeling of task relationships can be leveraged for sample-efficient curriculum design in UED. Project Page: https://geonwoo.me/traced/
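A minimal sketch of a per-level score that adds a transition-prediction error to a value-loss-based regret proxy. The abstract does not give TRACED's exact formula, so several assumptions are made here. The value-loss term is taken to be the positive value loss (the mean of GAE clipped at zero), which is widely used in UED. The transition error is taken to be the mean squared error of a learned next-state predictor. The weight `alpha` is hypothetical, and Co-Learnability is not modelled.

```python
import numpy as np


def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimates; `values` has length T + 1 (bootstrap at the end)."""
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    deltas = rewards + gamma * values[1:] - values[:-1]   # one-step TD errors
    adv = np.zeros_like(deltas)
    running = 0.0
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * running
        adv[t] = running
    return adv


def positive_value_loss(rewards, values, gamma=0.99, lam=0.95):
    """Value-loss regret proxy: average of the positive part of the GAE."""
    return float(np.mean(np.maximum(gae(rewards, values, gamma, lam), 0.0)))


def transition_prediction_error(predicted_next, actual_next):
    """Mean (over time steps) squared error of predicted next-state features."""
    diff = np.asarray(predicted_next, dtype=float) - np.asarray(actual_next, dtype=float)
    return float(np.mean(np.sum(diff ** 2, axis=-1)))


def level_score(rewards, values, predicted_next, actual_next, alpha=1.0):
    """Hypothetical combined score: value-loss proxy + alpha * transition error."""
    return (positive_value_loss(rewards, values)
            + alpha * transition_prediction_error(predicted_next, actual_next))
```

Under this sketch, a level scores highly either when the value function underestimates the return or when the agent's model of the dynamics is poor. The second condition can flag novel levels even when their value error is small.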
22fb0cee7e1f3bde58293de743871417-Reviews.html
The authors consider associative learning in networks of spiking neurons and argue that a form of STDP with postsynaptic hyperpolarization is equivalent to the perceptron learning algorithm. The basic form of STDP proposed by the authors relies on traces (similar to Morrison, Diesmann & Gerstner, "Phenomenological models of synaptic plasticity based on spike timing", Biol Cybern, 2008, 98, 459-478, which should have been cited here) and allows for both potentiation and depression of the synapse. The authors then introduce the perceptron learning rule (PLR) for binary variables, in a form where the weighted sum of inputs is compared to a threshold to determine the update. As is well known, the PLR is a supervised learning algorithm that requires a target to be specified at the postsynaptic site.
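For reference, here is a minimal sketch of the textbook perceptron learning rule for binary variables, in the threshold form the review describes. The output is 1 when the weighted input sum reaches the threshold. Weights and threshold change only on errors, which is why a target must be supplied. This is not the authors' STDP implementation, and `lr` and `epochs` are illustrative parameters.

```python
import numpy as np


def predict(w, theta, x):
    """Binary output: 1 if the weighted input sum reaches the threshold."""
    return int(np.dot(w, x) >= theta)


def perceptron_train(X, targets, theta=0.0, lr=1.0, epochs=20):
    """Classic perceptron rule: update weights and threshold only on mistakes."""
    X = np.asarray(X, dtype=float)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        errors = 0
        for x, t in zip(X, targets):
            y = predict(w, theta, x)
            if y != t:
                # (t - y) is +1 (output too low) or -1 (output too high)
                w += lr * (t - y) * x
                theta -= lr * (t - y)       # threshold moves opposite to the weights
                errors += 1
        if errors == 0:                     # every pattern classified correctly
            break
    return w, theta
```

On a linearly separable problem such as logical AND, the rule stops making errors after a finite number of updates, as the perceptron convergence theorem guarantees.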
Task Scheduling & Forgetting in Multi-Task Reinforcement Learning
Speckmann, Marc, Eimer, Theresa
Reinforcement learning (RL) agents can forget tasks they have previously been trained on. There is a rich body of work on such forgetting effects in humans. We therefore look for commonalities in the forgetting behavior of humans and RL agents across tasks and test whether forgetting-prevention measures from learning theory are viable in RL. We find that in many cases, RL agents exhibit forgetting curves similar to those of humans. Methods such as Leitner or SuperMemo have been shown to counteract human forgetting effectively, but we demonstrate that they do not transfer as well to RL. We identify a likely cause: asymmetric learning and retention patterns between tasks that cannot be captured by retention-based or performance-based curriculum strategies.
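To make the Leitner method concrete, here is a minimal sketch of a Leitner-style task scheduler. It assumes one common variant of the system: a task in box k is revisited every 2^(k-1) sessions, is promoted one box on success, and is sent back to box 1 on failure. It also assumes a hypothetical success criterion (normalized return of at least `threshold`). It does not reproduce the authors' experimental setup.

```python
class LeitnerScheduler:
    """Hypothetical Leitner-box scheduler over RL tasks."""

    def __init__(self, tasks, num_boxes=5, threshold=0.8):
        self.num_boxes = num_boxes
        self.threshold = threshold               # assumed "remembered" cutoff on normalized return
        self.boxes = {task: 1 for task in tasks}  # every task starts in box 1

    def due(self, session):
        """Tasks to train on this session: box k is due every 2**(k - 1) sessions."""
        return [t for t, b in self.boxes.items() if session % (2 ** (b - 1)) == 0]

    def update(self, task, performance):
        """Promote a task that is still solved; demote a forgotten one to box 1."""
        if performance >= self.threshold:
            self.boxes[task] = min(self.boxes[task] + 1, self.num_boxes)
        else:
            self.boxes[task] = 1
```

Because a task's schedule depends only on its own recent performance, this kind of scheduler cannot represent one task's training helping or hurting another. This is consistent with the asymmetric cross-task effects the abstract identifies as a likely cause of poor transfer.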
Where Do Large Learning Rates Lead Us?
Sadrtdinov, Ildus, Kodryan, Maxim, Pokonechny, Eduard, Lobacheva, Ekaterina, Vetrov, Dmitry
It is generally accepted that starting neural network training with large learning rates (LRs) improves generalization. Following a line of research devoted to understanding this effect, we conduct an empirical study in a controlled setting focusing on two questions: 1) how large an initial LR is required to obtain optimal quality, and 2) what are the key differences between models trained with different LRs? We discover that only a narrow range of initial LRs slightly above the convergence threshold leads to optimal results after fine-tuning with a small LR or weight averaging. By studying the local geometry of the reached minima, we observe that using LRs from this optimal range allows the optimization to locate a basin that contains only high-quality minima. Additionally, we show that these initial LRs result in a sparse set of learned features, with a clear focus on those most relevant to the task. In contrast, starting training with LRs that are too small leads to unstable minima and attempts to learn all features simultaneously, resulting in poor generalization. Conversely, training with initial LRs that are too large fails to locate a basin with good solutions and to extract meaningful patterns from the data.
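The abstract does not define the convergence threshold for its controlled setting. As a loose illustration of the concept only, consider the simplest case: gradient descent on a one-dimensional quadratic f(w) = 0.5 * c * w^2. Each step multiplies w by (1 - lr * c), so training converges exactly when 0 < lr < 2 / c. The sketch below, whose function names are hypothetical, shows the two regimes on either side of that threshold.

```python
def gd_on_quadratic(curvature, lr, w0=1.0, steps=100):
    """Run gradient descent on f(w) = 0.5 * curvature * w**2 and return the final w."""
    w = w0
    for _ in range(steps):
        w -= lr * curvature * w        # gradient of f is curvature * w
    return w


def convergence_threshold(curvature):
    """Largest stable learning rate: |1 - lr * curvature| < 1  <=>  lr < 2 / curvature."""
    return 2.0 / curvature
```

With `curvature=4.0` the threshold is 0.5. A learning rate of 0.4 shrinks w by a factor of 0.6 per step, while 0.6 grows it by a factor of 1.4 per step. In deep networks the relevant curvature changes during training, so this toy example does not model the paper's setting. It only shows what "above the convergence threshold" means in the simplest case.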