AITopics | paired

Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design

Neural Information Processing Systems | Feb-9-2026, 11:27:15 GMT

A wide range of reinforcement learning (RL) problems, including robustness, transfer learning, unsupervised RL, and emergent complexity, require specifying a distribution of tasks or environments in which a policy will be trained.
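PAIRED's answer to that specification problem is to let a teacher choose levels according to a utility. The sketch below contrasts three teacher objectives discussed in the PAIRED paper: domain randomization (a teacher with constant utility), a minimax adversary (rewarded when the protagonist fails), and PAIRED's regret estimate (the antagonist's best return minus the protagonist's mean return on the same level). It is a minimal, non-authoritative illustration: the function names are hypothetical and the returns are plain numbers rather than outputs of trained agents.

```python
from statistics import mean
from typing import Sequence


def domain_randomization_utility(antagonist: Sequence[float], protagonist: Sequence[float]) -> float:
    # Domain randomization: the teacher's utility is constant, so it has no
    # preference over levels and effectively samples them at random.
    return 0.0


def minimax_utility(antagonist: Sequence[float], protagonist: Sequence[float]) -> float:
    # Minimax adversary: the teacher is rewarded when the protagonist does badly,
    # which makes unsolvable levels an optimal choice for it.
    return -mean(protagonist)


def paired_regret(antagonist: Sequence[float], protagonist: Sequence[float]) -> float:
    # PAIRED-style regret estimate: the antagonist's best episode return minus the
    # protagonist's average return on the same level. Unsolvable levels score 0,
    # because the antagonist cannot outperform the protagonist there.
    return max(antagonist) - mean(protagonist)


if __name__ == "__main__":
    levels = {
        "hard but solvable": ([0.9, 0.7], [0.2, 0.4]),
        "unsolvable": ([0, 0], [0, 0]),
    }
    for name, (ant, prot) in levels.items():
        print(f"{name}: minimax={minimax_utility(ant, prot):.2f}, regret={paired_regret(ant, prot):.2f}")
```

On these toy numbers the minimax teacher prefers the unsolvable level (utility 0 beats -0.3), while the regret teacher prefers the solvable one (about 0.6 versus 0), which is the behavior PAIRED's regret objective is meant to produce.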

artificial intelligence, arxiv preprint arxiv, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

0e915db6326b6fb6a3c56546980a8c93-Supplemental.pdf

Neural Information Processing Systems | Feb-7-2026, 12:16:50 GMT

Let $B$ be the maximum difference between $U_t^1$ and $U_t^2$, and let $(\pi, \theta^1, \theta^2)$ be a Nash equilibrium for $G$. Let $\pi^1$ be the best response to the first teacher (with utility $U_t^1$) and let $\pi^{1+2}$ be the best-response policy to the joint teacher. This result shows that as we reduce the number of random episodes, the approximation to a minimax regret strategy improves. Let $G$ be the dual curriculum game in which the first teacher maximizes regret, so $U_t^1 = U_t^R$, and the second teacher plays randomly, so $U_t^2 = U_t^U$. Finally, we need to show that $\pi^{2+3}$ is optimal for the student.
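To make the two teachers in this excerpt concrete, their utilities and the student's objective against the joint teacher can be written roughly as follows. This is a sketch in assumed notation, not the supplement's own: $V^{\theta}(\pi)$ is the student's expected return on level $\theta$, $C$ is a constant, $\Lambda^1$ and $\Lambda^2$ are the level distributions chosen by the two teachers, and $p$ is the probability that an episode's level is drawn from the first (regret) teacher.

$$
\begin{aligned}
U_t^R(\pi, \theta) &= \max_{\pi'} V^{\theta}(\pi') - V^{\theta}(\pi) && \text{(regret teacher)}\\
U_t^U(\pi, \theta) &= C && \text{(random teacher: constant utility)}\\
U_s^{1+2}(\pi) &= p\,\mathbb{E}_{\theta \sim \Lambda^1}\big[V^{\theta}(\pi)\big] + (1-p)\,\mathbb{E}_{\theta \sim \Lambda^2}\big[V^{\theta}(\pi)\big] && \text{(student vs. joint teacher)}
\end{aligned}
$$

Under this reading, shrinking $1-p$ (running fewer random episodes) reduces the weight of the random teacher's levels in the student's objective, which is the direction in which the excerpt says the approximation to a minimax regret strategy improves.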

architecture, artificial intelligence, budget, (18 more...)

Neural Information Processing Systems

Country:

Europe > Italy (0.05)
Asia > Singapore (0.05)
South America > Brazil (0.05)
(17 more...)

Genre: Research Report > New Finding (0.48)

Industry: Leisure & Entertainment > Sports > Motorsports > Formula One (0.46)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.47)

Replay-Guided Adversarial Environment Design

Neural Information Processing Systems | Dec-23-2025, 18:34:30 GMT

Deep reinforcement learning (RL) agents may successfully generalize to new settings if trained on an appropriately diverse set of environment and task configurations. Unsupervised Environment Design (UED) is a promising self-supervised RL paradigm, wherein the free parameters of an underspecified environment are automatically adapted during training to the agent's capabilities, leading to the emergence of diverse training environments. Here, we cast Prioritized Level Replay (PLR), an empirically successful but theoretically unmotivated method that selectively samples randomly-generated training levels, as UED. We argue that by curating completely random levels, PLR, too, can generate novel and complex levels for effective training. This insight reveals a natural class of UED methods we call Dual Curriculum Design (DCD). Crucially, DCD includes both PLR and a popular UED algorithm, PAIRED, as special cases and inherits similar theoretical guarantees. This connection allows us to develop novel theory for PLR, providing a version with a robustness guarantee at Nash equilibria. Furthermore, our theory suggests a highly counterintuitive improvement to PLR: by stopping the agent from updating its policy on uncurated levels (training on less data), we can improve the convergence to Nash equilibria. Indeed, our experiments confirm that our new method, PLR$^{\perp}$, obtains better results on a suite of out-of-distribution, zero-shot transfer tasks, in addition to demonstrating that PLR$^{\perp}$ improves the performance of PAIRED, from which it inherited its theoretical framework.
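A minimal sketch of the PLR$^{\perp}$ loop described above, assuming the rank-based score prioritization and staleness mixing of the original PLR work. The one change that makes it PLR$^{\perp}$ is that the policy is updated only on replayed (curated) levels, while freshly generated random levels are merely rolled out and scored. The class and function names, the default temperature $\beta$ and staleness weight $\rho$, and the `evaluate`/`train` callbacks (which would roll out the agent and return a learning-potential score such as positive value loss) are all hypothetical.

```python
import random
from collections.abc import Hashable
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LevelBuffer:
    capacity: int = 4
    temperature: float = 0.3     # beta: lower values concentrate replay on top-ranked levels (illustrative)
    staleness_coef: float = 0.3  # rho: weight on the staleness term (illustrative)
    scores: Dict[Hashable, float] = field(default_factory=dict)
    last_seen: Dict[Hashable, int] = field(default_factory=dict)
    episodes: int = 0

    def replay_distribution(self) -> Dict[Hashable, float]:
        levels = list(self.scores)
        # Rank-based prioritization: h = (1 / rank) ** (1 / beta), rank 1 = highest score.
        ranked = sorted(levels, key=lambda lvl: self.scores[lvl], reverse=True)
        h = {lvl: (1.0 / (rank + 1)) ** (1.0 / self.temperature) for rank, lvl in enumerate(ranked)}
        total_h = sum(h.values())
        # Staleness: episodes elapsed since each level was last sampled.
        stale = {lvl: self.episodes - self.last_seen[lvl] for lvl in levels}
        total_stale = sum(stale.values())
        dist = {}
        for lvl in levels:
            p_score = h[lvl] / total_h
            p_stale = stale[lvl] / total_stale if total_stale > 0 else 1.0 / len(levels)
            dist[lvl] = (1 - self.staleness_coef) * p_score + self.staleness_coef * p_stale
        return dist

    def sample_replay(self, rng: random.Random) -> Hashable:
        dist = self.replay_distribution()
        levels = list(dist)
        return rng.choices(levels, weights=[dist[lvl] for lvl in levels], k=1)[0]

    def update(self, level: Hashable, score: float) -> None:
        self.episodes += 1
        self.last_seen[level] = self.episodes
        self.scores[level] = score
        if len(self.scores) > self.capacity:
            # Evict the lowest-scoring level to make room.
            worst = min(self.scores, key=self.scores.__getitem__)
            del self.scores[worst]
            del self.last_seen[worst]


def plr_perp_step(buffer, rng, replay_prob, new_level, evaluate, train):
    """One PLR-perp iteration. Returns (level, whether the policy was updated)."""
    if buffer.scores and rng.random() < replay_prob:
        level = buffer.sample_replay(rng)
        score = train(level)     # gradient update happens only on curated (replayed) levels
        updated = True
    else:
        level = new_level(rng)
        score = evaluate(level)  # new random level: roll out and score, no policy update
        updated = False
    buffer.update(level, score)
    return level, updated
```

Dropping the `updated = False` branch's restriction (training on every level) would recover plain PLR, which is why the abstract can describe the improvement as "training on less data".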

name change, plr, replay-guided adversarial environment design, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.96)

985e9a46e10005356bbaf194249f6856-Supplemental.pdf

Neural Information Processing Systems | Aug-22-2025, 00:29:08 GMT

adversary, agent, antagonist, (15 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

985e9a46e10005356bbaf194249f6856-Paper.pdf

Neural Information Processing Systems | Aug-15-2025, 06:50:06 GMT

adversary, agent, arxiv preprint arxiv, (11 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
North America > United States > New Jersey (0.04)
North America > United States > California > Santa Clara County > Mountain View (0.04)
(3 more...)

Genre: Research Report > New Finding (0.68)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Generalizable Image Repair for Robust Visual Autonomous Racing

Sobolewski, Carson, Mao, Zhenjiang, Vejre, Kshitij, Ruchkin, Ivan

arXiv.org Artificial Intelligence | Mar-7-2025

Vision-based autonomous racing relies on accurate perception for robust control. However, image distribution changes caused by sensor noise, adverse weather, and dynamic lighting can degrade perception, leading to suboptimal control decisions. Existing approaches, including domain adaptation and adversarial training, improve robustness but struggle to generalize to unseen corruptions while introducing computational overhead. To address this challenge, we propose a real-time image repair module that restores corrupted images before they are used by the controller. Our method leverages generative adversarial models, specifically CycleGAN and pix2pix, for image repair. CycleGAN enables unpaired image-to-image translation to adapt to novel corruptions, while pix2pix exploits paired image data when available to improve the quality. To ensure alignment with control performance, we introduce a control-focused loss function that prioritizes perceptual consistency in repaired images. We evaluated our method in a simulated autonomous racing environment with various visual corruptions. The results show that our approach significantly improves performance compared to baselines, mitigating distribution shift and enhancing controller reliability.
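The abstract does not give the form of its control-focused loss. As a hedged illustration only, the sketch below shows one plausible shape for such an objective in the paired (pix2pix-style) setting, where the clean counterpart of each corrupted frame is available: an adversarial term, an L1 reconstruction term, and a penalty on how much the repaired image changes the controller's output. The weights `lambda_rec` and `lambda_ctrl`, the toy controller, and every function name here are assumptions, not the authors' formulation.

```python
from typing import Callable, Sequence, Tuple

Image = Sequence[float]  # flattened pixel intensities in [0, 1]; toy stand-in for a tensor
Controller = Callable[[Image], Tuple[float, ...]]  # image -> control outputs, e.g. (steering, throttle)


def l1_distance(a: Image, b: Image) -> float:
    # Mean absolute pixel error, the kind of reconstruction term pix2pix adds to its GAN loss.
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def control_consistency(repaired: Image, clean: Image, controller: Controller) -> float:
    # Squared difference between the controller's actions on the repaired and clean images.
    return sum((u - v) ** 2 for u, v in zip(controller(repaired), controller(clean)))


def control_focused_loss(repaired: Image, clean: Image, controller: Controller,
                         adversarial_loss: float, lambda_rec: float = 100.0,
                         lambda_ctrl: float = 10.0) -> float:
    # Hypothetical generator objective: adversarial term + pixel reconstruction
    # + a penalty whenever the repair would change the controller's decision.
    return (adversarial_loss
            + lambda_rec * l1_distance(repaired, clean)
            + lambda_ctrl * control_consistency(repaired, clean, controller))


def toy_controller(image: Image) -> Tuple[float, float]:
    # Hypothetical controller: steer toward the brighter half, throttle on mean brightness.
    half = len(image) // 2
    left = sum(image[:half]) / half
    right = sum(image[half:]) / (len(image) - half)
    return (right - left, sum(image) / len(image))
```

The control term is what ties repair quality to driving behavior: two repairs with the same pixel error can receive different losses if only one of them shifts the steering command.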

controller, corruption, cyclegan, (15 more...)

arXiv.org Artificial Intelligence

2503.05911

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Replay-Guided Adversarial Environment Design

Neural Information Processing Systems | Oct-9-2024, 12:48:50 GMT

Deep reinforcement learning (RL) agents may successfully generalize to new settings if trained on an appropriately diverse set of environment and task configurations. Unsupervised Environment Design (UED) is a promising self-supervised RL paradigm, wherein the free parameters of an underspecified environment are automatically adapted during training to the agent's capabilities, leading to the emergence of diverse training environments. Here, we cast Prioritized Level Replay (PLR), an empirically successful but theoretically unmotivated method that selectively samples randomly-generated training levels, as UED. We argue that by curating completely random levels, PLR, too, can generate novel and complex levels for effective training. This insight reveals a natural class of UED methods we call Dual Curriculum Design (DCD).

artificial intelligence, machine learning, reinforcement learning, (6 more...)

Neural Information Processing Systems

Industry: Education (0.61)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.99)

Stabilizing Unsupervised Environment Design with a Learned Adversary

Mediratta, Ishita, Jiang, Minqi, Parker-Holder, Jack, Dennis, Michael, Vinitsky, Eugene, Rocktäschel, Tim

arXiv.org Artificial Intelligence | Aug-22-2023

A key challenge in training generally-capable agents is the design of training tasks that facilitate broad generalization and robustness to environment variations. This challenge motivates the problem setting of Unsupervised Environment Design (UED), whereby a student agent trains on an adaptive distribution of tasks proposed by a teacher agent. A pioneering approach for UED is PAIRED, which uses reinforcement learning (RL) to train a teacher policy to design tasks from scratch, making it possible to directly generate tasks that are adapted to the agent's current capabilities. Despite its strong theoretical backing, PAIRED suffers from a variety of challenges that hinder its practical performance. Thus, state-of-the-art methods currently rely on curation and mutation rather than generation of new tasks. In this work, we investigate several key shortcomings of PAIRED and propose solutions for each shortcoming. As a result, we make it possible for PAIRED to match or exceed state-of-the-art methods, producing robust agents in several established challenging procedurally-generated environments, including a partially-observed maze navigation task and a continuous-control car racing environment. We believe this work motivates a renewed emphasis on UED methods based on learned models that directly generate challenging environments, potentially unlocking more open-ended RL training and, as a result, more general agents.
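PAIRED's teacher generates each task from scratch rather than curating existing ones. The sketch below illustrates that generative setup under stated assumptions: a grid maze is built one teacher decision per step (goal, then agent start, then walls, an ordering assumed here for illustration), and `random_teacher_policy` is a hypothetical placeholder for the learned RL teacher, which in PAIRED would be trained to maximize regret. Names and grid sizes are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Cell = Tuple[int, int]
Grid = List[List[str]]
TeacherPolicy = Callable[[Grid, int, List[Cell], random.Random], Cell]


def random_teacher_policy(grid: Grid, step: int, free: List[Cell], rng: random.Random) -> Cell:
    # Placeholder for a learned teacher: ignores the partially built grid and
    # picks any free cell uniformly. A trained teacher would condition on `grid` and `step`.
    return rng.choice(free)


def generate_level(policy: TeacherPolicy, size: int = 6, num_walls: int = 10, seed: int = 0) -> Grid:
    rng = random.Random(seed)
    grid: Grid = [["." for _ in range(size)] for _ in range(size)]
    # One teacher action per step: goal first, then agent start, then walls (assumed order).
    for step, symbol in enumerate(["G", "A"] + ["#"] * num_walls):
        free = [(r, c) for r in range(size) for c in range(size) if grid[r][c] == "."]
        if not free:
            break  # grid is full; stop early
        r, c = policy(grid, step, free, rng)
        grid[r][c] = symbol
    return grid


if __name__ == "__main__":
    for row in generate_level(random_teacher_policy):
        print("".join(row))
```

With the random placeholder, this generator reduces to domain randomization; the point of PAIRED is to replace it with a policy trained on the regret signal, so the levels it builds track the student's current capabilities.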

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2308.10797

Country:

Europe > Italy (0.04)
Europe > Germany (0.04)
Asia > Singapore (0.04)
(21 more...)

Genre: Research Report > Promising Solution (1.00)

Industry:

Education (1.00)
Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

CLUTR: Curriculum Learning via Unsupervised Task Representation Learning

Azad, Abdus Salam, Gur, Izzeddin, Emhoff, Jasper, Alexis, Nathaniel, Faust, Aleksandra, Abbeel, Pieter, Stoica, Ion

arXiv.org Artificial Intelligence | Mar-7-2023

Reinforcement Learning (RL) algorithms are often known for sample inefficiency and difficult generalization. Recently, Unsupervised Environment Design (UED) emerged as a new paradigm for zero-shot generalization by simultaneously learning a task distribution and agent policies on the generated tasks. This is a non-stationary process where the task distribution evolves along with agent policies, creating an instability over time. While past works demonstrated the potential of such approaches, sampling effectively from the task space remains an open challenge, bottlenecking these approaches. To this end, we introduce CLUTR: a novel unsupervised curriculum ...

Deep Reinforcement Learning (RL) has shown exciting progress in the past decade in many challenging domains, including Atari (Mnih et al., 2015), Dota (Berner et al., 2019), and Go (Silver et al., 2016). However, deep RL is also known for its sample inefficiency and difficult generalization, performing poorly on unseen tasks or failing altogether with the slightest change (Cobbe et al., 2019; Azad et al., 2022; Zhang et al., 2018). While Curriculum Learning (CL) algorithms have been shown to improve RL sample efficiency by adapting the training task distribution, i.e., the curriculum (Portelas et al., 2020; Narvekar et al., 2020), a class of unsupervised CL algorithms called Unsupervised Environment Design (UED) (Dennis et al., 2020; Jiang et al., 2021a) has recently shown promising zero-shot generalization by automatically generating the training tasks and adapting the curriculum simultaneously.

machine learning, natural language, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2210.10243

Country:

Europe (1.00)
Asia (0.93)
North America > United States (0.28)

Genre: Research Report (0.64)

Industry:

Education (1.00)
Information Technology (0.93)
Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Paper review: PAIRED

#artificialintelligence | Sep-15-2022, 05:04:29 GMT

When browsing through new data science papers, you occasionally encounter clever new ideas. Even when they are not yet broadly adopted, they can hold great potential. I believe the paper I'll discuss today is one of them: "Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design" by Michael Dennis, Natasha Jaques et al. from the Google Brain team. The title is a mouthful, but in short, the paper describes an automated way to generate increasingly challenging environments for RL agents. You can find the paper here.

agent, antagonist, environment generation model, (14 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)
