AITopics | minigrid

minigrid

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Function

Neural Information Processing Systems | Apr-24-2026, 10:52:37 GMT

Algorithm 2 details the pseudocode for the partition function used in LaMCTS, which we use in LaP3 as well.

Algorithm 2 Partition Function
1: Input: Input space $\Omega$, samples $S_t$, node partition threshold $N_{thres}$, partitioning latent model $s(x)$
2: Set $V_0 = \{\Omega\}$
3: Set $V_{queue} = \{\Omega\}$
4: while $V_{queue} \neq \emptyset$ do
5: $\Omega_p \leftarrow V_{queue}.pop(0)$
…

It is clear that $F_k(y)$ is a monotonically decreasing function with $F_k(0) = 1$ and $\lim_{y \to +\infty} F_k(y) = 0$. Here we assume it is strictly decreasing so that $F_k(y)$ has a well-defined inverse function $F_k^{-1}$. In the following, we will omit the subscript $k$ for brevity.

… $P[f(x_i) \le g - y \mid x_i \in \Omega_k]$ (4)
$= 1 - F_k^{n_t}(y)$ (5)

Note that 1 is due to the fact that all samples $x_1, \ldots, x_{n_t}$ are independently drawn within the region $\Omega_k$.
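The independence step in the excerpt above can be checked numerically. If each of n independent samples falls at least y below the region's best value with probability F(y), then all n do so with probability F(y)^n, and at least one sample lands within y of the best value with probability 1 - F(y)^n. The sketch below evaluates this for a toy case that is an assumption and not taken from the paper: f(x) = x with x uniform on [0, 1], so the best value is 1 and F(y) = 1 - y. The function names are illustrative.

```python
def toy_tail(y: float) -> float:
    """F(y) for the assumed toy case f(x) = x, x ~ Uniform[0, 1], best value 1.

    F(y) = P[f(x) <= 1 - y] = 1 - y, clipped to [0, 1].
    """
    return min(1.0, max(0.0, 1.0 - y))


def prob_within(tail_prob: float, n_samples: int) -> float:
    """Probability that at least one of n i.i.d. samples is within y of the best.

    tail_prob is F(y), the chance that a single sample is at least y below the
    best value. Independence gives P[all n miss] = F(y) ** n.
    """
    if not 0.0 <= tail_prob <= 1.0:
        raise ValueError("tail_prob must be a probability")
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    return 1.0 - tail_prob ** n_samples


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, prob_within(toy_tail(0.1), n))
```

With F(y) fixed, the probability rises toward 1 as the number of samples in the region grows, which is why the sample count appears as the exponent.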

artificial intelligence, lap3, machine learning, (15 more...)

Neural Information Processing Systems

Industry: Health & Medicine (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

NovelD

Neural Information Processing Systems | Feb-11-2026, 08:01:57 GMT

Modern works adopt various Intrinsic Reward (IR) designs to guide exploration in hard-exploration settings. We evaluate AMIGo for 500M steps [12].

artificial intelligence, arxivpreprintarxiv, machine learning, (14 more...)

Neural Information Processing Systems

Country: North America > United States > California > San Diego County > San Diego (0.04)

Industry: Leisure & Entertainment > Games (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

1d0ed12c3fda52f2c241a0cebcf739a6-Paper-Conference.pdf

Neural Information Processing Systems | Feb-9-2026, 00:08:57 GMT

agent, jaxnav, learnability, (14 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(4 more...)

Genre: Research Report > Experimental Study (0.93)

Industry: Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

TRACED: Transition-aware Regret Approximation with Co-learnability for Environment Design

Cho, Geonwoo, Im, Jaegyun, Lee, Jihwan, Yi, Hojun, Kim, Sejin, Kim, Sundong

arXiv.org Artificial Intelligence | Dec-4-2025

Generalizing deep reinforcement learning agents to unseen environments remains a significant challenge. One promising solution is Unsupervised Environment Design (UED), a co-evolutionary framework in which a teacher adaptively generates tasks with high learning potential, while a student learns a robust policy from this evolving curriculum. Existing UED methods typically measure learning potential via regret, the gap between optimal and current performance, approximated solely by value-function loss. Building on these approaches, we introduce the transition-prediction error as an additional term in our regret approximation. To capture how training on one task affects performance on others, we further propose a lightweight metric called Co-Learnability. By combining these two measures, we present Transition-aware Regret Approximation with Co-learnability for Environment Design (TRACED). Empirical evaluations show that TRACED produces curricula that improve zero-shot generalization over strong baselines across multiple benchmarks. Ablation studies confirm that the transition-prediction error drives rapid complexity ramp-up and that Co-Learnability delivers additional gains when paired with the transition-prediction error. These results demonstrate how refined regret approximation and explicit modeling of task relationships can be leveraged for sample-efficient curriculum design in UED. Project Page: https://geonwoo.me/traced/
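As a rough illustration of how a teacher might turn these signals into a task ranking, the sketch below adds the transition-prediction error to the value-function loss to form an approximate regret, then adds a Co-Learnability term. The abstract does not give the formula, so the additive form, the weights `alpha` and `beta`, and every name here are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class TaskStats:
    """Per-task statistics a teacher could track (hypothetical container)."""
    value_loss: float        # value-function loss, the usual regret proxy
    transition_error: float  # transition-prediction error on the task
    co_learnability: float   # estimated benefit to other tasks


def task_priority(stats: TaskStats, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Assumed score: approximate regret plus a weighted co-learnability bonus."""
    approx_regret = stats.value_loss + alpha * stats.transition_error
    return approx_regret + beta * stats.co_learnability


def rank_tasks(tasks: dict[str, TaskStats], alpha: float = 1.0,
               beta: float = 1.0) -> list[str]:
    """Return task names ordered from highest to lowest priority."""
    return sorted(tasks,
                  key=lambda name: task_priority(tasks[name], alpha, beta),
                  reverse=True)


if __name__ == "__main__":
    demo = {
        "maze_a": TaskStats(0.2, 0.1, 0.0),
        "maze_b": TaskStats(0.1, 0.5, 0.2),
        "maze_c": TaskStats(0.3, 0.0, 0.1),
    }
    print(rank_tasks(demo))
```

Setting `alpha` or `beta` to zero in this sketch recovers a ranking that ignores the corresponding signal, which loosely mirrors the ablations the abstract mentions.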

machine learning, natural language, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2506.19997

Genre: Research Report > New Finding (1.00)

Industry: Education > Curriculum (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery

Rutherford, Alex, Beukman, Michael, Willi, Timon, Lacerda, Bruno, Hawes, Nick, Foerster, Jakob (University of Oxford)

Neural Information Processing Systems | Oct-9-2025, 20:14:14 GMT

Put differently, current methods fail to predict intuitive measures of "learnability."

agent, jaxnav, learnability, (14 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.40)
Asia > Middle East > Jordan (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(4 more...)

Genre: Research Report > Experimental Study (0.93)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Towards Monotonic Improvement in In-Context Reinforcement Learning

Zhang, Wenhao, Zhang, Shao, Wang, Xihuai, Li, Yang, Wen, Ying

arXiv.org Artificial Intelligence | Sep-30-2025

In-Context Reinforcement Learning (ICRL) has emerged as a promising paradigm for developing agents that can rapidly adapt to new tasks by leveraging past experiences as context, without updating their parameters. Recent approaches train large sequence models on monotonic policy improvement data from online RL, aiming for continued improvement in test-time performance. However, our experimental analysis reveals a critical flaw: at test time, these models do not show the continued improvement present in the training data. Theoretically, we identify this phenomenon as Contextual Ambiguity, where the model's own stochastic actions can generate an interaction history that misleadingly resembles that of a sub-optimal policy from the training data, initiating a vicious cycle of poor action selection. To resolve Contextual Ambiguity, we introduce Context Value into the training phase and propose Context Value Informed ICRL (CV-ICRL). CV-ICRL uses Context Value as an explicit signal representing the ideal performance theoretically achievable by a policy given the current context. As the context expands, Context Value can include more task-relevant information, and therefore the ideal performance should be non-decreasing. We prove that the Context Value tightens the lower bound on the performance gap relative to an ideal, monotonically improving policy. We further propose two methods for estimating Context Value at both training and testing time. Experiments conducted on the Dark Room and Minigrid testbeds demonstrate that CV-ICRL effectively mitigates performance degradation and improves overall ICRL abilities across various tasks and environments. The source code and data of this paper are available at https://github.com/Bluixe/towards_monotonic_improvement.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2509.23209

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

d428d070622e0f4363fceae11f4a3576-Paper.pdf

Neural Information Processing Systems | Aug-22-2025, 01:13:58 GMT

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.68)

Industry: Leisure & Entertainment > Games > Computer Games (0.30)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.97)

[Figure: RND and BeBold panels at training checkpoints from 1.0M to 9.8M steps]

Neural Information Processing Systems | Aug-17-2025, 14:08:59 GMT

We provide final testing performance for NovelD and all baselines in MiniGrid. We also provide more intrinsic analysis similar to Sec. 4.2 in a seven-room environment in Figure 1. There are other categories of static environment. The initial position of the agent and goal can be random. The position of the agent and goal is randomized.

artificial intelligence, coefficient, step 3, (15 more...)

Neural Information Processing Systems

Genre: Workflow (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.33)

A Proof of Theorem

Neural Information Processing Systems | Aug-17-2025, 02:17:42 GMT

For the first argument, we use induction. For the second part, it is essentially a Coupon Collector's problem. The colors represent the target environment. The environment is shown in Figure 6. The results are shown in Figure 5. Forward} to reach the target grid (green).
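The excerpt invokes the Coupon Collector's problem without stating its standard result: with n equally likely items, the expected number of independent draws needed to see every item at least once is n · H_n, where H_n = 1 + 1/2 + … + 1/n. A minimal sketch of that expectation follows. The function name is illustrative, and how n maps onto the quantities in the theorem is not given in the excerpt.

```python
from fractions import Fraction


def expected_draws(n_items: int) -> Fraction:
    """Expected draws to collect all n equally likely items: n * H_n.

    Once k distinct items are held, a new one appears with probability
    (n - k) / n, so that stage takes n / (n - k) draws on average. Summing
    over k = 0 .. n - 1 gives n * (1 + 1/2 + ... + 1/n).
    """
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    harmonic = sum(Fraction(1, k) for k in range(1, n_items + 1))
    return n_items * harmonic


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, expected_draws(n))
```

Exact fractions are used so that small cases can be checked by hand. For large n the value grows roughly as n ln n.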

artificial intelligence, molecule, probability, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.31)

03a3655fff3e9bdea48de9f49e938e32-Supplemental.pdf