AITopics | babyai

Collaborating Authors

babyai

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Teachable Reinforcement Learningvia Advice Distillation

Neural Information Processing SystemsFeb-8-2026, 05:55:02 GMT

Colorsdesignatesupervision used: shadesofblue = highleveladvice; red = lowleveladvice; black = oracledemonstrations; gray = shaped rewards. Figure 6: "Bestadvice" is OffsetAdvice.

artificial intelligence, machine learning, urlhttp, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Netherlands > North Brabant > Eindhoven (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

DEPO: Dual-Efficiency Preference Optimization for LLM Agents

Chen, Sirui, Zhao, Mengshi, Xu, Lei, Zhao, Yuying, Zhu, Beier, Zhang, Hanwang, Zhao, Shengjie, Lu, Chaochao

arXiv.org Artificial IntelligenceNov-20-2025

Recent advances in large language models (LLMs) have greatly improved their reasoning and decision-making abilities when deployed as agents. Richer reasoning, however, often comes at the cost of longer chain of thought (CoT), hampering interaction efficiency in real-world scenarios. Nevertheless, there still lacks systematic definition of LLM agent efficiency, hindering targeted improvements. To this end, we introduce dual-efficiency, comprising (i) step-level efficiency, which minimizes tokens per step, and (ii) trajectory-level efficiency, which minimizes the number of steps to complete a task. Building on this definition, we propose DEPO, a dual-efficiency preference optimization method that jointly rewards succinct responses and fewer action steps. Experiments on WebShop and BabyAI show that DEPO cuts token usage by up to 60.9% and steps by up to 26.9%, while achieving up to a 29.3% improvement in performance. DEPO also generalizes to three out-of-domain math benchmarks and retains its efficiency gains when trained on only 25% of the data. Our project page is at https://opencausalab.github.io/DEPO.

efficiency, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2511.15392

Country: Asia > China (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Adapting Interleaved Encoders with PPO for Language-Guided Reinforcement Learning in BabyAI

Mathur, Aryan, Ahmed, Asaduddin

arXiv.org Artificial IntelligenceOct-28-2025

Deep reinforcement learning agents often struggle when tasks require understanding both vision and language. Conventional architectures typically isolate perception (for example, CNN-based visual encoders) from decision-making (policy networks). This separation can be inefficient, since the policy's failures do not directly help the perception module learn what is important. To address this, we implement the Perception-Decision Interleaving Transformer (PDiT) architecture introduced by Mao et al. (2023), a model that alternates between perception and decision layers within a single transformer. This interleaving allows feedback from decision-making to refine perceptual features dynamically. In addition, we integrate a contrastive loss inspired by CLIP to align textual mission embeddings with visual scene features. We evaluate the PDiT encoders on the BabyAI GoToLocal environment and find that the approach achieves more stable rewards and stronger alignment compared to a standard PPO baseline. The results suggest that interleaved transformer encoders are a promising direction for developing more integrated autonomous agents.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2510.23148

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

FOSSIL: Harnessing Feedback on Suboptimal Samples for Data-Efficient Generalisation with Imitation Learning for Embodied Vision-and-Language Tasks

McCallum, Sabrina, Parekh, Amit, Suglia, Alessandro

arXiv.org Artificial IntelligenceOct-14-2025

Current approaches to embodied AI tend to learn policies from expert demonstrations. However, without a mechanism to evaluate the quality of demonstrated actions, they are limited to learning from optimal behaviour, or they risk replicating errors and inefficiencies. While reinforcement learning offers one alternative, the associated exploration typically results in sacrificing data efficiency. This work explores how agents trained with imitation learning can learn robust representations from both optimal and suboptimal demonstrations when given access to constructive language feedback as a means to contextualise different modes of behaviour. We directly provide language feedback embeddings as part of the input sequence into a Transformer-based policy, and optionally complement the traditional next action prediction objective with auxiliary self-supervised learning objectives for feedback prediction. We test our approach on a range of embodied Vision-and-Language tasks in our custom BabyAI-XGen environment and show significant improvements in agents' compositional generalisation abilities and robustness, suggesting that our data-efficient method allows models to successfully convert suboptimal behaviour into learning opportunities. Overall, our results suggest that language feedback is a competitive and intuitive alternative to intermediate scalar rewards for language-specified embodied tasks.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2510.11307

Genre: Research Report > New Finding (1.00)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

OmniReflect: Discovering Transferable Constitutions for LLM agents via Neuro-Symbolic Reflections

Bharadwaj, Manasa, Verma, Nikhil, Ferreira, Kevin

arXiv.org Artificial IntelligenceJun-24-2025

Efforts to improve Large Language Model (LLM) agent performance on complex tasks have largely focused on fine-tuning and iterative self-correction. However, these approaches often lack generalizable mechanisms for longterm learning and remain inefficient in dynamic environments. We introduce OmniReflect, a hierarchical, reflection-driven framework that constructs a constitution, a compact set of guiding principles distilled from task experiences, to enhance the effectiveness and efficiency of an LLM agent. OmniReflect operates in two modes: Self-sustaining, where a single agent periodically curates its own reflections during task execution, and Co-operative, where a Meta-advisor derives a constitution from a small calibration set to guide another agent. To construct these constitutional principles, we employ Neural, Symbolic, and NeuroSymbolic techniques, offering a balance between contextual adaptability and computational efficiency. Empirical results averaged across models show major improvements in task success, with absolute gains of +10.3% on ALFWorld, +23.8% on BabyAI, and +8.3% on PDDL in the Self-sustaining mode. Similar gains are seen in the Co-operative mode, where a lightweight Qwen3-4B ReAct agent outperforms all Reflexion baselines on BabyAI. These findings highlight the robustness and effectiveness of OmniReflect across environments and backbones.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2506.17449

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Improving Long-Horizon Imitation Through Instruction Prediction

Hejna, Joey, Abbeel, Pieter, Pinto, Lerrel

arXiv.org Artificial IntelligenceJun-21-2023

Complex, long-horizon planning and its combinatorial nature pose steep challenges for learning-based agents. Difficulties in such settings are exacerbated in low data regimes where over-fitting stifles generalization and compounding errors hurt accuracy. In this work, we explore the use of an often unused source of auxiliary supervision: language. Inspired by recent advances in transformer-based models, we train agents with an instruction prediction loss that encourages learning temporally extended representations that operate at a high level of abstraction. Concretely, we demonstrate that instruction modeling significantly improves performance in planning environments when training with a limited number of demonstrations on the BabyAI and Crafter benchmarks. In further analysis we find that instruction modeling is most important for tasks that require complex reasoning, while understandably offering smaller gains in environments that require simple plans. More details and code can be found at https://github.com/jhejna/instruction-prediction.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2306.12554

Country:

North America > United States > New York (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry:

Education (0.93)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Add feedback

Pre-Trained Language Models for Interactive Decision-Making

Li, Shuang, Puig, Xavier, Paxton, Chris, Du, Yilun, Wang, Clinton, Fan, Linxi, Chen, Tao, Huang, De-An, Akyürek, Ekin, Anandkumar, Anima, Andreas, Jacob, Mordatch, Igor, Torralba, Antonio, Zhu, Yuke

arXiv.org Artificial IntelligenceOct-29-2022

Language model (LM) pre-training is useful in many language processing tasks. But can pre-trained LMs be further leveraged for more general machine learning problems? We propose an approach for using LMs to scaffold learning and generalization in general sequential decision-making problems. In this approach, goals and observations are represented as a sequence of embeddings, and a policy network initialized with a pre-trained LM predicts the next action. We demonstrate that this framework enables effective combinatorial generalization across different environments and supervisory modalities. We begin by assuming access to a set of expert demonstrations, and show that initializing policies with LMs and fine-tuning them via behavior cloning improves task completion rates by 43.6% in the VirtualHome environment. Next, we integrate an active data gathering procedure in which agents iteratively interact with the environment, relabel past "failed" experiences with new goals, and update their policies in a self-supervised loop. Active data gathering further improves combinatorial generalization, outperforming the best baseline by 25.1%. Finally, we explain these results by investigating three possible factors underlying the effectiveness of the LM-based policy. We find that sequential input representations (vs. fixed-dimensional feature vectors) and LM-based weight initialization are both important for generalization. Surprisingly, however, the format of the policy inputs encoding (e.g. as a natural language string vs. an arbitrary sequential encoding) has little influence. Together, these results suggest that language modeling induces representations that are useful for modeling not just language, but also goals and plans; these representations can aid learning and generalization even outside of language processing.

large language model, machine learning, virtualhome, (17 more...)

arXiv.org Artificial Intelligence

2202.01771

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Italy > Tuscany > Florence (0.04)

Genre: Research Report > New Finding (0.88)

Industry: Education (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Language may help AI navigate new environments

#artificialintelligenceApr-18-2020, 17:04:21 GMT

In a new study published this week on the preprint server Arxiv.org, Both it and several baseline models will soon be available on GitHub. One of the most powerful techniques in machine learning -- reinforcement learning, which entails spurring software agents toward goals via rewards -- is also one of the most flawed. It's sample inefficient, meaning it requires a large number of compute cycles to complete, and without additional data to cover variations, it adapts poorly to environments that differ from the training environment. It's theorized that prior knowledge of tasks through structured language could be combined with reinforcement learning to mitigate its shortcomings, and BabyAI was designed to put this theory to the test.

agent, babyai, help ai navigate new environment, (5 more...)

#artificialintelligence

AI-Alerts: 2020 > 2020-04 > AAAI AI-Alert for Apr 21, 2020 (1.00)

Country: North America > Canada > Ontario > Toronto (0.17)

Genre: Research Report (0.73)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.50)

Add feedback

BabyAI++: Towards Grounded-Language Learning beyond Memorization

Cao, Tianshi, Wang, Jingkang, Zhang, Yining, Manivasagam, Sivabalan

arXiv.org Artificial IntelligenceApr-15-2020

Despite success in many real-world tasks (e.g., robotics), reinforcement learning (RL) agents still learn from tabula rasa when facing new and dynamic scenarios. By contrast, humans can offload this burden through textual descriptions. Although recent works have shown the benefits of instructive texts in goal-conditioned RL, few have studied whether descriptive texts help agents to generalize across dynamic environments. To promote research in this direction, we introduce a new platform, BabyAI++, to generate various dynamic environments along with corresponding descriptive texts. Moreover, we benchmark several baselines inherited from the instruction following setting and develop a novel approach towards visually-grounded language learning on our platform. Extensive experiments show strong evidence that using descriptive texts improves the generalization of RL agents across environments with varied dynamics.

agent, descriptive text, image text, (17 more...)

arXiv.org Artificial Intelligence

2004.072

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report (1.00)

Industry: Education > Curriculum > Subject-Specific Education (0.60)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.40)

Add feedback

Automated curriculum generation for Policy Gradients from Demonstrations

Srinivasan, Anirudh, Bahdanau, Dzmitry, Chevalier-Boisvert, Maxime, Bengio, Yoshua

arXiv.org Artificial IntelligenceDec-1-2019

In this paper, we present a technique that improves the process of training an agent (using RL) for instruction following. We develop a training curriculum that uses a nominal number of expert demonstrations and trains the agent in a manner that draws parallels from one of the ways in which humans learn to perform complex tasks, i.e by starting from the goal and working backwards. We test our method on the BabyAI platform and show an improvement in sample efficiency for some of its tasks compared to a PPO (proximal policy optimization) baseline.

agent, curriculum, demonstration, (14 more...)

arXiv.org Artificial Intelligence

1912.00444

Country:

North America > Canada > Quebec > Montreal (0.05)
North America > United States (0.04)

Genre:

Research Report (0.82)
Instructional Material > Course Syllabus & Notes (0.34)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.72)
Information Technology > Artificial Intelligence > Robots (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.47)

Add feedback