AITopics

Recent advances in LLM agents have largely built on reasoning backbones like ReAct, which interleave thought and action in complex environments. However, ReAct often produces ungrounded or incoherent reasoning steps, leading to misalignment between the agent's actual state and goal. Our analysis finds that this stems from ReAct's inability to maintain consistent internal beliefs and goal alignment, causing compounding errors and hallucinations. To address this, we introduce ReflAct, a novel backbone that shifts reasoning from merely planning next actions to continuously reflecting on the agent's state relative to its goal. By explicitly grounding decisions in states and enforcing ongoing goal alignment, ReflAct dramatically improves strategic reliability. This design delivers substantial empirical gains: ReflAct surpasses ReAct by 27.7% on average, achieving a 93.3% success rate in ALFWorld. Notably, ReflAct even outperforms ReAct with added enhancement modules (e.g., Reflexion, WKM), showing that strengthening the core reasoning backbone is key to reliable agent performance.

large language model, machine learning, natural language

2505.15182

Genre: Research Report > New Finding (0.67)

Industry:

Law (0.94)
Materials > Metals & Mining (0.46)
Leisure & Entertainment > Games (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

Vid2World: Crafting Video Diffusion Models to Interactive World Models

Huang, Siqiao, Wu, Jialong, Zhou, Qixing, Miao, Shangchen, Long, Mingsheng

World models, which predict future transitions from past observation and action sequences, have shown great promise for improving data efficiency in sequential decision-making. However, existing world models often require extensive domain-specific training and still produce low-fidelity, coarse predictions, limiting their usefulness in complex environments. In contrast, video diffusion models trained on large-scale internet data have demonstrated impressive capabilities in generating high-quality videos that capture diverse real-world dynamics. In this work, we present Vid2World, a general approach for leveraging and transferring pre-trained video diffusion models into interactive world models. To bridge the gap, Vid2World systematically explores video diffusion causalization, reshaping both the architecture and training objective of pre-trained models to enable autoregressive generation. Additionally, it incorporates a causal action guidance mechanism to enhance action controllability in the resulting interactive world models. Extensive experiments across multiple domains, including robot manipulation, 3D game simulation, and open-world navigation, demonstrate that our method offers a scalable and effective pathway for repurposing highly capable video diffusion models into interactive world models.

diffusion model, large language model, machine learning

2505.14357

Country: Europe > Austria (0.28)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

Jeloka, Bhavini, Guan, Yue, Tsiotras, Panagiotis

Learning Large-Scale Competitive Team Behaviors with Mean-Field Interactions and Online Opponent Modeling

While multi-agent reinforcement learning (MARL) has proven effective across both collaborative and competitive tasks, existing algorithms often struggle to scale to large populations of agents. Recent advancements in mean-field (MF) theory provide scalable solutions by approximating population interactions as a continuum, yet most existing frameworks focus exclusively on either fully cooperative or purely competitive settings. To bridge this gap, we introduce MF-MAPPO, a mean-field extension of PPO designed for zero-sum team games that integrate intra-team cooperation with inter-team competition. MF-MAPPO employs a shared actor and a minimally informed critic per team and is trained directly on finite-population simulators, thereby enabling deployment to realistic scenarios with thousands of agents. We further show that MF-MAPPO naturally extends to partially observable settings through a simple gradient-regularized training scheme. We evaluate on large-scale benchmark scenarios built on MFEnv, our own simulation platform for MF team games, including offense-defense battlefield tasks and variants of population-based rock-paper-scissors games that admit analytical solutions. Across these benchmarks, MF-MAPPO outperforms existing methods and exhibits complex, heterogeneous behaviors, demonstrating the effectiveness of combining mean-field theory and MARL techniques at scale.
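The idea of approximating population interactions as a continuum can be illustrated with a small sketch. This is not the MF-MAPPO implementation: the function name `empirical_distribution`, the three-state toy space, and the team sizes are assumptions chosen for illustration. The point is only that a team is summarized by the fraction of agents in each state, a vector whose length does not grow with the number of agents.

```python
from collections import Counter


def empirical_distribution(agent_states, num_states):
    """Return the fraction of agents occupying each discrete state.

    This vector is the finite-population stand-in for the mean field:
    its length depends on the number of states, not the number of agents.
    """
    if not agent_states:
        raise ValueError("need at least one agent")
    counts = Counter(agent_states)
    n = len(agent_states)
    return [counts.get(s, 0) / n for s in range(num_states)]


# Hypothetical team of 1000 agents spread over 3 states (illustrative only).
team = [0] * 500 + [1] * 300 + [2] * 200
mean_field = empirical_distribution(team, num_states=3)
print(mean_field)  # [0.5, 0.3, 0.2]
```

A policy conditioned on `mean_field` sees the same input size whether the team has ten agents or ten thousand, which is the scalability argument the abstract makes.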

artificial intelligence, machine learning, reinforcement learning

2504.21164

Country: Asia (0.27)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games > Computer Games (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)

Pfitzer, Nicolas, Sebastián, Eduardo, Shankar, Ajay, Prorok, Amanda

Prompting Robot Teams with Natural Language

This paper presents a framework towards prompting multi-robot teams with high-level tasks using natural language expressions. Our objective is to use the reasoning capabilities demonstrated by recent language models in understanding and decomposing human expressions of intent, and repurpose these for multi-robot collaboration and decision-making. The key challenge is that an individual's behavior in a collective can be hard to specify and interpret, and must continuously adapt to actions from others. This necessitates a framework that possesses the representational capacity required by the logic and semantics of a task, and yet supports decentralized and interactive real-time operation. We solve this dilemma by recognizing that a task can be represented as a deterministic finite automaton (DFA), and that recurrent neural networks (RNNs) can encode numerous automata. This allows us to distill the logic and sequential decompositions of sub-tasks obtained from a language model into an RNN, and align its internal states with the semantics of a given task. By training a graph neural network (GNN) control policy that is conditioned on the hidden states of the RNN and the language embeddings, our method enables robots to execute task-relevant actions in a decentralized manner. We present evaluations of this single light-weight interpretable model on various simulated and real-world multi-robot tasks that require sequential and collaborative behavior by the team -- sites.google.com/view/prompting-teams.
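To make the task-as-automaton idea concrete, here is a minimal sketch of a deterministic finite automaton for a sequential task. The class `DFA`, the state names, and the event labels are hypothetical and are not taken from the paper; the sketch shows only the DFA representation, not the RNN distillation or the GNN policy.

```python
class DFA:
    """Minimal deterministic finite automaton over string-labelled events."""

    def __init__(self, start, accepting, transitions):
        self.start = start
        self.accepting = set(accepting)
        self.transitions = dict(transitions)  # (state, event) -> next state

    def step(self, state, event):
        # Events without an explicit transition leave the state unchanged.
        return self.transitions.get((state, event), state)

    def run(self, events):
        state = self.start
        for event in events:
            state = self.step(state, event)
        return state

    def accepts(self, events):
        return self.run(events) in self.accepting


# Hypothetical task: "visit A, then visit B".
task = DFA(
    start="need_A",
    accepting={"done"},
    transitions={
        ("need_A", "at_A"): "need_B",
        ("need_B", "at_B"): "done",
    },
)

print(task.accepts(["at_B", "at_A", "at_B"]))  # True
print(task.accepts(["at_B", "at_A"]))          # False
```

The first trace is accepted because the early `at_B` event is ignored until A has been visited, which is the kind of ordering constraint a sequential task imposes.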

arxiv preprint arxiv, machine learning, reinforcement learning

2509.24575

Country: Europe > Switzerland (0.28)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games (0.61)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)

Neural Information Processing Systems, Sep-29-2025, 03:37:49 GMT

f53437debdd397c42929d929614bc705-Paper-Conference.pdf

constraint, machine learning, reinforcement learning


Country:

North America > United States > California (0.28)
Asia > China > Guangdong Province (0.14)

Genre:

Research Report (0.66)
Instructional Material > Course Syllabus & Notes (0.46)

Industry:

Energy > Power Industry (1.00)
Energy > Oil & Gas > Upstream (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.54)

Neural Information Processing Systems, Sep-29-2025, 00:47:39 GMT

dcc337bb2a4d25afefd9ab800721debb-Paper-Conference.pdf

artificial intelligence, deep learning, machine learning


Country:

Europe (0.14)
North America > United States > California (0.14)

Industry: Energy > Oil & Gas (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.40)

Ceccon, Marina, Fabris, Alessandro, Radanović, Goran, Biega, Asia J., Susto, Gian Antonio

Reinforcement Learning for Durable Algorithmic Recourse

arXiv.org Artificial Intelligence, Sep-29-2025

Algorithmic recourse seeks to provide individuals with actionable recommendations that increase their chances of receiving favorable outcomes from automated decision systems (e.g., loan approvals). While prior research has emphasized robustness to model updates, considerably less attention has been given to the temporal dynamics of recourse--particularly in competitive, resource-constrained settings where recommendations shape future applicant pools. In this work, we present a novel time-aware framework for algorithmic recourse, explicitly modeling how candidate populations adapt in response to recommendations. Additionally, we introduce a novel reinforcement learning (RL)-based recourse algorithm that captures the evolving dynamics of the environment to generate recommendations that are both feasible and valid. We design our recommendations to be durable, supporting validity over a predefined time horizon T. This durability allows individuals to confidently reapply after taking time to implement the suggested changes. Through extensive experiments in complex simulation environments, we show that our approach substantially outperforms existing baselines, offering a superior balance between feasibility and long-term validity. Together, these results underscore the importance of incorporating temporal and behavioral dynamics into the design of practical recourse systems.

data mining, machine learning, reinforcement learning

2509.22102

Country:

North America > United States (0.05)
Europe > Switzerland (0.04)
Asia (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine (0.46)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Machine Learning, Sep-29-2025

Error Analysis of Discrete Flow with Generator Matching

Wan, Zhengyan, Ouyang, Yidong, Yao, Qiang, Xie, Liyan, Fang, Fang, Zha, Hongyuan, Cheng, Guang

Discrete diffusion models have achieved significant progress in large language models [24, 42, 41, 39]. By learning the time reversal of the noising process of a continuous-time Markov chain (CTMC), these models transform a simple, easy-to-sample distribution (e.g., uniform [19, 23] or masked [26, 32, 30]) into a data distribution with discrete structure. Discrete flow models [10, 16, 31] provide a flexible framework for learning the generating transition rates, analogous to continuous flow matching [1, 22, 21], and offer a more comprehensive family of probability paths. Theoretical analyses of discrete diffusion models have recently emerged in numerous studies [11, 40, 28, 29]. To obtain the transition rate of the reversed process, these analyses estimate the concrete scores by minimizing the concrete score entropy introduced in [23, 8]. In those works, the distribution error of discrete diffusion models is divided into three parts: (a) truncation error from truncating the time horizon of the noising process; (b) concrete score estimation error; (c) discretization error from the sampling algorithm. In our paper, we investigate the theoretical properties of discrete flow-based models using the generator matching training objective [18] and the uniformization sampling algorithm [11], which offers zero truncation error and zero discretization error.
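For readers unfamiliar with uniformization, the following is a minimal sketch of the classical algorithm under a simplifying assumption: the chain is time-homogeneous with a fixed generator matrix `Q`, whereas the models discussed above use time-dependent rates. The function name, the two-state generator, and the sample count are illustrative choices, not taken from the paper. Candidate jump times are drawn from a Poisson process whose rate bounds every exit rate, so no time-step grid is needed.

```python
import random


def uniformization_sample(Q, x0, horizon, rng):
    """Sample the state at time `horizon` of a CTMC with generator Q.

    Q[i][j] (i != j) is the rate of jumping i -> j and Q[i][i] is minus
    the sum of the off-diagonal rates in row i. Assumes a
    time-homogeneous chain (a simplification made for this sketch).
    """
    n = len(Q)
    lam = max(-Q[i][i] for i in range(n))  # uniform bound on exit rates
    if lam == 0:
        return x0  # every state is absorbing
    # Candidate jump times form a Poisson process with rate lam.
    t = rng.expovariate(lam)
    x = x0
    while t <= horizon:
        # Jump matrix P = I + Q / lam; the diagonal entry is a self-loop.
        probs = [Q[x][j] / lam + (1.0 if j == x else 0.0) for j in range(n)]
        x = rng.choices(range(n), weights=probs)[0]
        t += rng.expovariate(lam)
    return x


# Hypothetical two-state chain whose stationary distribution is (2/3, 1/3).
rng = random.Random(0)
Q = [[-1.0, 1.0], [2.0, -2.0]]
samples = [uniformization_sample(Q, 0, 5.0, rng) for _ in range(5000)]
frac_state0 = samples.count(0) / len(samples)
print(round(frac_state0, 2))  # should land close to 2/3
```

Because self-loops absorb the slack between each state's exit rate and the bound `lam`, the state returned is an exact draw from the chain's law at `horizon` in this homogeneous setting.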

diffusion model, international conference, transition rate


2509.21906

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

arXiv.org Artificial Intelligence, Sep-29-2025

Discovering and Analyzing Stochastic Processes to Reduce Waste in Food Retail

Kalenkova, Anna, Xia, Lu, Neumann, Dirk

This paper proposes a novel method for analyzing food retail processes with a focus on reducing food waste. The approach integrates object-centric process mining (OCPM) with stochastic process discovery and analysis. First, a stochastic process in the form of a continuous-time Markov chain is discovered from grocery store sales data. This model is then extended with supply activities. Finally, a what-if analysis is conducted to evaluate how the quantity of products in the store evolves over time. This enables the identification of an optimal balance between customer purchasing behavior and supply strategies, helping to prevent both food waste due to oversupply and product shortages.
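As a rough illustration of how a continuous-time Markov chain can be fitted to event data, the sketch below uses the standard maximum-likelihood estimator for a fully observed trajectory: the rate from state i to state j is the number of observed i-to-j jumps divided by the total time spent in i. This is a generic estimator and not necessarily the discovery procedure used in the paper; the function name, the stock-level states, and the timestamps are made-up assumptions.

```python
from collections import defaultdict


def estimate_rates(trajectory, end_time):
    """Maximum-likelihood CTMC rates from one fully observed trajectory.

    `trajectory` is a list of (entry_time, state) pairs sorted by
    strictly increasing time. Returns {(i, j): N_ij / T_i}, where N_ij
    counts observed i -> j jumps and T_i is the total time spent in i.
    """
    jumps = defaultdict(int)
    holding = defaultdict(float)
    for (t0, s0), (t1, s1) in zip(trajectory, trajectory[1:]):
        holding[s0] += t1 - t0
        jumps[(s0, s1)] += 1
    # Time spent in the final state until observation stops.
    last_time, last_state = trajectory[-1]
    holding[last_state] += end_time - last_time
    return {(i, j): n / holding[i] for (i, j), n in jumps.items()}


# Hypothetical shelf-stock log in hours (made-up numbers, not from the paper).
log = [(0.0, "full"), (4.0, "low"), (6.0, "full"), (10.0, "low"), (13.0, "empty")]
rates = estimate_rates(log, end_time=14.0)
print(rates[("full", "low")])   # 0.25
print(rates[("low", "full")])   # 0.2
print(rates[("low", "empty")])  # 0.2
```

A what-if analysis in this spirit would then modify selected rates, for example a restocking rate, and examine how the distribution over stock levels changes over time.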

artificial intelligence, continuous-time markov chain, machine learning

2509.21322

Country:

Europe > Germany (0.28)
Oceania > Australia (0.28)

Genre: Research Report > Promising Solution (0.34)

Industry:

Food & Agriculture > Agriculture (0.91)
Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.53)

arXiv.org Artificial Intelligence, Sep-29-2025

Constructive Conflict-Driven Multi-Agent Reinforcement Learning for Strategic Diversity

Mai, Yuxiang, Yin, Qiyue, Ni, Wancheng, Xu, Pei, Huang, Kaiqi

In recent years, diversity has emerged as a useful mechanism to enhance the efficiency of multi-agent reinforcement learning (MARL). However, existing methods predominantly focus on designing policies based on individual agent characteristics, often neglecting the interplay and mutual influence among agents during policy formation. To address this gap, we propose Competitive Diversity through Constructive Conflict (CoDiCon), a novel approach that incorporates competitive incentives into cooperative scenarios to encourage policy exchange and foster strategic diversity among agents. Drawing inspiration from sociological research, which highlights the benefits of moderate competition and constructive conflict in group decision-making, we design an intrinsic reward mechanism using ranking features to introduce competitive motivations. A centralized intrinsic reward module generates and distributes varying reward values to agents, ensuring an effective balance between competition and cooperation. By optimizing the parameterized centralized reward module to maximize environmental rewards, we reformulate the constrained bilevel optimization problem to align with the original task objectives. We evaluate our algorithm against state-of-the-art methods in the SMAC and GRF environments. Experimental results demonstrate that CoDiCon achieves superior performance, with competitive intrinsic rewards effectively promoting diverse and adaptive strategies among cooperative agents.

intrinsic reward, machine learning, reinforcement learning

2509.14276

Genre:

Research Report > Promising Solution (0.54)
Research Report > New Finding (0.48)

Industry: Leisure & Entertainment > Sports > Soccer (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)