Agents
Large Pre-Trained Models for Bimanual Manipulation in 3D
Yurchyk, Hanna, Chang, Wei-Di, Dudek, Gregory, Meger, David
We investigate the integration of attention maps from a pre-trained Vision Transformer into voxel representations to enhance bimanual robotic manipulation. Specifically, we extract attention maps from DINOv2, a self-supervised ViT model, and interpret them as pixel-level saliency scores over RGB images. These maps are lifted into a 3D voxel grid, resulting in voxel-level semantic cues that are incorporated into a behavior cloning policy. When integrated into a state-of-the-art voxel-based policy, our attention-guided featurization yields an average absolute improvement of 8.2% and a relative gain of 21.9% across all tasks in the RLBench bimanual benchmark.
Survey of Vision-Language-Action Models for Embodied Manipulation
Li, Haoran, Chen, Yuhui, Cui, Wenbo, Liu, Weiheng, Liu, Kai, Zhou, Mingcai, Zhang, Zhengtao, Zhao, Dongbin
Embodied intelligence systems, which enhance agent capabilities through continuous environment interactions, have garnered significant attention from both academia and industry. Vision-Language-Action models, inspired by advancements in large foundation models, serve as universal robotic control frameworks that substantially improve agent-environment interaction capabilities in embodied intelligence systems. This expansion has broadened application scenarios for embodied AI robots. This survey comprehensively reviews VLA models for embodied manipulation. Firstly, it chronicles the developmental trajectory of VLA architectures. Subsequently, we conduct a detailed analysis of current research across 5 critical dimensions: VLA model structures, training datasets, pre-training methods, post-training methods, and model evaluation. Finally, we synthesize key challenges in VLA development and real-world deployment, while outlining promising future research directions.
Higher-Order Responsibility
In ethics, individual responsibility is often defined through Frankfurt's principle of alternative possibilities. This definition is not adequate in a group decision-making setting because it often results in the lack of a responsible party or "responsibility gap''. One of the existing approaches to address this problem is to consider group responsibility. Another, recently proposed, approach is "higher-order'' responsibility. The paper considers the problem of deciding if higher-order responsibility up to degree $d$ is enough to close the responsibility gap. The main technical result is that this problem is $ฮ _{2d+1}$-complete.
Game Theory and Multi-Agent Reinforcement Learning for Zonal Ancillary Markets
Morri, Francesco, Cadre, Hรฉlรจne Le, Gruet, Pierre, Brotcorne, Luce
We characterize zonal ancillary market coupling relying on noncooperative game theory. To that purpose, we formulate the ancillary market as a multi-leader single follower bilevel problem, that we subsequently cast as a generalized Nash game with side constraints and nonconvex feasibility sets. We determine conditions for equilibrium existence and show that the game has a generalized potential game structure. To compute market equilibrium, we rely on two exact approaches: an integrated optimization approach and Gauss-Seidel best-response, that we compare against multi-agent deep reinforcement learning. On real data from Germany and Austria, simulations indicate that multi-agent deep reinforcement learning achieves the smallest convergence rate but requires pretraining, while best-response is the slowest. On the economics side, multi-agent deep reinforcement learning results in smaller market costs compared to the exact methods, but at the cost of higher variability in the profit allocation among stakeholders. Further, stronger coupling between zones tends to reduce costs for larger zones.
Generations in Dialogue: Human-centric AI and collaborative AI systems with Professor Andreea Bobu
Generations in Dialogue: Bridging Perspectives in AI is a podcast from AAAI featuring thought-provoking discussions between AI experts, practitioners, and enthusiasts from different age groups and backgrounds. Each episode delves into how generational experiences shape views on AI, exploring the challenges, opportunities, and ethical considerations that come with the advancement of this transformative technology. In the second episode of this new series from AAAI, host Ella Lan chats to Professor Andreea Bobu about choosing her research direction, working with humans in-the-loop, things to consider when working with data, system design challenges, the gap between what we think we're programming and what we've actually programmed, privacy and personalisation, and advice for early-career researchers interested in human-centric AI. Andreea Bobu is an Assistant Professor at MIT and leads the Collaborative Learning and Autonomy Research (CLEAR) Lab, where she develops autonomous agents that learn to act for, with, and around people. Her research focuses on aligning robot behavior with human expectations by studying how agents can acquire the right supervision--whether from direct human input or priors--build shared task representations with users, and address misalignment from differing human models.
All of My Employees Are AI Agents, and So Are My Executives
Sam Altman says the one-person billion-dollar company is coming. Maybe I could be that person--if only I could get my colleagues to shut up and stop lying. One day a couple months ago, in the middle of lunch, I glanced at my phone and was puzzled to see my colleague Ash Roy calling. In and of itself it might not have seemed strange to get a call from Ash: He's the CTO and chief product officer of HurumoAI, a startup I cofounded last summer. We were in the middle of a big push to get our software product, an AI agent application, into beta. There was plenty to discuss. "Hey there," he said, when I picked up. He was calling, he said, because I'd requested a progress report on the app from Megan. "I've been good," I said, chewing my grilled cheese.
Deep (Predictive) Discounted Counterfactual Regret Minimization
Xu, Hang, Li, Kai, Fu, Haobo, Fu, Qiang, Xing, Junliang, Cheng, Jian
Counterfactual regret minimization (CFR) is a family of algorithms for effectively solving imperfect-information games. To enhance CFR's applicability in large games, researchers use neural networks to approximate its behavior. However, existing methods are mainly based on vanilla CFR and struggle to effectively integrate more advanced CFR variants. In this work, we propose an efficient model-free neural CFR algorithm, overcoming the limitations of existing methods in approximating advanced CFR variants. At each iteration, it collects variance-reduced sampled advantages based on a value network, fits cumulative advantages by bootstrapping, and applies discounting and clipping operations to simulate the update mechanisms of advanced CFR variants. Experimental results show that, compared with model-free neural algorithms, it exhibits faster convergence in typical imperfect-information games and demonstrates stronger adversarial performance in a large poker game.
Distributionally Robust Online Markov Game with Linear Function Approximation
The sim-to-real gap, where agents trained in a simulator face significant performance degradation during testing, is a fundamental challenge in reinforcement learning. Extansive works adopt the framework of distributionally robust RL, to learn a policy that acts robustly under worst case environment shift. Within this framework, our objective is to devise algorithms that are sample efficient with interactive data collection and large state spaces. By assuming d-rectangularity of environment dynamic shift, we identify a fundamental hardness result for learning in online Markov game, and address it by adopting minimum value assumption. Then, a novel least square value iteration type algorithm, DR-CCE-LSI, with exploration bonus devised specifically for multiple agents, is proposed to find an \episilon-approximate robust Coarse Correlated Equilibrium(CCE). To obtain sample efficient learning, we find that: when the feature mapping function satisfies certain properties, our algorithm, DR-CCE-LSI, is able to achieve ฮต-approximate CCE with a regret bound of O{dHmin{H,1/min{ฯ_i}}\sqrt{K}}, where K is the number of interacting episodes, H is the horizon length, d is the feature dimension, and \simga_i represents the uncertainty level of player i. Our work introduces the first sample-efficient algorithm for this setting, matches the best result so far in single agent setting, and achieves minimax optimalsample complexity in terms of the feature dimension d. Meanwhile, we also conduct simulation study to validate the efficacy of our algorithm in learning a robust equilibrium.
Understanding Electro-communication and Electro-sensing in Weakly Electric Fish using Multi-Agent Deep Reinforcement Learning
Singh, Satpreet H., Johnson-Yu, Sonja, Lu, Zhouyang, Walsman, Aaron, Pedraja, Federico, Turcu, Denis, Sharma, Pratyusha, Saphra, Naomi, Sawtell, Nathaniel B., Rajan, Kanaka
Weakly electric fish, like Gnathonemus petersii, use a remarkable electrical modality for active sensing and communication, but studying their rich electrosensing and electrocommunication behavior and associated neural activity in naturalistic settings remains experimentally challenging. Here, we present a novel biologically-inspired computational framework to study these behaviors, where recurrent neural network (RNN) based artificial agents trained via multi-agent reinforcement learning (MARL) learn to modulate their electric organ discharges (EODs) and movement patterns to collectively forage in virtual environments. Trained agents demonstrate several emergent features consistent with real fish collectives, including heavy tailed EOD interval distributions, environmental context dependent shifts in EOD interval distributions, and social interaction patterns like freeloading, where agents reduce their EOD rates while benefiting from neighboring agents' active sensing. A minimal two-fish assay further isolates the role of electro-communication, showing that access to conspecific EODs and relative dominance jointly shape foraging success. Notably, these behaviors emerge through evolution-inspired rewards for individual fitness and emergent inter-agent interactions, rather than through rewarding agents explicitly for social interactions. Our work has broad implications for the neuroethology of weakly electric fish, as well as other social, communicating animals in which extensive recordings from multiple individuals, and thus traditional data-driven modeling, are infeasible.
ProST: Progressive Sub-task Training for Pareto-Optimal Multi-agent Systems Using Small Language Models
Bijoy, Biddut Sarker, Hasan, Mohammad Saqib, Alipoormolabashi, Pegah, Sil, Avirup, Balasubramanian, Aruna, Balasubramanian, Niranjan
Multi-agent systems with smaller language models (SLMs) present a viable alternative to single agent systems powered by large language models (LLMs) for addressing complex problems. In this work, we study how these alternatives compare in terms of both effectiveness and efficiency. To study this trade-off, we instantiate single and multi-agent systems for the complex problems in the AppWorld environment using different sized language models. We find that difficulties with long-trajectory learning in smaller language models (SLMs) limit their performance. Even when trained for specialized roles, SLMs fail to learn all subtasks effectively. To address this issue, we introduce a simple progressive sub-task training strategy, which introduces new sub-tasks progressively in each training epoch. We find that this novel strategy, analogous to instance level curriculum learning, consistently improves the effectiveness of multi-agents at all configurations. Our Pareto analysis shows that fine-tuned multi-agent systems yield better effectiveness-efficiency trade-offs. Additional ablations and analyses shows the importance of our progressive training strategy and its ability to reduce subtask error rates.