 Agents


Mirage-1: Augmenting and Updating GUI Agent with Hierarchical Multimodal Skills

arXiv.org Artificial Intelligence

Recent efforts to leverage Multi-modal Large Language Models (MLLMs) as GUI agents have yielded promising outcomes. However, these agents still struggle with long-horizon tasks in online environments, primarily due to insufficient knowledge and the inherent gap between offline and online domains. In this paper, inspired by how humans generalize knowledge in open-ended environments, we propose a Hierarchical Multimodal Skills (HMS) module to tackle the issue of insufficient knowledge. It progressively abstracts trajectories into execution skills, core skills, and ultimately meta-skills, providing a hierarchical knowledge structure for long-horizon task planning. To bridge the domain gap, we propose the Skill-Augmented Monte Carlo Tree Search (SA-MCTS) algorithm, which efficiently leverages skills acquired in offline environments to reduce the action search space during online tree exploration. Building on HMS, we propose Mirage-1, a multimodal, cross-platform, plug-and-play GUI agent. To validate the performance of Mirage-1 in real-world long-horizon scenarios, we constructed a new benchmark, AndroidLH. Experimental results show that Mirage-1 outperforms previous agents by 32%, 19%, 15%, and 79% on AndroidWorld, MobileMiniWob++, Mind2Web-Live, and AndroidLH, respectively. Project page: https://cybertronagent.github.io/Mirage-1.github.io/


MasHost Builds It All: Autonomous Multi-Agent System Directed by Reinforcement Learning

arXiv.org Artificial Intelligence

Large Language Model (LLM)-driven Multi-agent systems (Mas) have recently emerged as a powerful paradigm for tackling complex real-world tasks. However, existing Mas construction methods typically rely on manually crafted interaction mechanisms or heuristic rules, introducing human biases and constraining the autonomous ability. Even with recent advances in adaptive Mas construction, existing systems largely remain within the paradigm of semi-autonomous patterns. In this work, we propose MasHost, a Reinforcement Learning (RL)-based framework for autonomous and query-adaptive Mas design. By formulating Mas construction as a graph search problem, our proposed MasHost jointly samples agent roles and their interactions through a unified probabilistic sampling mechanism. Beyond the accuracy and efficiency objectives pursued in prior works, we introduce component rationality as an additional and novel design principle in Mas. To achieve this multi-objective optimization, we propose Hierarchical Relative Policy Optimization (HRPO), a novel RL strategy that collaboratively integrates group-relative advantages and action-wise rewards. To our knowledge, our proposed MasHost is the first RL-driven framework for autonomous Mas graph construction. Extensive experiments on six benchmarks demonstrate that MasHost consistently outperforms most competitive baselines, validating its effectiveness, efficiency, and structure rationality.


AI Agent Behavioral Science

arXiv.org Artificial Intelligence

Recent advances in large language models (LLMs) have enabled the development of AI agents that exhibit increasingly human-like behaviors, including planning, adaptation, and social dynamics across diverse, interactive, and open-ended scenarios. These behaviors are not solely the product of the internal architectures of the underlying models, but emerge from their integration into agentic systems operating within specific contexts, where environmental factors, social cues, and interaction feedback shape behavior over time. This evolution necessitates a new scientific perspective: AI Agent Behavioral Science. Rather than focusing only on internal mechanisms, this perspective emphasizes the systematic observation of behavior, design of interventions to test hypotheses, and theory-guided interpretation of how AI agents act, adapt, and interact over time. We systematize a growing body of research across individual agent, multi-agent, and human-agent interaction settings, and further demonstrate how this perspective informs responsible AI by treating fairness, safety, interpretability, accountability, and privacy as behavioral properties. By unifying recent findings and laying out future directions, we position AI Agent Behavioral Science as a necessary complement to traditional model-centric approaches, providing essential tools for understanding, evaluating, and governing the real-world behavior of increasingly autonomous AI systems.


Unpacking AI Agents

WIRED

In the past six months, OpenAI, Anthropic, Google, and others have released web-browsing agents that are designed to complete tasks independently, with only minimal input from humans. OpenAI CEO Sam Altman has even called AI agents "the next giant breakthrough." On today's episode, we'll dive into what makes these agents different from other forms of machine intelligence and whether their capabilities can live up to the hype. Write to us at uncannyvalley@wired.com. You can always listen to this week's podcast through the audio player on this page, but if you want to subscribe for free to get every episode, here's how: If you're on an iPhone or iPad, open the app called Podcasts, or just tap this link.


The Download: AI agents' autonomy, and sodium-based batteries

MIT Technology Review

Next slide, please: A brief history of the corporate presentation. PowerPoint is everywhere. It's used in religious sermons; by schoolchildren preparing book reports; at funerals and weddings. In 2010, Microsoft announced that PowerPoint was installed on more than a billion computers worldwide. But before PowerPoint, 35-millimeter film slides were king. They were the only medium for the kinds of high-impact presentations given by CEOs and top brass at annual meetings for stockholders, employees, and salespeople.


AI Agents Are Too Cheap for Our Own Good

WIRED

In 2007, Luke Arrigoni, an AI entrepreneur, earned $63,000 at his first job as a junior software developer. Today, he says AI tools that write better code than he did back then cost just $120 annually. The numbers don't sit right with him. Arrigoni, who runs Loti AI, a company that helps Hollywood stars find unauthorized deepfakes, worries that underpriced AI tools encourage companies to eliminate entry-level roles. He wants to flip the incentive structure so people's careers don't end before they begin.


The Sample Complexity of Online Strategic Decision Making with Information Asymmetry and Knowledge Transportability

arXiv.org Machine Learning

Information asymmetry is a pervasive feature of multi-agent systems, especially evident in economics and social sciences. In these settings, agents tailor their actions based on private information to maximize their rewards. These strategic behaviors often introduce complexities due to confounding variables. Simultaneously, knowledge transportability poses another significant challenge, arising from the difficulties of conducting experiments in target environments. It requires transferring knowledge from environments where empirical data is more readily available. Against these backdrops, this paper explores a fundamental question in online learning: Can we employ non-i.i.d. actions to learn about confounders even when requiring knowledge transfer? We present a sample-efficient algorithm designed to accurately identify system dynamics under information asymmetry and to navigate the challenges of knowledge transfer effectively in reinforcement learning, framed within an online strategic interaction model. Our method provably achieves learning of an $\epsilon$-optimal policy with a tight sample complexity of $O(1/\epsilon^2)$.
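
To make the scaling in the stated bound concrete, here is a small worked example. It assumes the bound holds with equality up to a constant $C$; that constant is a placeholder standing in for problem-dependent factors the abstract does not specify.

```latex
n(\epsilon) = \frac{C}{\epsilon^{2}}
\quad\Longrightarrow\quad
\frac{n(\epsilon/2)}{n(\epsilon)} = 4,
\qquad
\frac{n(\epsilon/10)}{n(\epsilon)} = 100 .
```

Under that assumption, halving the target suboptimality roughly quadruples the number of samples needed, and tightening it tenfold costs about a hundred times as many.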


Reasoning as a Resource: Optimizing Fast and Slow Thinking in Code Generation Models

arXiv.org Artificial Intelligence

This position paper proposes a fundamental shift in designing code generation models: treating reasoning depth as a controllable resource. Rather than being an incidental byproduct of prompting, we argue that the trade-off between rapid, direct answers ("fast thinking") and elaborate, chain-of-thought deliberation ("slow thinking") must be explicitly managed. We contend that optimizing reasoning budgets across the entire model lifecycle - from synthetic data creation and benchmarking to real-world deployment - can unlock superior trade-offs among accuracy, latency, and cost. This paper outlines how adaptive control over reasoning can enrich supervision signals, motivate new multi-dimensional benchmarks, and inform cost-aware, security-conscious deployment policies. By viewing fast and slow thinking as complementary modes to be scheduled, we envision coding agents that think deep when necessary and act fast when possible.


Delegations as Adaptive Representation Patterns: Rethinking Influence in Liquid Democracy

arXiv.org Artificial Intelligence

Liquid democracy is a mechanism for the division of labor in decision-making through the transitive delegation of influence. In essence, all individuals possess the autonomy to determine the issues with which they will engage directly, while for other matters, they may appoint a representative of their choosing. So far, the literature has studied the delegation structures emerging in liquid democracy as static. As a result, transitivity, defined as the capacity to transfer acquired authority to another entity, has been identified as a concern because it would be conducive to unrestrained accumulation of power. Focusing on the implementation of liquid democracy supported by the LiquidFeedback software, we propose a novel approach to assessing the influence of voting nodes in a transitive delegation graph, taking into account the process nature of real-world liquid democracy in which delegation and voting are distinct and increasingly independent activities. By introducing a novel model of delegations in liquid democracy, we show how transitivity may in fact contribute to an effective regulation of deliberation influence and decision-making power. While maintaining the one-person, one-vote paradigm for all votes cast, the anticipated influence of an agent, to the extent that it stems from transitivity, experiences a precipitous decline following an exponential trajectory. In general, it is our objective to take the first steps towards a rigorous analysis of liquid democracy as an adaptive democratic representation process. The adaptivity aspect of liquid democracy has not yet been explored within the existing academic literature despite it being, we believe, one of its most important features. We therefore also outline a research agenda focusing on this aspect of liquid democracy.
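
For readers unfamiliar with transitive delegation, the following minimal sketch shows how voting weight accumulates in a static delegation graph, which is the conventional view the abstract says the literature has taken. It is not the authors' adaptive model and not LiquidFeedback's implementation; the function names, the voter names, and the rule that a delegation cycle casts no vote are all assumptions made for illustration.

```python
from collections import Counter


def resolve_delegations(delegations, voters):
    """Follow each voter's delegation chain to its final representative.

    `delegations` maps a voter to the voter they delegate to; voters who
    vote directly are absent from the mapping. A voter caught in a
    delegation cycle resolves to None (assumed rule: no vote is cast).
    """
    final = {}
    for voter in voters:
        seen = set()
        current = voter
        while current in delegations:
            if current in seen:  # the chain loops back on itself
                current = None
                break
            seen.add(current)
            current = delegations[current]
        final[voter] = current
    return final


def voting_weights(delegations, voters):
    """Count how many votes each final representative carries."""
    final = resolve_delegations(delegations, voters)
    return Counter(rep for rep in final.values() if rep is not None)


# Hypothetical electorate: ann -> bob -> cy, dee -> cy, eve votes directly.
voters = ["ann", "bob", "cy", "dee", "eve"]
delegations = {"ann": "bob", "bob": "cy", "dee": "cy"}
weights = voting_weights(delegations, voters)
```

In this toy electorate `cy` ends up carrying four of the five votes, which illustrates the accumulation-of-power concern that the abstract attributes to the static reading of transitivity.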


Generalization Error Analysis for Attack-Free and Byzantine-Resilient Decentralized Learning with Data Heterogeneity

arXiv.org Artificial Intelligence

Decentralized learning, which facilitates joint model training across geographically scattered agents, has gained significant attention in the field of signal and information processing in recent years. While the optimization errors of decentralized learning algorithms have been extensively studied, their generalization errors remain relatively under-explored. As the generalization errors reflect the scalability of trained models on unseen data and are crucial in determining the performance of trained models in real-world applications, understanding the generalization errors of decentralized learning is of paramount importance. In this paper, we present fine-grained generalization error analysis for both attack-free and Byzantine-resilient decentralized learning with heterogeneous data as well as under mild assumptions, in contrast to prior studies that consider homogeneous data and/or rely on a stringent bounded stochastic gradient assumption. Our results shed light on the impact of data heterogeneity, model initialization and stochastic gradient noise - factors that have not been closely investigated before - on the generalization error of decentralized learning. We also reveal that Byzantine attacks performed by malicious agents largely affect the generalization error, and their negative impact is inherently linked to the data heterogeneity while remaining independent of the sample size. Numerical experiments on both convex and non-convex tasks are conducted to validate our theoretical findings.

Recent years have witnessed the significant advance of distributed learning, which enables geographically scattered devices to collaboratively train models, while ensuring the privacy of local data. According to the underlying network topologies, distributed learning can be classified into two categories, federated learning and decentralized learning. Federated learning relies on a central server to coordinate the learning process [2]-[8], while decentralized learning is able to operate autonomously without the need for a central server [9]-[18]. Notably, decentralized learning has gained increasing attention for its capacity to circumvent the communication bottleneck inherent in federated learning, caused by the central server.