 Agents


MobileGUI-RL: Advancing Mobile GUI Agent through Reinforcement Learning in Online Environment

arXiv.org Artificial Intelligence

Recently, there has been a surge of vision-based GUI agents designed to automate everyday mobile and web tasks. These agents interpret raw GUI screenshots and autonomously decide where to click, scroll, or type, which bypasses handcrafted rules and app-specific APIs. However, most existing methods train GUI agents in offline environments using pre-collected trajectories. This approach limits scalability, causes overfitting to specific UI templates, and leads to brittle policies when faced with unseen environments. We present MobileGUI-RL, a scalable framework that trains GUI agents in an online environment. MobileGUI-RL contains two key components. It (i) synthesizes a curriculum of learnable tasks through self-exploration and filtering, and (ii) adapts GRPO to GUI navigation with trajectory-aware advantages and composite rewards that balance task success and execution efficiency. Experiments on three online mobile-agent benchmarks show consistent gains, validating the effectiveness of our approach.
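The group-relative advantage idea that GRPO builds on can be sketched roughly as follows. This is only an illustration: the reward shape (task success minus a step-count penalty), the `step_penalty` weight, and both function names are assumptions made here, not the formulation used in MobileGUI-RL.

```python
from statistics import mean, pstdev


def composite_reward(success, steps, max_steps, step_penalty=0.2):
    """Hypothetical composite reward: 1 for task success, minus a penalty
    that grows with the fraction of the step budget used."""
    base = 1.0 if success else 0.0
    return base - step_penalty * (steps / max_steps)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each trajectory's reward against the other trajectories
    sampled for the same task (zero mean, roughly unit variance)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four rollouts of the same task: (succeeded?, steps taken), assumed data.
rollouts = [(True, 5), (True, 10), (False, 10), (False, 10)]
rewards = [composite_reward(ok, n, max_steps=10) for ok, n in rollouts]
advantages = group_relative_advantages(rewards)
```

Under these assumptions, a successful trajectory that finishes in fewer steps receives a larger advantage than a slower successful one, and every step of a trajectory would share that trajectory-level value.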


LLMs are Introvert

arXiv.org Artificial Intelligence

The exponential growth of social media and generative AI has transformed information dissemination, fostering connectivity but also accelerating the spread of misinformation. Understanding information propagation dynamics and developing effective control strategies is essential to mitigate harmful content. Traditional models, such as SIR, provide basic insights but inadequately capture the complexities of online interactions. Advanced methods, including attention mechanisms and graph neural networks, enhance accuracy but typically overlook user psychology and behavioral dynamics. Large language models (LLMs), with their human-like reasoning, offer new potential for simulating psychological aspects of information spread. We introduce an LLM-based simulation environment capturing agents' evolving attitudes, emotions, and responses. Initial experiments, however, revealed significant gaps between LLM-generated behaviors and authentic human dynamics, especially in stance detection and psychological realism. A detailed evaluation through Social Information Processing Theory identified major discrepancies in goal-setting and feedback evaluation, stemming from the lack of emotional processing in standard LLM training. To address these issues, we propose the Social Information Processing-based Chain of Thought (SIP-CoT) mechanism enhanced by emotion-guided memory. This method improves the interpretation of social cues, personalization of goals, and evaluation of feedback. Experimental results confirm that SIP-CoT-enhanced LLM agents more effectively process social information, demonstrating behaviors, attitudes, and emotions closer to real human interactions. In summary, this research highlights critical limitations in current LLM-based propagation simulations and demonstrates how integrating SIP-CoT and emotional memory significantly enhances the social intelligence and realism of LLM agents.
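For context, the classical SIR model mentioned above can be written in a few lines. The sketch below is a plain forward-Euler discretization of the standard compartmental equations, reading "infected" as "currently spreading the content"; the parameter values and the function name are illustrative assumptions and are not taken from the paper.

```python
def simulate_sir(s0, i0, r0, beta, gamma, steps, dt=0.1):
    """Forward-Euler simulation of the SIR equations.

    beta is the transmission (sharing) rate and gamma the recovery
    (loss-of-interest) rate. Returns a list of (S, I, R) tuples.
    """
    s, i, r = float(s0), float(i0), float(r0)
    n = s + i + r  # total population, conserved by the update below
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Assumed toy setting: 1000 users, 10 of them initially spreading.
trajectory = simulate_sir(990, 10, 0, beta=0.5, gamma=0.1, steps=200)
```

Every user in this model follows the same two rates, which is the kind of simplification the abstract argues cannot capture individual attitudes and emotions.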


Going Beyond Heuristics by Imposing Policy Improvement as a Constraint

arXiv.org Artificial Intelligence

In many reinforcement learning (RL) applications, augmenting the task rewards with heuristic rewards that encode human priors about how a task should be solved is crucial for achieving desirable performance. However, because such heuristics are usually not optimal, much human effort and computational resources are wasted in carefully balancing task and heuristic rewards. Theoretically rigorous ways of incorporating heuristics rely on the idea of policy invariance, which guarantees that the performance of a policy obtained by maximizing heuristic rewards is the same as the optimal policy with respect to the task reward. However, in practice, policy invariance doesn't result in policy improvement, and such methods are known to empirically perform poorly. We propose a new paradigm to mitigate reward hacking and effectively use heuristics based on the practical goal of maximizing policy improvement instead of policy invariance. Our framework, Heuristic Enhanced Policy Optimization (HEPO), effectively leverages heuristics while avoiding the pitfall of prior methods for mitigating reward hacking. HEPO achieves superior performance on standard benchmarks with well-engineered reward functions. More surprisingly, HEPO allows policy optimization to achieve good performance even when heuristics are not well-engineered and designed by non-expert humans, showcasing HEPO's ability to reduce human effort in reward design. HEPO is a plug-and-play optimization method for leveraging heuristics in reinforcement learning. Code is available at https://github.com/Improbable-AI/hepo.


MOD-X: A Modular Open Decentralized eXchange Framework proposal for Heterogeneous Interoperable Artificial Intelligence Agents

arXiv.org Artificial Intelligence

As Artificial Intelligence systems evolve from monolithic models to ecosystems of specialized agents, the need for standardized communication protocols becomes increasingly critical. This paper introduces MOD-X (Modular Open Decentralized eXchange), a novel architectural framework proposal for agent interoperability that addresses key limitations of existing protocols. Unlike current approaches, MOD-X proposes a layered architecture with a Universal Message Bus, thorough state management, translation capabilities, and blockchain-based security mechanisms. We present MOD-X's architecture, compare it with existing protocols, and demonstrate through a worked example how it enables integration between heterogeneous specialist agents (agents with different architectures, vendors, capabilities, and knowledge representations, including rule-based systems, neural networks, symbolic reasoning engines, and legacy software with agent wrappers). MOD-X's key innovations include a publish-subscribe communication model, semantic capability discovery, and dynamic workflow orchestration, providing a framework that bridges theoretical formalism with practical implementation. This architecture addresses the growing need for truly decentralized, interoperable agent ecosystems that can scale effectively without the need for central coordination.
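The publish-subscribe pattern named above can be illustrated with a very small in-process bus. This is a generic sketch of the pattern only: the class name `MessageBus`, its methods, and the topic strings are hypothetical and do not reflect MOD-X's actual interfaces, which the abstract does not specify.

```python
from collections import defaultdict


class MessageBus:
    """Minimal in-process publish-subscribe bus (illustrative only)."""

    def __init__(self):
        # Maps a topic name to the list of handlers subscribed to it.
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        """Register a callable that receives every message on `topic`."""
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        """Deliver `message` to all subscribers of `topic`.

        Returns the number of handlers that were invoked.
        """
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


# Two hypothetical agents that only share a topic name, not each other's APIs.
bus = MessageBus()
inbox = []
bus.subscribe("task.translate", inbox.append)
bus.publish("task.translate", {"text": "hello", "target": "fr"})
```

The publisher never references the subscriber directly, which is what lets agents with different internals interoperate through a shared bus.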


'Flashes of brilliance and frustration': I let an AI agent run my day

New Scientist

Stone, who is the founder and director of the Learning Agents Research Group at his university, has spent decades thinking about the possibility of AI agents. They are, he says, systems that "sense the environment, decide what to do and take an action". Put in those terms, it may feel as if AI agents have been with us for years. For instance, IBM's Deep Blue computer appeared to have reacted to events on a real-world chessboard to beat former World Chess Champion Garry Kasparov in 1997. But Deep Blue wasn't an agentic AI, says Stone. "It was decision-making, but it wasn't sensing or acting," he says.


MalVol-25: A Diverse, Labelled and Detailed Volatile Memory Dataset for Malware Detection and Response Testing and Validation

arXiv.org Artificial Intelligence

This paper addresses the critical need for high-quality malware datasets that support advanced analysis techniques, particularly machine learning and agentic AI frameworks. Existing datasets often lack diversity, comprehensive labelling, and the complexity necessary for effective machine learning and agent-based AI training. To fill this gap, we developed a systematic approach for generating a dataset that combines automated malware execution in controlled virtual environments with dynamic monitoring tools. The resulting dataset comprises clean and infected memory snapshots across multiple malware families and operating systems, capturing detailed behavioural and environmental features. Key design decisions include applying ethical and legal compliance, thorough validation using both automated and manual methods, and comprehensive documentation to ensure replicability and integrity. The dataset's distinctive features enable modelling system states and transitions, facilitating RL-based malware detection and response strategies. This resource is significant for advancing adaptive cybersecurity defences and digital forensic research. Its scope supports diverse malware scenarios and offers potential for broader applications in incident response and automated threat mitigation.


Participatory Evolution of Artificial Life Systems via Semantic Feedback

arXiv.org Artificial Intelligence

We present a semantic feedback framework that enables natural language to guide the evolution of artificial life systems. Integrating a prompt-to-parameter encoder, a CMA-ES optimizer, and CLIP-based evaluation, the system allows user intent to modulate both visual outcomes and underlying behavioral rules. Implemented in an interactive ecosystem simulation, the framework supports prompt refinement, multi-agent interaction, and emergent rule synthesis. User studies show improved semantic alignment over manual tuning and demonstrate the system's potential as a platform for participatory generative design and open-ended evolution.


Last-Iterate Convergence of No-Regret Learning for Equilibria in Bargaining Games

arXiv.org Artificial Intelligence

Bargaining games, where agents attempt to agree on how to split utility, are an important class of games used to study economic behavior, which motivates a study of online learning algorithms in these games. In this work, we study when no-regret learning algorithms converge to Nash equilibria in bargaining games. Recent results have shown that online algorithms related to Follow the Regularized Leader (FTRL) converge to Nash equilibria (NE) in the last iterate in a wide variety of games, including zero-sum games. However, bargaining games do not have the properties used previously to establish convergence guarantees, even in the simplest case of the ultimatum game, which features a single take-it-or-leave-it offer. Nonetheless, we establish that FTRL (without the modifications necessary for zero-sum games) achieves last-iterate convergence to an approximate NE in the ultimatum game along with a bound on convergence time under mild assumptions. Further, we provide experimental results to demonstrate that convergence to NE, including NE with asymmetric payoffs, occurs under a broad range of initial conditions, both in the ultimatum game and in bargaining games with multiple rounds. This work demonstrates how complex economic behavior (e.g. learning to use threats and the existence of many possible equilibrium outcomes) can result from using a simple learning algorithm, and that FTRL can converge to equilibria in a more diverse set of games than previously known.
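To make the setting concrete, the sketch below runs FTRL with a negative-entropy regularizer (which reduces to a softmax over cumulative utilities) for both players of a discretized ultimatum game. The offer grid, the full-information expected-utility feedback, the step size `eta`, and the uniform initialization are all assumptions chosen for illustration; the paper's exact game model and assumptions may differ.

```python
import math


def softmax(scores, eta):
    """Entropy-regularized FTRL step: weights proportional to exp(eta * score)."""
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(eta * (s - top)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def ftrl_ultimatum(num_offers=11, rounds=500, eta=0.5):
    """Both players run FTRL on a discretized ultimatum game.

    The proposer picks the share offered to the responder; the responder
    picks the minimum share it will accept. Returns (grid, proposer
    strategy, responder strategy) after the final round.
    """
    grid = [k / (num_offers - 1) for k in range(num_offers)]
    cum_p = [0.0] * num_offers  # proposer's cumulative utility per offer
    cum_r = [0.0] * num_offers  # responder's cumulative utility per threshold
    p = [1.0 / num_offers] * num_offers
    r = [1.0 / num_offers] * num_offers
    for _ in range(rounds):
        # Expected utility of each pure action against the opponent's mix.
        for k, offer in enumerate(grid):
            accept_prob = sum(r[j] for j, t in enumerate(grid) if t <= offer)
            cum_p[k] += (1.0 - offer) * accept_prob
        for j, threshold in enumerate(grid):
            cum_r[j] += sum(p[k] * o for k, o in enumerate(grid) if o >= threshold)
        # Simultaneous update: both utilities above used the old strategies.
        p = softmax(cum_p, eta)
        r = softmax(cum_r, eta)
    return grid, p, r


grid, proposer, responder = ftrl_ultimatum()
```

Inspecting `proposer` and `responder` over different values of `rounds` gives a feel for last-iterate behavior, though this toy run is no substitute for the paper's analysis.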


Personalised Explanations in Long-term Human-Robot Interactions

arXiv.org Artificial Intelligence

In the field of Human-Robot Interaction (HRI), a fundamental challenge is to facilitate human understanding of robots. The emerging domain of eXplainable HRI (XHRI) investigates methods to generate explanations and evaluate their impact on human-robot interactions. Previous works have highlighted the need to personalise the level of detail of these explanations to enhance usability and comprehension. Our paper presents a framework designed to update and retrieve user knowledge-memory models, allowing for adapting the explanations' level of detail while referencing previously acquired concepts. Three architectures based on our proposed framework that use Large Language Models (LLMs) are evaluated in two distinct scenarios: a hospital patrolling robot and a kitchen assistant robot. Experimental results demonstrate that a two-stage architecture, which first generates an explanation and then personalises it, is the framework architecture that effectively reduces the level of detail only when there is related user knowledge.


Less is More: Empowering GUI Agent with Context-Aware Simplification

arXiv.org Artificial Intelligence

The research focus of GUI agents is shifting from text-dependent to pure-vision-based approaches, which, though promising, prioritize comprehensive pre-training data collection while neglecting contextual modeling challenges. We probe the characteristics of element and history contextual modeling in GUI agents and summarize: 1) the high density and loose relations of the element context highlight the existence of many unrelated elements and their negative influence; 2) the high redundancy of the history context reveals the inefficient history modeling in current GUI agents. In this work, we propose a context-aware simplification framework for building an efficient and effective GUI agent, termed SimpAgent. To mitigate potential interference from numerous unrelated elements, we introduce a masking-based element pruning method that circumvents the intractable relation modeling through an efficient masking mechanism. To reduce the redundancy in historical information, we devise a consistency-guided history compression module, which enhances implicit LLM-based compression through innovative explicit guidance, achieving an optimal balance between performance and efficiency. With the above components, SimpAgent reduces FLOPs by 27% and achieves superior GUI navigation performance. Comprehensive navigation experiments across diverse web and mobile environments demonstrate the effectiveness and potential of our agent.