Agents
CORE: Reducing UI Exposure in Mobile Agents via Collaboration Between Cloud and Local LLMs
Fan, Gucongcong, Niu, Chaoyue, Lyu, Chengfei, Wu, Fan, Chen, Guihai
Mobile agents rely on Large Language Models (LLMs) to plan and execute tasks on smartphone user interfaces (UIs). While cloud-based LLMs achieve high task accuracy, they require uploading the full UI state at every step, exposing unnecessary and often irrelevant information. In contrast, local LLMs avoid UI uploads but suffer from limited capacity, resulting in lower task success rates. We propose $\textbf{CORE}$, a $\textbf{CO}$llaborative framework that combines the strengths of cloud and local LLMs to $\textbf{R}$educe UI $\textbf{E}$xposure, while maintaining task accuracy for mobile agents. CORE comprises three key components: (1) $\textbf{Layout-aware block partitioning}$, which groups semantically related UI elements based on the XML screen hierarchy; (2) $\textbf{Co-planning}$, where local and cloud LLMs collaboratively identify the current sub-task; and (3) $\textbf{Co-decision-making}$, where the local LLM ranks relevant UI blocks, and the cloud LLM selects specific UI elements within the top-ranked block. CORE further introduces a multi-round accumulation mechanism to mitigate local misjudgment or limited context. Experiments across diverse mobile apps and tasks show that CORE reduces UI exposure by up to 55.6% while maintaining task success rates slightly below cloud-only agents, effectively mitigating unnecessary privacy exposure to the cloud. The code is available at https://github.com/Entropy-Fighter/CORE.
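The block partitioning, co-decision-making, and multi-round accumulation steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the screen XML, the function names (`partition_blocks`, `local_rank`, `cloud_select`, `co_decide`), and the keyword-overlap scoring that stands in for both LLMs are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical screen dump; real Android hierarchies are far larger.
SCREEN_XML = """
<hierarchy>
  <node class="LinearLayout" id="search_bar">
    <node class="EditText" text="Search contacts"/>
    <node class="Button" text="Search"/>
  </node>
  <node class="LinearLayout" id="contact_list">
    <node class="TextView" text="Alice 555-0101"/>
    <node class="TextView" text="Bob 555-0102"/>
  </node>
  <node class="LinearLayout" id="settings">
    <node class="Button" text="Open settings"/>
  </node>
</hierarchy>
"""

def partition_blocks(xml_text):
    """Group the text-bearing elements under each top-level container into one block."""
    root = ET.fromstring(xml_text)
    blocks = []
    for container in root:
        texts = [n.get("text") for n in container.iter("node") if n.get("text")]
        blocks.append({"id": container.get("id"), "elements": texts})
    return blocks

def local_rank(blocks, sub_task):
    """Stand-in for the local LLM: rank blocks by word overlap with the sub-task."""
    words = set(sub_task.lower().split())

    def score(block):
        block_words = set(" ".join(block["elements"]).lower().split())
        return len(words & block_words)

    return sorted(blocks, key=score, reverse=True)

def cloud_select(exposed_blocks, target):
    """Stand-in for the cloud LLM: it only ever sees the exposed blocks."""
    for block in exposed_blocks:
        for element in block["elements"]:
            if target.lower() in element.lower():
                return element
    return None

def co_decide(blocks, sub_task, target, max_rounds=3):
    """Expose one more ranked block per round until the cloud side finds an element."""
    ranked = local_rank(blocks, sub_task)
    exposed = []
    for round_idx in range(min(max_rounds, len(ranked))):
        exposed.append(ranked[round_idx])
        choice = cloud_select(exposed, target)
        if choice is not None:
            break
    else:
        choice = None
    return choice, [b["id"] for b in exposed]

blocks = partition_blocks(SCREEN_XML)
print(co_decide(blocks, "open settings", "settings"))
print(co_decide(blocks, "search for bob", "Bob"))
```

In the second call the toy ranker puts the search bar first, the cloud stub finds nothing there, and a second round adds the contact list; the settings block is never uploaded. That is the intended effect of accumulation: a local misjudgment costs an extra round rather than a failed step.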
Adaptive Minds: Empowering Agents with LoRA-as-Tools
Shekar, Pavan C, Krishnan, Ashwanth
We present Adaptive Minds, an agentic system that treats LoRA adapters as domain-specific tools. Instead of relying on a single fine-tuned model or rigid rule-based routing, our approach empowers the base LLM itself to act as a semantic router analyzing each query and dynamically selecting the most relevant LoRA tool. This enables the agent to seamlessly switch between different domain experts on demand. By combining the flexibility of multi-agent orchestration with the efficiency of parameter-efficient fine-tuning, Adaptive Minds delivers accurate, specialized responses while preserving conversational ability. The system is built with LangGraph for workflow management, supports both API and web interfaces, and is fully open source, providing a scalable and extensible foundation for domain-adaptive AI assistance.
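The routing idea above, where the base LLM itself picks a LoRA tool, might look roughly like the sketch below. It deliberately avoids any real LangGraph or adapter-loading API: `ADAPTERS`, `build_router_prompt`, `route`, and `fake_base_llm` are hypothetical names, and the keyword-matching stub only imitates what a real model call would return.

```python
from typing import Callable, Dict

# Hypothetical adapter registry: name -> one-line description shown to the router.
ADAPTERS: Dict[str, str] = {
    "medical": "clinical questions, symptoms, dosages",
    "math": "algebra, calculus, proofs",
    "legal": "contracts, statutes, compliance",
}

def build_router_prompt(query: str, adapters: Dict[str, str]) -> str:
    """Describe every available tool and ask the base model for a single tool name."""
    tool_lines = "\n".join(f"- {name}: {desc}" for name, desc in adapters.items())
    return (
        "Pick the best tool for the query.\n"
        f"{tool_lines}\n"
        f"Query: {query}\n"
        "Answer with the tool name only."
    )

def route(query: str, adapters: Dict[str, str],
          base_llm: Callable[[str], str], default: str = "general") -> str:
    """Ask the base model which adapter to use; fall back if the answer is not a known tool."""
    answer = base_llm(build_router_prompt(query, adapters)).strip().lower()
    return answer if answer in adapters else default

def fake_base_llm(prompt: str) -> str:
    """Toy stand-in for the base LLM (assumption: a real system would call the model here)."""
    query = prompt.split("Query: ")[1].split("\n")[0].lower()
    if "dosage" in query or "symptom" in query:
        return "medical"
    if "derivative" in query:
        return " Math "  # sloppy formatting on purpose; route() normalizes it
    return "unsure"

print(route("What is the derivative of x^2?", ADAPTERS, fake_base_llm))
print(route("Tell me a joke", ADAPTERS, fake_base_llm))
```

Validating the model's answer against the registry and falling back to the unadapted base model (`"general"` here) is one simple way to keep conversational ability when no domain expert applies.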
Corrigibility Transformation: Constructing Goals That Accept Updates
For an AI's training process to successfully impart a desired goal, it is important that the AI does not attempt to resist the training. However, partially learned goals will often incentivize an AI to avoid further goal updates, as most goals are better achieved by an AI continuing to pursue them. We say that a goal is corrigible if it does not incentivize taking actions that avoid proper goal updates or shutdown. In addition to convergence in training, corrigibility also allows for correcting mistakes and changes in human preferences, which makes it a crucial safety property. Despite this, the existing literature does not include specifications for goals that are both corrigible and competitive with non-corrigible alternatives. We provide a formal definition for corrigibility, then introduce a transformation that constructs a corrigible version of any goal that can be made corrigible, without sacrificing performance. This is done by myopically eliciting predictions of reward conditional on costlessly preventing updates, which then also determine the reward when updates are accepted. The transformation can be modified to recursively extend corrigibility to any new agents created by corrigible agents, and to prevent agents from deliberately modifying their goals. Two gridworld experiments demonstrate that these corrigible goals can be learned effectively, and that they lead to the desired behavior.
HOB: A Holistically Optimized Bidding Strategy under Heterogeneous Auction Mechanisms with Organic Traffic
Li, Qi, Huang, Wendong, Ye, Qichen, Xu, Wutong, Wang, Cheems, Bai, Rongquan, Yuan, Wei, Wang, Guan, Yu, Chuan, Xu, Jian
E-commerce advertising platforms typically sell commercial traffic through either second-price auctions (SPA) or first-price auctions (FPA). SPA was historically prevalent because it is dominant-strategy incentive-compatible (DSIC) for bidders with quasi-linear utilities, especially when budgets are not a binding constraint, while FPA has gained prominence for offering higher revenue potential to publishers and avoiding the possibility of discriminatory treatment through personalized reserve prices. Meanwhile, on the demand side, advertisers are increasingly adopting platform-wide marketing solutions akin to QuanZhanTui, shifting from spending budgets solely on commercial traffic to bidding on the entire traffic in order to maximize overall sales. For automated bidding systems, this trend poses a critical challenge: determining optimal strategies across heterogeneous auction channels to fulfill diverse advertiser objectives, such as maximizing return (MaxReturn) or meeting a target return on ad spend (TargetROAS). To overcome this challenge, this work makes two key contributions. First, we derive an efficient solution for optimal bidding under FPA channels that accounts for the presence of organic traffic, i.e., traffic that can be won for free. Second, we introduce a marginal cost alignment (MCA) strategy that provably secures bidding efficiency across heterogeneous auction mechanisms. To validate the performance of our framework, we conduct comprehensive offline experiments on public datasets and large-scale online A/B testing, which demonstrate consistent improvements over existing methods.
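To make the channel heterogeneity concrete, the sketch below contrasts what a winning bidder pays under the standard second-price and first-price rules, with organic traffic treated as costing nothing. It illustrates only these textbook payment rules, not the paper's bidding solution or the MCA strategy; the function name `auction_cost` and the convention of returning `None` for a lost auction are assumptions for the example.

```python
from typing import Optional, Sequence

def auction_cost(my_bid: float, competing_bids: Sequence[float],
                 mechanism: str) -> Optional[float]:
    """Return what the bidder pays for one impression, or None if the auction is lost."""
    if mechanism == "organic":
        return 0.0  # organic traffic is won without paying
    highest_other = max(competing_bids, default=0.0)
    if my_bid <= highest_other:
        return None  # lost (ties are treated as losses in this sketch)
    if mechanism == "FPA":
        return my_bid  # first-price: the winner pays its own bid
    if mechanism == "SPA":
        return highest_other  # second-price: the winner pays the highest competing bid
    raise ValueError(f"unknown mechanism: {mechanism}")

for mechanism in ("SPA", "FPA", "organic"):
    print(mechanism, auction_cost(5.0, [3.0, 4.0], mechanism))
```

The same bid costs a different amount in each channel, which is why a single bidding rule tuned for one mechanism cannot simply be reused across all of them.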
RM-RL: Role-Model Reinforcement Learning for Precise Robot Manipulation
Chen, Xiangyu, Zhou, Chuhao, Liu, Yuxi, Yang, Jianfei
Precise robot manipulation is critical for fine-grained applications such as chemical and biological experiments, where even small errors (e.g., reagent spillage) can invalidate an entire task. Existing approaches often rely on pre-collected expert demonstrations and train policies via imitation learning (IL) or offline reinforcement learning (RL). However, obtaining high-quality demonstrations for precision tasks is difficult and time-consuming, while offline RL commonly suffers from distribution shifts and low data efficiency. We introduce a Role-Model Reinforcement Learning (RM-RL) framework that unifies online and offline training in real-world environments. The key idea is a role-model strategy that automatically generates labels for online training data using approximately optimal actions, eliminating the need for human demonstrations. RM-RL reformulates policy learning as supervised training, reducing instability from distribution mismatch and improving efficiency. A hybrid training scheme further leverages online role-model data for offline reuse, enhancing data efficiency through repeated sampling. Extensive experiments show that RM-RL converges faster and more stably than existing RL methods, yielding significant gains in real-world manipulation: 53% improvement in translation accuracy and 20% in rotation accuracy. Finally, we demonstrate the successful execution of a challenging task, precisely placing a cell plate onto a shelf, highlighting the framework's effectiveness where prior methods fail.
MAGPIE: A benchmark for Multi-AGent contextual PrIvacy Evaluation
Juneja, Gurusha, Pasupulati, Jayanth Naga Sai, Albalak, Alon, Hua, Wenyue, Wang, William Yang
A core challenge for autonomous LLM agents in collaborative settings is balancing robust privacy understanding and preservation alongside task efficacy. Existing privacy benchmarks only focus on simplistic, single-turn interactions where private information can be trivially omitted without affecting task outcomes. In this paper, we introduce MAGPIE (Multi-AGent contextual PrIvacy Evaluation), a novel benchmark of 200 high-stakes tasks designed to evaluate privacy understanding and preservation in multi-agent collaborative, non-adversarial scenarios. MAGPIE integrates private information as essential for task resolution, forcing agents to balance effective collaboration with strategic information control. Our evaluation reveals that state-of-the-art agents, including GPT-5 and Gemini 2.5-Pro, exhibit significant privacy leakage, with Gemini 2.5-Pro leaking up to 50.7% and GPT-5 up to 35.1% of the sensitive information even when explicitly instructed not to. Moreover, these agents struggle to achieve consensus or task completion and often resort to undesirable behaviors such as manipulation and power-seeking (e.g., Gemini 2.5-Pro demonstrating manipulation in 38.2% of the cases). These findings underscore that current LLM agents lack robust privacy understanding and are not yet adequately aligned to simultaneously preserve privacy and maintain effective collaboration in complex environments.
CausalVerse: Benchmarking Causal Representation Learning with Configurable High-Fidelity Simulations
Chen, Guangyi, Deng, Yunlong, Zhu, Peiyuan, Li, Yan, Shen, Yifan, Li, Zijian, Zhang, Kun
Causal Representation Learning (CRL) aims to uncover the data-generating process and identify the underlying causal variables and relations, but its evaluation remains inherently challenging because it requires known ground-truth causal variables and causal structure. Existing evaluations often rely on either simplistic synthetic datasets or downstream performance on real-world tasks, and therefore face a trade-off between realism and evaluative precision. In this paper, we introduce a new benchmark for CRL using high-fidelity simulated visual data that retains both realistic visual complexity and, more importantly, access to ground-truth causal generating processes. The dataset comprises around 200 thousand images and 3 million video frames across 24 sub-scenes in four domains: static image generation, dynamic physical simulations, robotic manipulations, and traffic situation analysis. These scenarios range from static to dynamic settings, simple to complex structures, and single to multi-agent interactions, offering a comprehensive testbed that aims to bridge the gap between rigorous evaluation and real-world applicability. In addition, we provide flexible access to the underlying causal structures, allowing users to modify or configure them to align with the required assumptions in CRL, such as available domain labels, temporal dependencies, or intervention histories. Leveraging this benchmark, we evaluate representative CRL methods across diverse paradigms and offer empirical insights to assist practitioners and newcomers in choosing or extending appropriate CRL frameworks for the specific types of real problems that can benefit from the CRL perspective. Project page: https://causal-verse.github.io/. Dataset: https://huggingface.co/CausalVerse.
When AI Gets Persuaded, Humans Follow: Inducing the Conformity Effect in Persuasive Dialogue
Sasaki, Rikuo, Inaba, Michimasa
Recent advancements in AI have highlighted its application in captology, the field of using computers as persuasive technologies. We hypothesized that the "conformity effect," where individuals align with others' actions, also occurs with AI agents. This study verifies this hypothesis by introducing a "Persuadee Agent" that is persuaded alongside a human participant in a three-party persuasive dialogue with a Persuader Agent. We conducted a text-based dialogue experiment with human participants. We compared four conditions manipulating the Persuadee Agent's behavior (persuasion acceptance vs. non-acceptance) and the presence of an icebreaker session. Results showed that when the Persuadee Agent accepted persuasion, both perceived persuasiveness and actual attitude change significantly improved. Attitude change was greatest when an icebreaker was also used, whereas an unpersuaded AI agent suppressed attitude change. Additionally, it was confirmed that the persuasion acceptance of participants increased at the moment the Persuadee Agent was persuaded. These results suggest that appropriately designing a Persuadee Agent can improve persuasion through the conformity effect.
AppCopilot: Toward General, Accurate, Long-Horizon, and Efficient Mobile Agent
Fan, Jingru, Dang, Yufan, Wu, Jingyao, Li, Huatao, Yang, Runde, Yang, Xiyuan, Wang, Yuheng, Qian, Chen
With the rapid evolution of large language models and multimodal models, the mobile-agent landscape has proliferated without converging on the fundamental challenges. This paper identifies four core problems that should be solved for mobile agents to deliver practical, scalable impact: (1) generalization across tasks, APPs, and devices; (2) accuracy, specifically precise on-screen interaction and click targeting; (3) long-horizon capability for sustained, multi-step goals; and (4) efficiency, specifically high-performance runtime on resource-constrained devices. We present AppCopilot, a multimodal, multi-agent, general-purpose mobile agent that operates across applications. AppCopilot operationalizes this position through an end-to-end pipeline spanning data collection, training, finetuning, efficient inference, and PC/mobile application. At the model layer, it integrates multimodal foundation models with robust Chinese-English support. At the reasoning and control layer, it combines chain-of-thought reasoning, hierarchical task planning and decomposition, and multi-agent collaboration. At the execution layer, it enables experiential adaptation, voice interaction, function calling, cross-APP and cross-device orchestration, and comprehensive mobile APP support. The system design incorporates profiling-driven optimization for latency and memory across heterogeneous hardware. Empirically, AppCopilot achieves significant improvements on four dimensions: stronger generalization, higher precision of on-screen actions, more reliable long-horizon task completion, and faster, more resource-efficient runtime. By articulating a cohesive position and a reference architecture that closes the loop from data collection and training to finetuning and efficient inference, this paper offers a concrete roadmap for general-purpose mobile agents and provides actionable guidance.
Foresighted Online Policy Optimization with Interference
Xiang, Liner, Wang, Jiayi, Cai, Hengrui
Contextual bandits, which leverage the baseline features of sequentially arriving individuals to optimize cumulative rewards while balancing exploration and exploitation, are critical for online decision-making. Existing approaches typically assume no interference, where each individual's action affects only their own reward. Yet, such an assumption can be violated in many practical scenarios, and the oversight of interference can lead to short-sighted policies that focus solely on maximizing the immediate outcomes for individuals, which further results in suboptimal decisions and potentially increased regret over time. To address this significant gap, we introduce the foresighted online policy with interference (FRONT) that innovatively considers the long-term impact of the current decision on subsequent decisions and rewards. The proposed FRONT method employs a sequence of exploratory and exploitative strategies to manage the intricacies of interference, ensuring robust parameter inference and regret minimization. Theoretically, we establish a tail bound for the online estimator and derive the asymptotic distribution of the parameters of interest under suitable conditions on the interference network. We further show that FRONT attains sublinear regret under two distinct definitions, capturing both the immediate and consequential impacts of decisions, and we establish these results with and without statistical inference. The effectiveness of FRONT is further demonstrated through extensive simulations and a real-world application to urban hotel profits.
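A toy calculation can show why ignoring interference yields short-sighted policies. The sketch below is not the FRONT algorithm: it has no contexts, no exploration, and no parameter estimation. It assumes a known, hypothetical reward model in which each individual's reward depends on their own action and on the action given to the previous individual (`OWN_EFFECT`, `SPILLOVER`, and both policy functions are illustrative assumptions).

```python
OWN_EFFECT = -1  # immediate reward change from treating the current individual
SPILLOVER = 3    # extra reward the next individual receives if the previous one was treated

def cumulative_reward(actions):
    """Sum rewards when each reward depends on the current and the previous action."""
    total = 0
    previous = 0
    for action in actions:
        total += OWN_EFFECT * action + SPILLOVER * previous
        previous = action
    return total

def myopic_policy(horizon):
    """Treat only if the immediate own effect is positive, ignoring any spillover."""
    return [1 if OWN_EFFECT > 0 else 0 for _ in range(horizon)]

def foresighted_policy(horizon):
    """Treat if own effect plus spillover onto the next arrival is positive."""
    actions = []
    for t in range(horizon):
        future_gain = SPILLOVER if t < horizon - 1 else 0  # the last arrival has no successor
        actions.append(1 if OWN_EFFECT + future_gain > 0 else 0)
    return actions

horizon = 5
print("myopic:", myopic_policy(horizon), cumulative_reward(myopic_policy(horizon)))
print("foresighted:", foresighted_policy(horizon), cumulative_reward(foresighted_policy(horizon)))
```

Under these assumed numbers the myopic rule never treats anyone because each treatment looks like an immediate loss, while the foresighted rule accepts that loss to collect the larger downstream gain. In the online setting the effects are unknown and must be estimated while acting, which is where exploration and the regret analysis come in.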