Agents
Philosophy-informed Machine Learning
A deep dive into the open literature shows that there are t hree fundamental limitations to current ML approaches, namely blackbox brittleness (which renders models uninterpretable and unreliable under distribution shift [2]), causal blindness (which conflates correlation with causation [3]), and alignment failures (which produce systems optimizing objectives misaligned with human values [4]) . These deficiencies stem from a profound philosophical poverty in how ML conceptualizes knowledge, reasoning, and values. The first fundamental limitation, b lackbox brittleness, manifests when trained models fail on seemingly trivial variations of their training distribution. For example, a vision model that accurately identifies stop signs under normal conditions might misclassify them entirely when small adversarial perturbations are applied [5] . Not surprisingly, t h e same brittleness extends beyond adversarial examples to everyday distribution shifts (e.g., natural language processing models exhibit performance degradation when processing text from different cultural contexts, etc.) [6] .
LIMI: Less is More for Agency
Xiao, Yang, Jiang, Mohan, Sun, Jie, Li, Keyu, Lin, Jifan, Zhuang, Yumin, Zeng, Ji, Xia, Shijie, Hua, Qishuo, Li, Xuefeng, Cai, Xiaojie, Wang, Tongyu, Zhang, Yue, Liu, Liming, Wu, Xia, Hou, Jinlong, Cheng, Yuan, Li, Wenjie, Wang, Xiang, Wang, Dequan, Liu, Pengfei
We define Agency as the emergent capacity of AI systems to function as autonomous agents actively discovering problems, formulating hypotheses, and executing solutions through self-directed engagement with environments and tools. This fundamental capability marks the dawn of the Age of AI Agency, driven by a critical industry shift: the urgent need for AI systems that don't just think, but work. While current AI excels at reasoning and generating responses, industries demand autonomous agents that can execute tasks, operate tools, and drive real-world outcomes. As agentic intelligence becomes the defining characteristic separating cognitive systems from productive workers, efficiently cultivating machine autonomy becomes paramount. Current approaches assume that more data yields better agency, following traditional scaling laws from language modeling. We fundamentally challenge this paradigm. LIMI (Less Is More for Intelligent Agency) demonstrates that agency follows radically different development principles. Through strategic focus on collaborative software development and scientific research workflows, we show that sophisticated agentic intelligence can emerge from minimal but strategically curated demonstrations of autonomous behavior. Using only 78 carefully designed training samples, LIMI achieves 73.5% on comprehensive agency benchmarks, dramatically outperforming state-of-the-art models: Kimi-K2-Instruct (24.1%), DeepSeek-V3.1 (11.9%), Qwen3-235B-A22B-Instruct (27.5%), and GLM-4.5 (45.1%). Most strikingly, LIMI demonstrates 53.7% improvement over models trained on 10,000 samples-achieving superior agentic intelligence with 128 times fewer samples. Our findings establish the Agency Efficiency Principle: machine autonomy emerges not from data abundance but from strategic curation of high-quality agentic demonstrations.
Joint Channel Estimation and Computation Offloading in Fluid Antenna-assisted MEC Networks
Ju, Ying, Li, Mingdong, Wang, Haoyu, Liu, Lei, Qu, Youyang, Dong, Mianxiong, Leung, Victor C. M., Yuen, Chau
With the emergence of fluid antenna (FA) in wireless communications, the capability to dynamically adjust port positions offers substantial benefits in spatial diversity and spectrum efficiency, which are particularly valuable for mobile edge computing (MEC) systems. Therefore, we propose an FA-assisted MEC offloading framework to minimize system delay. This framework faces two severe challenges, which are the complexity of channel estimation due to dynamic port configuration and the inherent non-convexity of the joint optimization problem. Firstly, we propose Information Bottleneck Metric-enhanced Channel Compressed Sensing (IBM-CCS), which advances FA channel estimation by integrating information relevance into the sensing process and capturing key features of FA channels effectively. Secondly, to address the non-convex and high-dimensional optimization problem in FA-assisted MEC systems, which includes FA port selection, beamforming, power control, and resource allocation, we propose a game theory-assisted Hierarchical Twin-Dueling Multi-agent Algorithm (HiTDMA) based offloading scheme, where the hierarchical structure effectively decouples and coordinates the optimization tasks between the user side and the base station side. Crucially, the game theory effectively reduces the dimensionality of power control variables, allowing deep reinforcement learning (DRL) agents to achieve improved optimization efficiency. Numerical results confirm that the proposed scheme significantly reduces system delay and enhances offloading performance, outperforming benchmarks. Additionally, the IBM-CCS channel estimation demonstrates superior accuracy and robustness under varying port densities, contributing to efficient communication under imperfect CSI.
Learning from Observation: A Survey of Recent Advances
Burnwal, Returaj, Mehta, Hriday, Bhatt, Nirav Pravinbhai, Ravindran, Balaraman
Imitation Learning (IL) algorithms offer an efficient way to train an agent by mimicking an expert's behavior without requiring a reward function. IL algorithms often necessitate access to state and action information from expert demonstrations. Although expert actions can provide detailed guidance, requiring such action information may prove impractical for real-world applications where expert actions are difficult to obtain. To address this limitation, the concept of learning from observation (LfO) or state-only imitation learning (SOIL) has recently gained attention, wherein the imitator only has access to expert state visitation information. In this paper, we present a framework for LfO and use it to survey and classify existing LfO methods in terms of their trajectory construction, assumptions and algorithm's design choices. This survey also draws connections between several related fields like offline RL, model-based RL and hierarchical RL. Finally, we use our framework to identify open problems and suggest future research directions.
Adaptive Event-Triggered Policy Gradient for Multi-Agent Reinforcement Learning
Siddique, Umer, Sinha, Abhinav, Cao, Yongcan
Conventional multi-agent reinforcement learning (MARL) methods rely on time-triggered execution, where agents sample and communicate actions at fixed intervals. This approach is often computationally expensive and communication-intensive. To address this limitation, we propose ET-MAPG (Event-Triggered Multi-Agent Policy Gradient reinforcement learning), a framework that jointly learns an agent's control policy and its event-triggering policy. Unlike prior work that decouples these mechanisms, ET-MAPG integrates them into a unified learning process, enabling agents to learn not only what action to take but also when to execute it. For scenarios with inter-agent communication, we introduce AET-MAPG, an attention-based variant that leverages a self-attention mechanism to learn selective communication patterns. AET-MAPG empowers agents to determine not only when to trigger an action but also with whom to communicate and what information to exchange, thereby optimizing coordination. Both methods can be integrated with any policy gradient MARL algorithm. Extensive experiments across diverse MARL benchmarks demonstrate that our approaches achieve performance comparable to state-of-the-art, time-triggered baselines while significantly reducing both computational load and communication overhead.
Automated Multi-Agent Workflows for RTL Design
Bhattaram, Amulya, Ramamoorthy, Janani, Gupta, Ranit, Marculescu, Diana, Stamoulis, Dimitrios
The rise of agentic AI workflows unlocks novel opportunities for computer systems design and optimization. However, for specialized domains such as program synthesis, the relative scarcity of HDL and proprietary EDA resources online compared to more common programming tasks introduces challenges, often necessitating task-specific fine-tuning, high inference costs, and manually-crafted agent orchestration. In this work, we present VeriMaAS, a multi-agent framework designed to automatically compose agentic workflows for RTL code generation. Our key insight is to integrate formal verification feedback from HDL tools directly into workflow generation, reducing the cost of gradient-based updates or prolonged reasoning traces. Our method improves synthesis performance by 5-7% for pass@k over fine-tuned baselines, while requiring only a few hundred training examples, representing an order-of-magnitude reduction in supervision cost.
Choose Your Battles: Distributed Learning Over Multiple Tug of War Games
Chandak, Siddharth, Bistritz, Ilai, Bambos, Nicholas
Consider N players and K games taking place simultaneously. Each of these games is modeled as a Tug-of-War (ToW) game where increasing the action of one player decreases the reward for all other players. Each player participates in only one game at any given time. At each time step, a player decides the game in which they wish to participate in and the action they take in that game. Their reward depends on the actions of all players that are in the same game. This system of K games is termed `Meta Tug-of-War' (Meta-ToW) game. These games can model scenarios such as power control, distributed task allocation, and activation in sensor networks. We propose the Meta Tug-of-Peace algorithm, a distributed algorithm where the action updates are done using a simple stochastic approximation algorithm, and the decision to switch games is made using an infrequent 1-bit communication between the players. We prove that in Meta-ToW games, our algorithm converges to an equilibrium that satisfies a target Quality of Service reward vector for the players. We then demonstrate the efficacy of our algorithm through simulations for the scenarios mentioned above.
Steerable Adversarial Scenario Generation through Test-Time Preference Alignment
Nie, Tong, Mei, Yuewen, Tang, Yihong, He, Junlin, Sun, Jie, Shi, Haotian, Ma, Wei, Sun, Jian
Adversarial scenario generation is a cost-effective approach for safety assessment of autonomous driving systems. However, existing methods are often constrained to a single, fixed trade-off between competing objectives such as adversariality and realism. This yields behavior-specific models that cannot be steered at inference time, lacking the efficiency and flexibility to generate tailored scenarios for diverse training and testing requirements. In view of this, we reframe the task of adversarial scenario generation as a multi-objective preference alignment problem and introduce a new framework named \textbf{S}teerable \textbf{A}dversarial scenario \textbf{GE}nerator (SAGE). SAGE enables fine-grained test-time control over the trade-off between adversariality and realism without any retraining. We first propose hierarchical group-based preference optimization, a data-efficient offline alignment method that learns to balance competing objectives by decoupling hard feasibility constraints from soft preferences. Instead of training a fixed model, SAGE fine-tunes two experts on opposing preferences and constructs a continuous spectrum of policies at inference time by linearly interpolating their weights. We provide theoretical justification for this framework through the lens of linear mode connectivity. Extensive experiments demonstrate that SAGE not only generates scenarios with a superior balance of adversariality and realism but also enables more effective closed-loop training of driving policies. Project page: https://tongnie.github.io/SAGE/.