Agents
Hide-and-Shill: A Reinforcement Learning Framework for Market Manipulation Detection in Symphony-a Decentralized Multi-Agent System
Shi, Ronghua, Liu, Yiou, Ying, Xinyu, Tan, Yang, Feng, Yuchun, Ai, Lynn, Shi, Bill, Wang, Xuhui, Liu, Zhuang
Decentralized finance (DeFi) has ushered in a new era of permissionless financial innovation--but also opened the door to discourse-driven market manipulation at unprecedented scale. Without centralized gatekeepers or regulatory oversight, malicious actors now coordinate shilling campaigns and pump-and-dump schemes across social platforms and on-chain ecosystems. We propose Hide-and-Shill, a novel Multi-Agent Reinforcement Learning (MARL) framework for decentralized manipulation detection. By modeling the interaction between manipulators and detectors as a dynamic adversarial game, the framework learns to identify suspicious discourse patterns using delayed token price reactions as ground-truth financial signals. Our method introduces three key innovations: (1) Group Relative Policy Optimization (GRPO) to improve learning stability in sparse-reward and partially observable settings; (2) a theory-grounded reward function inspired by rational expectations and information asymmetry, distinguishing price discovery from manipulation-induced noise; and (3) a multi-modal agent pipeline that fuses LLM-based semantic features, social graph signals, and on-chain market data for informed decision-making. T o support scalable and trustless deployment, our framework is integrated within the Symphony system--a decentralized multi-agent coordination architecture that enables peer-to-peer agent execution, trust-aware learning through distributed logs, and chain-verifiable evaluation. Symphony facilitates adversarial co-evolution among strategic actors and maintains robust manipulation detection without reliance on centralized oracles, empowering real-time surveillance across global DeFi discourse ecosystems. Trained on 100,000 real-world discourse episodes and validated in adversarial co-evolution simulations, Hide-and-Shill achieves state-of-the-art performance in both detection accuracy and causal attribution.
ParaEQsA: Parallel and Asynchronous Embodied Questions Scheduling and Answering
This paper formulates the Embodied Questions Answering (EQsA) problem, introduces a corresponding benchmark, and proposes a system to tackle the problem. Classical Embodied Question Answering (EQA) is typically formulated as answering one single question by actively exploring a 3D environment. Real deployments, however, often demand handling multiple questions that may arrive asynchronously and carry different urgencies. We formalize this setting as Embodied Questions Answering (EQsA) and present ParaEQsA, a framework for parallel, urgency-aware scheduling and answering. ParaEQsA leverages a group memory module shared among questions to reduce redundant exploration, and a priority-planning module to dynamically schedule questions. To evaluate this setting, we contribute the Parallel Asynchronous Embodied Questions (PAEQs) benchmark containing 40 indoor scenes and five questions per scene (200 in total), featuring asynchronous follow-up questions and urgency labels. We further propose metrics for EQsA performance: Direct Answer Rate (DAR), and Normalized Urgency-Weighted Latency (NUWL), which jointly measure efficiency and responsiveness of this system. ParaEQsA consistently outperforms strong sequential baselines adapted from recent EQA systems, while reducing exploration and delay. Empirical evaluations investigate the relative contributions of priority, urgency modeling, spatial scope, reward estimation, and dependency reasoning within our framework. Together, these results demonstrate that urgency-aware, parallel scheduling is key to making embodied agents responsive and efficient under realistic, multi-question workloads.
AMLNet: A Knowledge-Based Multi-Agent Framework to Generate and Detect Realistic Money Laundering Transactions
Huda, Sabin, Foo, Ernest, Jadidi, Zahra, Newton, MA Hakim, Sattar, Abdul
Anti-money laundering (AML) research is constrained by the lack of publicly shareable, regulation-aligned transaction datasets. We present AMLNet, a knowledge-based multi-agent framework with two coordinated units: a regulation-aware transaction generator and an ensemble detection pipeline. The generator produces 1,090,173 synthetic transactions (approximately 0.16\% laundering-positive) spanning core laundering phases (placement, layering, integration) and advanced typologies (e.g., structuring, adaptive threshold behavior). Regulatory alignment reaches 75\% based on AUSTRAC rule coverage (Section 4.2), while a composite technical fidelity score of 0.75 summarizes temporal, structural, and behavioral realism components (Section 4.4). The detection ensemble achieves F1 0.90 (precision 0.84, recall 0.97) on the internal test partitions of AMLNet and adapts to the external SynthAML dataset, indicating architectural generalizability across different synthetic generation paradigms. We provide multi-dimensional evaluation (regulatory, temporal, network, behavioral) and release the dataset (Version 1.0, https://doi.org/10.5281/zenodo.16736515), to advance reproducible and regulation-conscious AML experimentation.
MedicalOS: An LLM Agent based Operating System for Digital Healthcare
Decades' advances in digital health technologies, such as electronic health records, have largely streamlined routine clinical processes. Yet, most these systems are still hard to learn and use: Clinicians often face the burden of managing multiple tools, repeating manual actions for each patient, navigating complicated UI trees to locate functions, and spending significant time on administration instead of caring for patients. The recent rise of large language model (LLM) based agents demonstrates exceptional capability in coding and computer operation, revealing the potential for humans to interact with operating systems and software not by direct manipulation, but by instructing agents through natural language. This shift highlights the need for an abstraction layer, an agent-computer interface, that translates human language into machine-executable commands. In digital healthcare, however, requires a more domain-specific abstractions that strictly follow trusted clinical guidelines and procedural standards to ensure safety, transparency, and compliance. To address this need, we present \textbf{MedicalOS}, a unified agent-based operational system designed as such a domain-specific abstract layer for healthcare. It translates human instructions into pre-defined digital healthcare commands, such as patient inquiry, history retrieval, exam management, report generation, referrals, treatment planning, that we wrapped as off-the-shelf tools using machine languages (e.g., Python, APIs, MCP, Linux). We empirically validate MedicalOS on 214 patient cases across 22 specialties, demonstrating high diagnostic accuracy and confidence, clinically sound examination requests, and consistent generation of structured reports and medication recommendations. These results highlight MedicalOS as a trustworthy and scalable foundation for advancing workflow automation in clinical practice.
Securing AI Agents: Implementing Role-Based Access Control for Industrial Applications
The emergence of Large Language Models (LLMs) has significantly advanced solutions across various domains, from political science to software development. However, these models are constrained by their training data, which is static and limited to information available up to a specific date. Additionally, their generalized nature often necessitates fine-tuning -- whether for classification or instructional purposes -- to effectively perform specific downstream tasks. AI agents, leveraging LLMs as their core, mitigate some of these limitations by accessing external tools and real-time data, enabling applications such as live weather reporting and data analysis. In industrial settings, AI agents are transforming operations by enhancing decision-making, predictive maintenance, and process optimization. For example, in manufacturing, AI agents enable near-autonomous systems that boost productivity and support real-time decision-making. Despite these advancements, AI agents remain vulnerable to security threats, including prompt injection attacks, which pose significant risks to their integrity and reliability. To address these challenges, this paper proposes a framework for integrating Role-Based Access Control (RBAC) into AI agents, providing a robust security guardrail. This framework aims to support the effective and scalable deployment of AI agents, with a focus on on-premises implementations.
Detecting Model Drifts in Non-Stationary Environment Using Edit Operation Measures
Lee, Chang-Hwan, Shim, Alexander
Reinforcement learning (RL) agents typically assume stationary environment dynamics. Yet in real-world applications such as healthcare, robotics, and finance, transition probabilities or reward functions may evolve, leading to model drift. This paper proposes a novel framework to detect such drifts by analyzing the distributional changes in sequences of agent behavior. Specifically, we introduce a suite of edit operation-based measures to quantify deviations between state-action trajectories generated under stationary and perturbed conditions. Our experiments demonstrate that these measures can effectively distinguish drifted from non-drifted scenarios, even under varying levels of noise, providing a practical tool for drift detection in non-stationary RL environments.
Online Omniprediction with Long-Term Constraints
Bechavod, Yahav, Lu, Jiuyao, Roth, Aaron
We introduce and study the problem of online omniprediction with long-term constraints. At each round, a forecaster is tasked with generating predictions for an underlying (adaptively, adversarially chosen) state that are broadcast to a collection of downstream agents, who must each choose an action. Each of the downstream agents has both a utility function mapping actions and state to utilities, and a vector-valued constraint function mapping actions and states to vector-valued costs. The utility and constraint functions can arbitrarily differ across downstream agents. Their goal is to choose actions that guarantee themselves no regret while simultaneously guaranteeing that they do not cumulatively violate the constraints across time. We show how to make a single set of predictions so that each of the downstream agents can guarantee this by acting as a simple function of the predictions, guaranteeing each of them $\tilde{O}(\sqrt{T})$ regret and $O(1)$ cumulative constraint violation. We also show how to extend our guarantees to arbitrary intersecting contextually defined \emph{subsequences}, guaranteeing each agent both regret and constraint violation bounds not just marginally, but simultaneously on each subsequence, against a benchmark set of actions simultaneously tailored to each subsequence.
On the Escaping Efficiency of Distributed Adversarial Training Algorithms
Cao, Ying, Yuan, Kun, Sayed, Ali H.
Adversarial training has been widely studied in recent years due to its role in improving model robustness against adversarial attacks. This paper focuses on comparing different distributed adversarial training algorithms--including centralized and decentralized strategies--within multi-agent learning environments. Previous studies have highlighted the importance of model flatness in determining robustness. To this end, we develop a general theoretical framework to study the escaping efficiency of these algorithms from local minima, which is closely related to the flatness of the resulting models. We show that when the perturbation bound is sufficiently small (i.e., when the attack strength is relatively mild) and a large batch size is used, decentralized adversarial training algorithms--including consensus and diffusion--are guaranteed to escape faster from local minima than the centralized strategy, thereby favoring flatter minima. However, as the perturbation bound increases, this trend may no longer hold. In the simulation results, we illustrate our theoretical findings and systematically compare the performance of models obtained through decentralized and centralized adversarial training algorithms. The results highlight the potential of decentralized strategies to enhance the robustness of models in distributed settings.
VideoAgent: Personalized Synthesis of Scientific Videos
Liang, Xiao, Li, Bangxin, Chen, Zixuan, Zheng, Hanyue, Ma, Zhi, Wang, Di, Tian, Cong, Wang, Quan
ABSTRACT Automating the generation of scientific videos is a crucial yet challenging task for effective knowledge dissemination. However, existing works on document automation primarily focus on static media such as posters and slides, lacking mechanisms for personalized dynamic orchestration and multimodal content synchronization. To address these challenges, we introduce VideoAgent, a novel multi-agent framework that synthesizes personalized scientific videos through a conversational interface. VideoAgent parses a source paper into a fine-grained asset library and, guided by user requirements, orchestrates a narrative flow that synthesizes both static slides and dynamic animations to explain complex concepts. To enable rigorous evaluation, we also propose SciVidEval, the first comprehensive suite for this task, which combines automated metrics for mul-timodal content quality and synchronization with a Video-Quiz-based human evaluation to measure knowledge transfer. Extensive experiments demonstrate that our method significantly outperforms existing commercial scientific video generation services and approaches human-level quality in scientific communication.
Tractable Asymmetric Verification for Large Language Models via Deterministic Replicability
Chong, Zan-Kai, Ohsaki, Hiroyuki, Ng, Bryan
This introduces a fundamental challenge in establishing computational trust, specifically how one agent can verify that another's output was genuinely produced by a claimed LLM, and not falsified or generated by a cheaper or inferior model. T o address this challenge, this paper proposes a verification framework that achieves tractable asymmetric effort, where the cost to verify a computation is substantially lower than the cost to perform it. Our approach is built upon the principle of deterministic replicability, a property inherent to autoregressive models that strictly necessitates a computationally homogeneous environment where all agents operate on identical hardware and software stacks. Within this defined context, our framework enables multiple validators to probabilistically audit small, random segments of an LLM's output and it distributes the verification workload effectively. The simulations demonstrated that targeted verification can be over 12 times faster than full regeneration, with tunable parameters to adjust the detection probability. By establishing a tractable mechanism for auditable LLM systems, our work offers a foundational layer for responsible AI and serves as a cornerstone for future research into the more complex, heterogeneous multi-agent systems.