Agents
A Scenario-Driven Cognitive Approach to Next-Generation AI Memory
Cai, Linyue, Cheng, Yuyang, Shao, Xiaoding, Wang, Huiming, Zhao, Yong, Zhang, Wei, Li, Kang
As artificial intelligence advances toward artificial general intelligence (AGI), the need for robust and human-like memory systems has become increasingly evident. Current memory architectures often suffer from limited adaptability, insufficient multimodal integration, and an inability to support continuous learning. To address these limitations, we propose a scenario-driven methodology that extracts essential functional requirements from representative cognitive scenarios, leading to a unified set of design principles for next-generation AI memory systems. Based on this approach, we introduce the COgnitive Layered Memory Architecture (COLMA), a novel framework that integrates cognitive scenarios, memory processes, and storage mechanisms into a cohesive design. COLMA provides a structured foundation for developing AI systems capable of lifelong learning and human-like reasoning, thereby contributing to the pragmatic development of AGI.
Agentic AI for Financial Crime Compliance
Axelsen, Henrik, Licht, Valdemar, Damsgaard, Jan
The cost and complexity of financial crime compliance (FCC) continue to rise, often without measurable improvements in effectiveness. While AI offers potential, most solutions remain opaque and poorly aligned with regulatory expectations. This paper presents the design and deployment of an agentic AI system for FCC in digitally native financial platforms. Developed through an Action Design Research (ADR) process with a fintech firm and regulatory stakeholders, the system automates onboarding, monitoring, investigation, and reporting, emphasizing explainability, traceability, and compliance-by-design. Using artifact-centric modeling, it assigns clearly bounded roles to autonomous agents and enables task-specific model routing and audit logging. The contribution includes a reference architecture, a real-world prototype, and insights into how Agentic AI can reconfigure FCC workflows under regulatory constraints. Our findings extend IS literature on AI-enabled compliance by demonstrating how automation, when embedded within accountable governance structures, can support transparency and institutional trust in high-stakes, regulated environments.
Practical Handling of Dynamic Environments in Decentralised Multi-Robot Patrol
Ward, James C., Richards, Arthur, Hunt, Edmund R.
Persistent monitoring using robot teams is of interest in fields such as security, environmental monitoring, and disaster recovery. Performing such monitoring in a fully on-line decentralised fashion has significant potential advantages for robustness, adaptability, and scalability of monitoring solutions, including, in principle, the capacity to effectively adapt in real-time to a changing environment. We examine this through the lens of multi-robot patrol, in which teams of patrol robots must persistently minimise time between visits to points of interest, within environments where traversability of routes is highly dynamic. These dynamics must be observed by patrol agents and accounted for in a fully decentralised on-line manner. In this work, we present a new method of monitoring and adjusting for environment dynamics in a decentralised multi-robot patrol team. We demonstrate that our method significantly outperforms realistic baselines in highly dynamic scenarios, and also investigate dynamic scenarios in which explicitly accounting for environment dynamics may be unnecessary or impractical.
xOffense: An AI-driven autonomous penetration testing framework with offensive knowledge-enhanced LLMs and multi agent systems
Luong, Phung Duc, Bao, Le Tran Gia, Tam, Nguyen Vu Khai, Khoa, Dong Huu Nguyen, Quyen, Nguyen Huu, Pham, Van-Hau, Duy, Phan The
This work introduces xOffense, an AI-driven, multi-agent penetration testing framework that shifts the process from labor-intensive, expert-driven manual efforts to fully automated, machine-executable workflows capable of scaling seamlessly with computational infrastructure. At its core, xOffense leverages a fine-tuned, mid-scale open-source LLM (Qwen3-32B) to drive reasoning and decision-making in penetration testing. The framework assigns specialized agents to reconnaissance, vulnerability scanning, and exploitation, with an orchestration layer ensuring seamless coordination across phases. Fine-tuning on Chain-of-Thought penetration testing data further enables the model to generate precise tool commands and perform consistent multi-step reasoning. We evaluate xOffense on two rigorous benchmarks: AutoPenBench and AI-Pentest-Benchmark. The results demonstrate that xOffense consistently outperforms contemporary methods, achieving a sub-task completion rate of 79.17%, decisively surpassing leading systems such as VulnBot and PentestGPT. These findings highlight the potential of domain-adapted mid-scale LLMs, when embedded within structured multi-agent orchestration, to deliver superior, cost-efficient, and reproducible solutions for autonomous penetration testing.
A Visualized Framework for Event Cooperation with Generative Agents
Tian, Yuyang, Mao, Shunqiang, Gao, Wenchang, Qiu, Lanlan, He, Tianxing
Large Language Models (LLMs) have revolutionized the simulation of agent societies, enabling autonomous planning, memory formation, and social interactions. However, existing frameworks often overlook systematic evaluations for event organization and lack visualized integration with physically grounded environments, limiting agents' ability to navigate spaces and interact with items realistically. We develop Mini-AgentPro, a visualization platform featuring an intuitive map editor for customizing environments and a simulation player with smooth animations. Based on this tool, we introduce a comprehensive test set comprising eight diverse event scenarios with basic and hard variants to assess agents' ability. Evaluations using GPT-4o demonstrate strong performance in basic settings but highlight coordination challenges in hard variants.
HLSMAC: A New StarCraft Multi-Agent Challenge for High-Level Strategic Decision-Making
Hong, Xingxing, Wang, Yungong, Jin, Dexin, Yuan, Ye, Huang, Ximing, Wu, Zijian, Li, Wenxin
Benchmarks are crucial for assessing multi-agent reinforcement learning (MARL) algorithms. While StarCraft II-related environments have driven significant advances in MARL, existing benchmarks like SMAC focus primarily on micromanagement, limiting comprehensive evaluation of high-level strategic intelligence. To address this, we introduce HLSMAC, a new cooperative MARL benchmark with 12 carefully designed StarCraft II scenarios based on classical stratagems from the Thirty-Six Stratagems. Each scenario corresponds to a specific stratagem and is designed to challenge agents with diverse strategic elements, including tactical maneuvering, timing coordination, and deception, thereby opening up avenues for evaluating high-level strategic decision-making capabilities. We also propose novel metrics across multiple dimensions beyond conventional win rate, such as ability utilization and advancement efficiency, to assess agents' overall performance within the HLSMAC environment. We integrate state-of-the-art MARL algorithms and LLM-based agents with our benchmark and conduct comprehensive experiments. The results demonstrate that HLSMAC serves as a robust testbed for advancing multi-agent strategic decision-making.
Spotting the Unfriendly Robot -- Towards better Metrics for Interactions
Wenzel, Raphael, Probst, Malte
Establishing standardized metrics for Social Robot Navigation (SRN) algorithms for assessing the quality and social compliance of robot behavior around humans is essential for SRN research. Currently, commonly used evaluation metrics lack the ability to quantify how cooperatively an agent behaves in interaction with humans. Concretely, in a simple frontal approach scenario, no metric specifically captures whether both agents cooperate or whether one agent stays on a collision course and the other agent is forced to evade. To address this limitation, we propose two new metrics, a conflict intensity metric and the responsibility metric. Together, these metrics are capable of evaluating the quality of human-robot interactions by showing how much a given algorithm has contributed to reducing a conflict and which agent actually took responsibility for the resolution. This work aims to contribute to the development of a comprehensive and standardized evaluation methodology for SRN, ultimately enhancing the safety, efficiency, and social acceptance of robots in human-centric environments.
Between proportionality and envy-freeness: k-proportionality
This article deals with the cake-cutting problem. In this setting, there exist two notions of fair division: proportional division (when there are n players, each player believes they receive at least 1/n of the cake) and envy-free division (each player wants to keep his or her share because he or she does not envy the portion given to another player). Some results are valid for proportional division but not for envy-free division. Here, we introduce and study a scale between proportional division and envy-free division. The goal is to understand where the gap lies between statements about proportional division and envy-free division. This scale comes from the notion introduced in this article: k-proportionality. When k = n this notion corresponds to proportional division, and when k = 2 it corresponds to envy-free division. With k-proportionality we can understand where some difficulties in fair division lie. First, we show that there are situations in which there is no k-proportional and equitable division of a pie with connected pieces when $k \le n - 1$. This result was known only for envy-free division, i.e., k = 2. Next, we prove that there are situations in which there is no Pareto-optimal k-proportional division of a cake with connected pieces when $k \le n - 1$. This result was known only for k = 2. These theorems say that we can get an impossibility result even if we consider not an envy-free division but a weaker notion. Finally, k-proportionality allows us to give a generalization, with a uniform statement, of theorems about strong envy-free and strong proportional divisions.
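The two endpoint notions named in this abstract, proportional and envy-free division, can be illustrated with a minimal sketch. It assumes a hypothetical matrix representation that is not taken from the article: `values[i][j]` is the value player i assigns to the piece given to player j, with each row summing to 1. The function names are illustrative, and the sketch does not attempt to define k-proportionality itself, since the abstract does not state its definition.

```python
from fractions import Fraction as F

def is_proportional(values):
    """Each player i values their own piece at >= 1/n of the whole cake."""
    n = len(values)
    return all(values[i][i] >= F(1, n) for i in range(n))

def is_envy_free(values):
    """No player i values another player's piece above their own."""
    n = len(values)
    return all(values[i][i] >= values[i][j]
               for i in range(n) for j in range(n))

# Hypothetical 3-player division: player 0 receives 2/5 by their own measure
# (at least 1/3, so proportional) but values player 1's piece at 3/5 (envy).
division = [
    [F(2, 5), F(3, 5), F(0)],
    [F(1, 3), F(1, 3), F(1, 3)],
    [F(1, 3), F(1, 3), F(1, 3)],
]

print(is_proportional(division))  # proportional for all three players
print(is_envy_free(division))     # fails because player 0 envies player 1
```

Exact fractions are used so that comparisons against 1/n are not affected by floating-point rounding. The example shows a division that is proportional without being envy-free, which is the gap the article's scale is meant to explore.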
Responsibility and Engagement -- Evaluating Interactions in Social Robot Navigation
Probst, Malte, Wenzel, Raphael, Dasi, Monica
In Social Robot Navigation (SRN), the availability of meaningful metrics is crucial for evaluating trajectories from human-robot interactions. In the SRN context, such interactions often relate to resolving conflicts between two or more agents. Correspondingly, the shares to which agents contribute to the resolution of such conflicts are important. This paper builds on recent work, which proposed a Responsibility metric capturing such shares. We extend this framework in two directions: First, we model the conflict buildup phase by introducing a time normalization. Second, we propose the related Engagement metric, which captures how the agents' actions intensify a conflict. In a comprehensive series of simulated scenarios with dyadic, group and crowd interactions, we show that the metrics carry meaningful information about the cooperative resolution of conflicts in interactions. They can be used to assess behavior quality and foresightedness. We extensively discuss applicability, design choices and limitations of the proposed metrics.
A Novel Skill Modeling Approach: Integrating Vergnaud's Scheme with Cognitive Architectures
Lénat, Antoine, Cheminat, Olivier, Chablat, Damien, Charron, Camilo
Human-machine interaction is increasingly important in industry, and this trend will only intensify with the rise of Industry 5.0. Human operators have skills that need to be adapted when using machines to achieve the best results. It is crucial to highlight the operator's skills and understand how they use and adapt them [18]. A rigorous description of these skills is necessary to compare performance with and without robot assistance. Predicate logic, used by Vergnaud within Piaget's scheme concept, offers a promising approach. However, this theory does not account for cognitive system constraints, such as the timing of actions, the limitation of cognitive resources, the parallelization of tasks, or the activation of automatic gestures contrary to optimal knowledge. Integrating these constraints is essential for representing agent skills and understanding skill transfer between biological and mechanical structures. Cognitive architecture models [2] address these needs by describing cognitive structure and can be combined with the scheme for mutual benefit. Welding provides a relevant case study, as it highlights the challenges faced by operators, even highly skilled ones. Welding's complexity stems from the need for constant skill adaptation to variable parameters like part position and process. This adaptation is crucial, as weld quality, a key factor, is only assessed afterward via destructive testing. Thus, the welder is confronted with a complex perception-decision-action cycle, in which the evaluation of the impact of their actions is delayed and errors are irreversible. This dynamic underscores the importance of understanding and modeling the skills of operators.