Agents
Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact
Qureshi, Rizwan, Sapkota, Ranjan, Shah, Abbas, Muneer, Amgad, Zafar, Anas, Vayani, Ashmal, Shoman, Maged, Eldaly, Abdelrahman B. M., Zhang, Kai, Sadak, Ferhat, Raza, Shaina, Fan, Xinqi, Shwartz-Ziv, Ravid, Yan, Hong, Jain, Vinjia, Chadha, Aman, Karkee, Manoj, Wu, Jia, Mirjalili, Seyedali
Can machines truly think, reason and act in domains like humans? This enduring question continues to shape the pursuit of Artificial General Intelligence (AGI). Despite the growing capabilities of models such as GPT-4.5, DeepSeek, Claude 3.5 Sonnet, Phi-4, and Grok 3, which exhibit multimodal fluency and partial reasoning, these systems remain fundamentally limited by their reliance on token-level prediction and lack of grounded agency. This paper offers a cross-disciplinary synthesis of AGI development, spanning artificial intelligence, cognitive neuroscience, psychology, generative models, and agent-based systems. We analyze the architectural and cognitive foundations of general intelligence, highlighting the role of modular reasoning, persistent memory, and multi-agent coordination. In particular, we emphasize the rise of Agentic RAG frameworks that combine retrieval, planning, and dynamic tool use to enable more adaptive behavior. We discuss generalization strategies, including information compression, test-time adaptation, and training-free methods, as critical pathways toward flexible, domain-agnostic intelligence. Vision-Language Models (VLMs) are reexamined not just as perception modules but as evolving interfaces for embodied understanding and collaborative task completion. We also argue that true intelligence arises not from scale alone but from the integration of memory and reasoning: an orchestration of modular, interactive, and self-improving components where compression enables adaptive behavior. Drawing on advances in neurosymbolic systems, reinforcement learning, and cognitive scaffolding, we explore how recent architectures begin to bridge the gap between statistical learning and goal-directed cognition. Finally, we identify key scientific, technical, and ethical challenges on the path to AGI.
Finding Common Ground: Using Large Language Models to Detect Agreement in Multi-Agent Decision Conferences
Heller, Selina, Ibrahim, Mohamed, Selby, David Antony, Vollmer, Sebastian
Decision conferences are structured, collaborative meetings that bring together experts from various fields to address complex issues and reach a consensus on recommendations for future actions or policies. These conferences often rely on facilitated discussions to ensure productive dialogue and collective agreement. Recently, Large Language Models (LLMs) have shown significant promise in simulating real-world scenarios, particularly through collaborative multi-agent systems that mimic group interactions. In this work, we present a novel LLM-based multi-agent system designed to simulate decision conferences, specifically focusing on detecting agreement among the participant agents. To achieve this, we evaluate six distinct LLMs on two tasks: stance detection, which identifies the position an agent takes on a given issue, and stance polarity detection, which identifies the sentiment as positive, negative, or neutral. These models are further assessed within the multi-agent system to determine their effectiveness in complex simulations. Our results indicate that LLMs can reliably detect agreement even in dynamic and nuanced debates. Incorporating an agreement-detection agent within the system can also improve the efficiency of group debates and enhance the overall quality and coherence of deliberations, making them comparable to real-world decision conferences regarding outcome and decision-making. These findings demonstrate the potential for LLM-based multi-agent systems to simulate group decision-making processes. They also highlight that such systems could be instrumental in supporting decision-making with expert elicitation workshops across various domains.
Discovering Algorithms with Computational Language Processing
Bourdais, Theo, Gnanasekaran, Abeynaya, Owhadi, Houman, Sahai, Tuhin
Algorithms are the engine for reproducible problem-solving. We present a framework automating algorithm discovery by conceptualizing them as sequences of operations, represented as tokens. These computational tokens are chained using a grammar, enabling the formation of increasingly sophisticated procedures. Our ensemble Monte Carlo tree search (MCTS) guided by reinforcement learning (RL) explores token chaining and drives the creation of new tokens. This methodology rediscovers, improves, and generates new algorithms that substantially outperform existing methods for strongly NP-hard combinatorial optimization problems and foundational quantum computing approaches such as Grover's and Quantum Approximate Optimization Algorithm. Operating at the computational rather than code-generation level, our framework produces algorithms that can be tailored specifically to problem instances, not merely classes.
Experimental Setup and Software Pipeline to Evaluate Optimization based Autonomous Multi-Robot Search Algorithms
Bhatt, Aditya, Corra, Mary Katherine, Merlo, Franklin, KrisshnaKumar, Prajit, Chowdhury, Souma
Signal source localization has been a problem of interest in the multi-robot systems domain given its applications in search & rescue and hazard localization in various industrial and outdoor settings. A variety of multi-robot search algorithms exist that usually formulate and solve the associated autonomous motion planning problem as a heuristic model-free or belief model-based optimization process. Most of these algorithms however remains tested only in simulation, thereby losing the opportunity to generate knowledge about how such algorithms would compare/contrast in a real physical setting in terms of search performance and real-time computing performance. To address this gap, this paper presents a new lab-scale physical setup and associated open-source software pipeline to evaluate and benchmark multi-robot search algorithms. The presented physical setup innovatively uses an acoustic source (that is safe and inexpensive) and small ground robots (e-pucks) operating in a standard motion-capture environment. This setup can be easily recreated and used by most robotics researchers. The acoustic source also presents interesting uncertainty in terms of its noise-to-signal ratio, which is useful to assess sim-to-real gaps. The overall software pipeline is designed to readily interface with any multi-robot search algorithm with minimal effort and is executable in parallel asynchronous form. This pipeline includes a framework for distributed implementation of multi-robot or swarm search algorithms, integrated with a ROS (Robotics Operating System)-based software stack for motion capture supported localization. The utility of this novel setup is demonstrated by using it to evaluate two state-of-the-art multi-robot search algorithms, based on swarm optimization and batch-Bayesian Optimization (called Bayes-Swarm), as well as a random walk baseline.
Upgrade or Switch: Do We Need a Next-Gen Trusted Architecture for the Internet of AI Agents?
Raskar, Ramesh, Chari, Pradyumna, Grogan, Jared James, Lambe, Mahesh, Lincourt, Robert, Bala, Raghu, Joshi, Aditi, Singh, Abhishek, Chopra, Ayush, Ranjan, Rajesh, Gupta, Shailja, Stripelis, Dimitris, Gorskikh, Maria, Wang, Sichao
The emerging Internet of AI Agents challenges existing web infrastructure designed for human-scale, reactive interactions. Unlike traditional web resources, autonomous AI agents initiate actions, maintain persistent state, spawn sub-agents, and negotiate directly with peers: demanding millisecond-level discovery, instant credential revocation, and cryptographic behavioral proofs that exceed current DNS/PKI capabilities. This paper analyzes whether to upgrade existing infrastructure or implement purpose-built index architectures for autonomous agents. We identify critical failure points: DNS propagation (24-48 hours vs. required milliseconds), certificate revocation unable to scale to trillions of entities, and IPv4/IPv6 addressing inadequate for agent-scale routing. We evaluate three approaches: (1) Upgrade paths, (2) Switch options, (3) Hybrid index/registries. Drawing parallels to dialup-to-broadband transitions, we find that agent requirements constitute qualitative, and not incremental, changes. While upgrades offer compatibility and faster deployment, clean-slate solutions provide better performance but require longer for adoption. Our analysis suggests hybrid approaches will emerge, with centralized indexes for critical agents and federated meshes for specialized use cases.
Toward Data Systems That Are Business Semantic Centric and AI Agents Assisted
Contemporary businesses operate in dynamic environments requiring rapid adaptation to achieve goals and maintain competitiveness. Existing data platforms often fall short by emphasizing tools over alignment with business needs, resulting in inefficiencies and delays. To address this gap, I propose the Business Semantics Centric, AI Agents Assisted Data System (BSDS), a holistic system that integrates architecture, workflows, and team organization to ensure data systems are tailored to business priorities rather than dictated by technical constraints. BSDS redefines data systems as dynamic enablers of business success, transforming them from passive tools into active drivers of organizational growth. BSDS has a modular architecture that comprises curated data linked to business entities, a knowledge base for context-aware AI agents, and efficient data pipelines. AI agents play a pivotal role in assisting with data access and system management, reducing human effort, and improving scalability. Complementing this architecture, BSDS incorporates workflows optimized for both exploratory data analysis and production requirements, balancing speed of delivery with quality assurance. A key innovation of BSDS is its incorporation of the human factor. By aligning data team expertise with business semantics, BSDS bridges the gap between technical capabilities and business needs. Validated through real-world implementation, BSDS accelerates time-to-market for data-driven initiatives, enhances cross-functional collaboration, and provides a scalable blueprint for businesses of all sizes. Future research can build on BSDS to explore optimization strategies using complex systems and adaptive network theories, as well as developing autonomous data systems leveraging AI agents.
Introspection of Thought Helps AI Agents
AI Agents rely on Large Language Models (LLMs) and Multimodal-LLMs (MLLMs) to perform interpretation and inference in text and image tasks without post-training, where LLMs and MLLMs play the most critical role and determine the initial ability and limitations of AI Agents. Usually, AI Agents utilize sophisticated prompt engineering and external reasoning framework to obtain a promising interaction with LLMs, e.g., Chain-of-Thought, Iteration of Thought and Image-of-Thought. However, they are still constrained by the inherent limitations of LLM in understanding natural language, and the iterative reasoning process will generate a large amount of inference cost. To this end, we propose a novel AI Agent Reasoning Framework with Introspection of Thought (INoT) by designing a new LLM-Read code in prompt. It enables LLM to execute programmatic dialogue reasoning processes following the code in prompt. Therefore, self-denial and reflection occur within LLM instead of outside LLM, which can reduce token cost effectively. Through our experiments on six benchmarks for three different tasks, the effectiveness of INoT is verified, with an average improvement of 7.95\% in performance, exceeding the baselines. Furthermore, the token cost of INoT is lower on average than the best performing method at baseline by 58.3\%. In addition, we demonstrate the versatility of INoT in image interpretation and inference through verification experiments.
Safe Deep Reinforcement Learning for Resource Allocation with Peak Age of Information Violation Guarantees
Reyhan, Berire Gunes, Coleri, Sinem
In Wireless Networked Control Systems (WNCSs), control and communication systems must be co-designed due to their strong interdependence. This paper presents a novel optimization theory-based safe deep reinforcement learning (DRL) framework for ultra-reliable WNCSs, ensuring constraint satisfaction while optimizing performance, for the first time in the literature. The approach minimizes power consumption under key constraints, including Peak Age of Information (PAoI) violation probability, transmit power, and schedulability in the finite blocklength regime. PAoI violation probability is uniquely derived by combining stochastic maximum allowable transfer interval (MATI) and maximum allowable packet delay (MAD) constraints in a multi-sensor network. The framework consists of two stages: optimization theory and safe DRL. The first stage derives optimality conditions to establish mathematical relationships among variables, simplifying and decomposing the problem. The second stage employs a safe DRL model where a teacher-student framework guides the DRL agent (student). The control mechanism (teacher) evaluates compliance with system constraints and suggests the nearest feasible action when needed. Extensive simulations show that the proposed framework outperforms rule-based and other optimization theory based DRL benchmarks, achieving faster convergence, higher rewards, and greater stability.
DatasetAgent: A Novel Multi-Agent System for Auto-Constructing Datasets from Real-World Images
Sun, Haoran, Bian, Haoyu, Zeng, Shaoning, Rao, Yunbo, Xu, Xu, Mei, Lin, Gou, Jianping
Common knowledge indicates that the process of constructing image datasets usually depends on the time-intensive and inefficient method of manual collection and annotation. Large models offer a solution via data generation. Nonetheless, real-world data are obviously more valuable comparing to artificially intelligence generated data, particularly in constructing image datasets. For this reason, we propose a novel method for auto-constructing datasets from real-world images by a multiagent collaborative system, named as DatasetAgent. By coordinating four different agents equipped with Multi-modal Large Language Models (MLLMs), as well as a tool package for image optimization, DatasetAgent is able to construct high-quality image datasets according to user-specified requirements. In particular, two types of experiments are conducted, including expanding existing datasets and creating new ones from scratch, on a variety of open-source datasets. In both cases, multiple image datasets constructed by DatasetAgent are used to train various vision models for image classification, object detection, and image segmentation.
Adaptive Framework for Ambient Intelligence in Rehabilitation Assistance
Baranyi, Gábor, Csibi, Zsolt, Fenech, Kristian, Fóthi, Áron, Gaál, Zsófia, Skaf, Joul, Lőrincz, András
This paper introduces the Ambient Intelligence Rehabilitation Support (AIRS) framework, an advanced artificial intelligence-based solution tailored for home rehabilitation environments. AIRS integrates cutting-edge technologies, including Real-Time 3D Reconstruction (RT-3DR), intelligent navigation, and large Vision-Language Models (VLMs), to create a comprehensive system for machine-guided physical rehabilitation. The general AIRS framework is demonstrated in rehabilitation scenarios following total knee replacement (TKR), utilizing a database of 263 video recordings for evaluation. A smartphone is employed within AIRS to perform RT-3DR of living spaces and has a body-matched avatar to provide visual feedback about the excercise. This avatar is necessary in (a) optimizing exercise configurations, including camera placement, patient positioning, and initial poses, and (b) addressing privacy concerns and promoting compliance with the AI Act. The system guides users through the recording process to ensure the collection of properly recorded videos. AIRS employs two feedback mechanisms: (i) visual 3D feedback, enabling direct comparisons between prerecorded clinical exercises and patient home recordings and (ii) VLM-generated feedback, providing detailed explanations and corrections for exercise errors. The framework also supports people with visual and hearing impairments. It also features a modular design that can be adapted to broader rehabilitation contexts. AIRS software components are available for further use and customization.