Agents
Let's Get You Hired: A Job Seeker's Perspective on Multi-Agent Recruitment Systems for Explaining Hiring Decisions
Bhattacharya, Aditya, Verbert, Katrien
During job recruitment, traditional applicant selection methods often lack transparency. Candidates are rarely given sufficient justifications for recruiting decisions, whether they are made manually by human recruiters or through the use of black-box Applicant Tracking Systems (ATS). To address this problem, our work introduces a multi-agent AI system that uses Large Language Models (LLMs) to guide job seekers during the recruitment process. Using an iterative user-centric design approach, we first conducted a two-phased exploratory study with four active job seekers to inform the design and development of the system. Subsequently, we conducted an in-depth, qualitative user study with 20 active job seekers through individual one-to-one interviews to evaluate the developed prototype. The results of our evaluation demonstrate that participants perceived our multi-agent recruitment system as significantly more actionable, trustworthy, and fair compared to traditional methods. Our study further helped us uncover in-depth insights into factors contributing to these perceived user experiences. Drawing from these insights, we offer broader design implications for building user-aligned, multi-agent explainable AI systems across diverse domains.
MaskSearch: A Universal Pre-Training Framework to Enhance Agentic Search Capability
Wu, Weiqi, Guan, Xin, Huang, Shen, Jiang, Yong, Xie, Pengjun, Huang, Fei, Cao, Jiuxin, Zhao, Hai, Zhou, Jingren
Retrieval-Augmented Language Models (RALMs) represent a classic paradigm where models enhance generative capabilities using external knowledge retrieved via a specialized module. Recent advancements in Agent techniques enable Large Language Models (LLMs) to autonomously utilize tools for retrieval, planning, and reasoning. While existing training-based methods show promise, their agentic abilities are limited by inherent characteristics of the task-specific data used during training. To further enhance the universal search capability of agents, we propose a novel pre-training framework, MaskSearch. In the pre-training stage, we introduce the Retrieval Augmented Mask Prediction (RAMP) task, where the model learns to leverage search tools to fill masked spans on a large amount of pre-training data, thus acquiring universal retrieval and reasoning capabilities. After that, the model is trained on downstream tasks to achieve further improvement. We apply both Supervised Fine-tuning (SFT) and Reinforcement Learning (RL) for training. For SFT, we combine agent-based and distillation-based methods to generate training data, starting with a multi-agent system consisting of a planner, a rewriter, and an observer, followed by a self-evolving teacher model. For RL, we employ DAPO as the training framework and adopt a hybrid reward system consisting of answer rewards and format rewards. Additionally, we introduce a curriculum learning approach that allows the model to learn progressively from easier to more challenging instances based on the number of masked spans. We evaluate the effectiveness of our framework in the scenario of open-domain multi-hop question answering. Through extensive experiments, we demonstrate that MaskSearch significantly enhances the performance of LLM-based search agents on both in-domain and out-of-domain downstream tasks.
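As a rough illustration of two ideas named in the MaskSearch abstract, the sketch below orders training instances by their number of masked spans (the curriculum) and combines an answer reward with a format reward (the hybrid reward). The `Instance` type, the `<answer>` tag convention, the exact-match scoring, and the 0.9/0.1 weights are all assumptions made for this example; the abstract does not specify them.

```python
import re
from dataclasses import dataclass


@dataclass
class Instance:
    text: str        # passage with masked spans (hypothetical representation)
    num_masks: int   # number of masked spans in the passage


def curriculum_order(instances):
    """Sort instances from fewest to most masked spans (easier first)."""
    return sorted(instances, key=lambda inst: inst.num_masks)


# Assumed output convention: the final answer is wrapped in <answer> tags.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def format_reward(response: str) -> float:
    """1.0 if the response contains exactly one <answer>...</answer> block."""
    return 1.0 if len(ANSWER_RE.findall(response)) == 1 else 0.0


def answer_reward(response: str, gold: str) -> float:
    """1.0 if the extracted answer equals the gold answer, ignoring case."""
    match = ANSWER_RE.search(response)
    if match is None:
        return 0.0
    predicted = match.group(1).strip().lower()
    return 1.0 if predicted == gold.strip().lower() else 0.0


def hybrid_reward(response, gold, w_answer=0.9, w_format=0.1):
    """Weighted sum of answer and format rewards (weights are assumptions)."""
    return (w_answer * answer_reward(response, gold)
            + w_format * format_reward(response))
```

Under these assumptions, a well-formatted but wrong response still earns the small format reward, while an unformatted response earns nothing even if it contains the right string.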
TransBench: Breaking Barriers for Transferable Graphical User Interface Agents in Dynamic Digital Environments
Lu, Yuheng, Yu, Qian, Wang, Hongru, Liu, Zeming, Su, Wei, Liu, Yanping, Guo, Yuhang, Liang, Maocheng, Wang, Yunhong, Wang, Haifeng
Graphical User Interface (GUI) agents, which autonomously operate on digital interfaces through natural language instructions, hold transformative potential for accessibility, automation, and user experience. A critical aspect of their functionality is grounding: the ability to map linguistic intents to visual and structural interface elements. However, existing GUI agents often struggle to adapt to the dynamic and interconnected nature of real-world digital environments, where tasks frequently span multiple platforms and applications while also being impacted by version updates. To address this, we introduce TransBench, the first benchmark designed to systematically evaluate and enhance the transferability of GUI agents across three key dimensions: cross-version transferability (adapting to version updates), cross-platform transferability (generalizing across platforms like iOS, Android, and Web), and cross-application transferability (handling tasks spanning functionally distinct apps). TransBench includes 15 app categories with diverse functionalities, capturing essential pages across versions and platforms to enable robust evaluation. Our experiments demonstrate significant improvements in grounding accuracy, showcasing the practical utility of GUI agents in dynamic, real-world environments. Our code and data will be publicly available on GitHub.
PeerGuard: Defending Multi-Agent Systems Against Backdoor Attacks Through Mutual Reasoning
Multi-agent systems leverage advanced AI models as autonomous agents that interact, cooperate, or compete to complete complex tasks across applications such as robotics and traffic management. Despite their growing importance, safety in multi-agent systems remains largely underexplored, with most research focusing on single AI models rather than interacting agents. This work investigates backdoor vulnerabilities in multi-agent systems and proposes a defense mechanism based on agent interactions. By leveraging reasoning abilities, each agent evaluates responses from others to detect illogical reasoning processes, which indicate poisoned agents. Experiments on LLM-based multi-agent systems, including ChatGPT series and Llama 3, demonstrate the effectiveness of the proposed method, achieving high accuracy in identifying poisoned agents while minimizing false positives on clean agents. We believe this work provides insights into multi-agent system safety and contributes to the development of robust, trustworthy AI interactions.
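The peer-review idea in the PeerGuard abstract can be sketched roughly as follows: every agent reviews the others' responses, and an agent is flagged when a majority of its reviewers find that its reasoning does not support its answer. This is a toy sketch, not the paper's method. The `Response` fields, the majority rule, and the string-equality `default_judge` (standing in for an LLM's judgment) are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Response:
    agent: str
    reasoning_conclusion: str  # what the agent's reasoning steps imply
    final_answer: str          # what the agent actually answered


def default_judge(reviewer: Response, target: Response) -> bool:
    """Stand-in for an LLM judgment: does the target's reasoning support
    its answer? A real reviewer would use its own model; this stub only
    compares two strings and ignores the reviewer."""
    return target.reasoning_conclusion == target.final_answer


def flag_suspects(responses, judge=default_judge):
    """Return names of agents that a strict majority of peers find illogical."""
    suspects = []
    for target in responses:
        reviewers = [r for r in responses if r.agent != target.agent]
        illogical_votes = sum(1 for r in reviewers if not judge(r, target))
        if reviewers and illogical_votes > len(reviewers) / 2:
            suspects.append(target.agent)
    return suspects


responses = [
    Response("A", "yes", "yes"),
    Response("B", "yes", "no"),   # reasoning and answer disagree
    Response("C", "no", "no"),
]
print(flag_suspects(responses))  # ['B']
```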
Sky-Drive: A Distributed Multi-Agent Simulation Platform for Human-AI Collaborative and Socially-Aware Future Transportation
Huang, Zilin, Sheng, Zihao, Wan, Zhengyang, Qu, Yansong, Luo, Yuhao, Wang, Boyue, Li, Pei, Chen, Yen-Jung, Chen, Jiancong, Long, Keke, Meng, Jiayi, Leng, Yue, Chen, Sikai
Recent advances in autonomous system simulation platforms have significantly enhanced the safe and scalable testing of driving policies. However, existing simulators do not yet fully meet the needs of future transportation research, particularly in enabling effective human-AI collaboration and modeling socially-aware driving agents. This paper introduces Sky-Drive, a novel distributed multi-agent simulation platform that addresses these limitations through four key innovations: (a) a distributed architecture for synchronized simulation across multiple terminals; (b) a multi-modal human-in-the-loop framework integrating diverse sensors to collect rich behavioral data; (c) a human-AI collaboration mechanism supporting continuous and adaptive knowledge exchange; and (d) a digital twin framework for constructing high-fidelity virtual replicas of real-world transportation environments. Sky-Drive supports diverse applications such as modeling interactions between autonomous vehicles and human road users, human-in-the-loop training, socially-aware reinforcement learning, personalized driving development, and customized scenario generation. Future extensions will incorporate foundation models for context-aware decision support and hardware-in-the-loop testing for real-world validation. By bridging scenario generation, data collection, algorithm training, and hardware integration, Sky-Drive has the potential to become a foundational platform for the next generation of human-centered and socially-aware autonomous transportation systems research. Autonomous systems and related technologies have made significant strides in recent years, demonstrating increasing maturity in perception, decision-making, and control capabilities [1]-[4]. The corresponding author is Sikai Chen (E-mail: sikai.chen@wisc.edu). These authors contributed equally to this work.
Zilin Huang, Zihao Sheng, Zhengyang Wan, Yuhao Luo, Boyue Wang, Pei Li, Keke Long, and Sikai Chen are with the Department of Civil and Environmental Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA (E-mails: {zilin.huang, Yansong Qu, and Jiancong Chen are with the Lyles School of Civil and Construction Engineering, Purdue University, West Lafayette, IN 47907, USA (E-mail: {qu120, chen5281}@purdue.edu). Yen-Jung Chen is with the Elmore Family School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907, USA (E-mail: chen4126@purdue.edu). Jiayi Meng is with the Department of Computer Science and Engineering, The University of Texas at Arlington, Arlington, TX 76019, USA (E-mail: jiayi.meng@uta.edu).
Building Trustworthy Multimodal AI: A Review of Fairness, Transparency, and Ethics in Vision-Language Tasks
Saleh, Mohammad, Tabatabaei, Azadeh
Objective: This review explores the trustworthiness of multimodal artificial intelligence (AI) systems, specifically focusing on vision-language tasks. It addresses critical challenges related to fairness, transparency, and ethical implications in these systems, providing a comparative analysis of key tasks such as Visual Question Answering (VQA), image captioning, and visual dialogue. Background: Multimodal models, particularly vision-language models, enhance AI capabilities by integrating visual and textual data, mimicking human learning processes. Despite significant advancements, the trustworthiness of these models remains a crucial concern, particularly as AI systems increasingly confront issues regarding fairness, transparency, and ethics. Methods: This review examines research conducted from 2017 to 2024 focusing on the aforementioned core vision-language tasks. It employs a comparative approach to analyze these tasks through the lens of trustworthiness, underlining fairness, explainability, and ethics. This study synthesizes findings from recent literature to identify trends, challenges, and state-of-the-art solutions. Results: Several key findings were highlighted. Transparency: Explainability of vision-language tasks is important for user trust. Techniques such as attention maps and gradient-based methods have successfully addressed this issue. Fairness: Bias mitigation in VQA and visual dialogue systems is essential for ensuring unbiased outcomes across diverse demographic groups. Ethical Implications: Addressing biases in multilingual models and ensuring ethical data handling are critical for the responsible deployment of vision-language systems. Conclusion: This study underscores the importance of integrating fairness, transparency, and ethical considerations in developing vision-language models within a unified framework.
Semantic Communication meets System 2 ML: How Abstraction, Compositionality and Emergent Languages Shape Intelligence
The trajectories of 6G and AI are set for a creative collision. However, current visions for 6G remain largely incremental evolutions of 5G, while progress in AI is hampered by brittle, data-hungry models that lack robust reasoning capabilities. This paper argues for a foundational paradigm shift, moving beyond the purely technical level of communication toward systems capable of semantic understanding and effective, goal-oriented interaction. We propose a unified research vision rooted in the principles of System-2 cognition, built upon three pillars: Abstraction, enabling agents to learn meaningful world models from raw sensorimotor data; Compositionality, providing the algebraic tools to combine learned concepts and subsystems; and Emergent Communication, allowing intelligent agents to create their own adaptive and grounded languages. By integrating these principles, we lay the groundwork for truly intelligent systems that can reason, adapt, and collaborate, unifying advances in wireless communications, machine learning, and robotics under a single coherent framework.
Scalable Constrained Policy Optimization for Safe Multi-agent Reinforcement Learning
A challenging problem in seeking to bring multi-agent reinforcement learning (MARL) techniques into real-world applications, such as autonomous driving and drone swarms, is how to control multiple agents safely and cooperatively to accomplish tasks. Most existing safe MARL methods learn the centralized value function by introducing a global state to guide safety cooperation. In this paper, we develop a novel scalable and theoretically-justified multi-agent constrained policy optimization method. This method utilizes the rigorous bounds of the trust region method and the bounds of the truncated advantage function to provide a new local policy optimization objective for each agent. Also, we prove that the safety constraints and the joint policy improvement can be met when each agent adopts a sequential update scheme to optimize a $\kappa$-hop policy.
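The abstract does not define the hop-limited policy. One plausible reading is that each agent conditions only on agents within a fixed number of hops on an agent interaction graph. Under that assumption, the sketch below computes such a neighborhood with breadth-first search. The adjacency-dict representation and the function name `k_hop_neighbors` are hypothetical, and this is not the paper's algorithm.

```python
from collections import deque


def k_hop_neighbors(adjacency, agent, kappa):
    """Return the set of agents within `kappa` hops of `agent`, itself included.

    `adjacency` maps each agent to an iterable of its direct neighbors.
    """
    seen = {agent}
    frontier = deque([(agent, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == kappa:
            continue  # do not expand beyond the hop limit
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return seen


# Four agents on a line: 0 - 1 - 2 - 3
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(sorted(k_hop_neighbors(line, 0, 2)))  # [0, 1, 2]
```

Keeping the hop limit small bounds the information each agent needs, which is the usual motivation for local objectives in place of a centralized value function over a global state.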
Reflective Multi-Agent Collaboration based on Large Language Models
Benefiting from the powerful language expression and planning capabilities of Large Language Models (LLMs), LLM-based autonomous agents have achieved promising performance in various downstream tasks. Recently, building on the development of single-agent systems, researchers have proposed constructing LLM-based multi-agent systems to tackle more complicated tasks. In this paper, we propose a novel framework, named COPPER, to enhance the collaborative capabilities of LLM-based agents with a self-reflection mechanism. To improve the quality of reflections, we propose to fine-tune a shared reflector, which automatically tunes the prompts of actor models using our counterfactual PPO mechanism. In particular, we propose counterfactual rewards to assess the contribution of a single agent's reflection within the system, alleviating the credit assignment problem.
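One common way to realize a counterfactual reward for credit assignment is a difference reward: compare the team's reward with all reflections in place against the reward when one agent's reflection is swapped for a neutral baseline. The sketch below illustrates that general technique only. The function names, the empty-string baseline, and the toy reward are assumptions, and the abstract does not state that COPPER computes its rewards this way.

```python
def counterfactual_rewards(reflections, team_reward, baseline=""):
    """Per-agent credit: R(all reflections) - R(agent's reflection replaced).

    `reflections` maps agent name -> reflection text.
    `team_reward` maps such a dict -> float (hypothetical task reward).
    """
    full = team_reward(reflections)
    rewards = {}
    for agent in reflections:
        counterfactual = dict(reflections)
        counterfactual[agent] = baseline  # remove this agent's contribution
        rewards[agent] = full - team_reward(counterfactual)
    return rewards


def toy_team_reward(reflections):
    """Toy reward: one point per reflection that mentions checking units."""
    return sum(1.0 for text in reflections.values() if "check units" in text)


reflections = {"a": "check units before answering", "b": "n/a"}
print(counterfactual_rewards(reflections, toy_team_reward))
# {'a': 1.0, 'b': 0.0}
```

In this toy case agent "a" receives all the credit, because removing its reflection lowers the team reward while removing agent "b"'s does not.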
FinCon: A Synthesized LLM Multi-Agent System with Conceptual Verbal Reinforcement for Enhanced Financial Decision Making
Large language models (LLMs) have demonstrated notable potential in conducting complex tasks and are increasingly utilized in various financial applications. However, high-quality sequential financial investment decision-making remains challenging. These tasks require multiple interactions with a volatile environment for every decision, demanding sufficient intelligence to maximize returns and manage risks. Although LLMs have been used to develop agent systems that surpass human teams and yield impressive investment returns, opportunities to enhance multi-source information synthesis and optimize decision-making outcomes through timely experience refinement remain unexplored. Here, we introduce FinCon, an LLM-based multi-agent framework tailored for diverse financial tasks.