Chen, Xu
Improving Retrospective Language Agents via Joint Policy Gradient Optimization
Feng, Xueyang, Lan, Bo, Dai, Quanyu, Wang, Lei, Tang, Jiakai, Chen, Xu, Dong, Zhenhua, Wen, Ji-Rong
In recent research advancements within the community, large language models (LLMs) have sparked great interest in creating autonomous agents. However, current prompt-based agents often heavily rely on large-scale LLMs. Meanwhile, although fine-tuning methods significantly enhance the capabilities of smaller LLMs, the fine-tuned agents often lack the potential for self-reflection and self-improvement. To address these challenges, we introduce a novel agent framework named RetroAct, which is a framework that jointly optimizes both task-planning and self-reflective evolution capabilities in language agents. Specifically, we develop a two-stage joint optimization process that integrates imitation learning and reinforcement learning, and design an off-policy joint policy gradient optimization algorithm with imitation learning regularization to enhance the data efficiency and training stability in agent tasks. RetroAct significantly improves the performance of open-source models, reduces dependency on closed-source LLMs, and enables fine-tuned agents to learn and evolve continuously. We conduct extensive experiments across various testing environments, demonstrating RetroAct has substantial improvements in task performance and decision-making processes.
Unveiling and Causalizing CoT: A Causal Pespective
Fu, Jiarun, Ding, Lizhong, Li, Hao, Li, Pengqi, Wei, Qiuning, Chen, Xu
Although Chain-of-Thought (CoT) has achieved remarkable success in enhancing the reasoning ability of large language models (LLMs), the mechanism of CoT remains a ``black box''. Even if the correct answers can frequently be obtained, existing CoTs struggle to make the reasoning understandable to human. In this paper, we unveil and causalize CoT from a causal perspective to ensure both correctness and understandability of all reasoning steps (to the best of our knowledge, the first such). We model causality of CoT via structural causal models (SCM) to unveil the reasoning mechanism of CoT. To measure the causality of CoT, we define the CoT Average Causal Effect (CACE) to test the causal relations between steps. For those steps without causality (wrong or unintelligible steps), we design a role-playing causal query algorithm to causalize these steps, resulting a causalized CoT with all steps correct and understandable. Experimental results on both open-source and closed-source LLMs demonstrate that the causal errors commonly in steps are effectively corrected and the reasoning ability of LLMs is significantly improved.
Testing for Causal Fairness
Fu, Jiarun, Ding, LiZhong, Li, Pengqi, Wei, Qiuning, Cheng, Yurong, Chen, Xu
Causality is widely used in fairness analysis to prevent discrimination on sensitive attributes, such as genders in career recruitment and races in crime prediction. However, the current data-based Potential Outcomes Framework (POF) often leads to untrustworthy fairness analysis results when handling high-dimensional data. To address this, we introduce a distribution-based POF that transform fairness analysis into Distributional Closeness Testing (DCT) by intervening on sensitive attributes. We define counterfactual closeness fairness as the null hypothesis of DCT, where a sensitive attribute is considered fair if its factual and counterfactual potential outcome distributions are sufficiently close. We introduce the Norm-Adaptive Maximum Mean Discrepancy Treatment Effect (N-TE) as a statistic for measuring distributional closeness and apply DCT using the empirical estimator of NTE, referred to Counterfactual Fairness-CLOseness Testing ($\textrm{CF-CLOT}$). To ensure the trustworthiness of testing results, we establish the testing consistency of N-TE through rigorous theoretical analysis. $\textrm{CF-CLOT}$ demonstrates sensitivity in fairness analysis through the flexibility of the closeness parameter $\epsilon$. Unfair sensitive attributes have been successfully tested by $\textrm{CF-CLOT}$ in extensive experiments across various real-world scenarios, which validate the consistency of the testing.
Investigating and Extending Homans' Social Exchange Theory with Large Language Model based Agents
Wang, Lei, Zhang, Zheqing, Chen, Xu
Homans' Social Exchange Theory (SET) is widely recognized as a basic framework for understanding the formation and emergence of human civilizations and social structures. In social science, this theory is typically studied based on simple simulation experiments or real-world human studies, both of which either lack realism or are too expensive to control. In artificial intelligence, recent advances in large language models (LLMs) have shown promising capabilities in simulating human behaviors. Inspired by these insights, we adopt an interdisciplinary research perspective and propose using LLM-based agents to study Homans' SET. Specifically, we construct a virtual society composed of three LLM agents and have them engage in a social exchange game to observe their behaviors. Through extensive experiments, we found that Homans' SET is well validated in our agent society, demonstrating the consistency between the agent and human behaviors. Building on this foundation, we intentionally alter the settings of the agent society to extend the traditional Homans' SET, making it more comprehensive and detailed. To the best of our knowledge, this paper marks the first step in studying Homans' SET with LLM-based agents. More importantly, it introduces a novel and feasible research paradigm that bridges the fields of social science and computer science through LLM-based agents. Code is available at https://github.com/Paitesanshi/SET.
AI-in-the-Loop Sensing and Communication Joint Design for Edge Intelligence
Cai, Zhijie, Cao, Xiaowen, Chen, Xu, Cui, Yuanhao, Zhu, Guangxu, Huang, Kaibin, Cui, Shuguang
Recent breakthroughs in artificial intelligence (AI), wireless communications, and sensing technologies have accelerated the evolution of edge intelligence. However, conventional systems still grapple with issues such as low communication efficiency, redundant data acquisition, and poor model generalization. To overcome these challenges, we propose an innovative framework that enhances edge intelligence through AI-in-the-loop joint sensing and communication (JSAC). This framework features an AI-driven closed-loop control architecture that jointly optimizes system resources, thereby delivering superior system-level performance. A key contribution of our work is establishing an explicit relationship between validation loss and the system's tunable parameters. This insight enables dynamic reduction of the generalization error through AI-driven closed-loop control. Specifically, for sensing control, we introduce an adaptive data collection strategy based on gradient importance sampling, allowing edge devices to autonomously decide when to terminate data acquisition and how to allocate sample weights based on real-time model feedback. For communication control, drawing inspiration from stochastic gradient Langevin dynamics (SGLD), our joint optimization of transmission power and batch size converts channel and data noise into gradient perturbations that help mitigate overfitting. Experimental evaluations demonstrate that our framework reduces communication energy consumption by up to 77 percent and sensing costs measured by the number of collected samples by up to 52 percent while significantly improving model generalization -- with up to 58 percent reductions of the final validation loss. It validates that the proposed scheme can harvest the mutual benefit of AI and JSAC systems by incorporating the model itself into the control loop of the system.
Online Location Planning for AI-Defined Vehicles: Optimizing Joint Tasks of Order Serving and Spatio-Temporal Heterogeneous Model Fine-Tuning
Zheng, Bokeng, Rao, Bo, Zhu, Tianxiang, Tan, Chee Wei, Duan, Jingpu, Zhou, Zhi, Chen, Xu, Zhang, Xiaoxi
Abstract--Advances in artificial intelligence (AI) including foundation models (FMs), are increasingly transforming human society, with smart city driving the evolution of urban living. Meanwhile, vehicle crowdsensing (VCS) has emerged as a key enabler, leveraging vehicles' mobility and sensor-equipped capabilities. In particular, ride-hailing vehicles can effectively facilitate flexible data collection and contribute towards urban intelligence, despite resource limitations. Therefore, this work explores a promising scenario, where edge-assisted vehicles perform joint tasks of order serving and the emerging foundation model finetuning using various urban data. However, integrating the VCS AI task with the conventional order serving task is challenging, due to their inconsistent spatio-temporal characteristics: (i) The distributions of ride orders and data point-of-interests (PoIs) may not coincide in geography, both following a priori unknown patterns; (ii) they have distinct forms of temporal effects, i.e., prolonged waiting makes orders become instantly invalid while data with increased staleness gradually reduces its utility for model fine-tuning. To overcome these obstacles, we propose an online framework based on multi-agent reinforcement learning (MARL) with careful augmentation. A new quality-of-service (QoS) metric is designed to characterize and balance the utility of the two joint tasks, under the effects of varying data volumes and staleness. Each RSU, equipped with a server, stores a complete base model, enabling vehicles to perform real-time fine-tuning as they collect data and transfer the I. X. Zhang are with the School of Computer Science and A previous version appears at IWQoS 2024 as a short paper. Due to the large volume, data stored in the government agencies in better city management. Notably, ridehailing RSU server can be discarded in a certain period of time. In vehicles are particularly advantageous for VCS tasks, practice, these data can be descriptive features and feedbacks due to their centralized ride-hailing platform management, (labels) of recommendation or generative AR applications, which reduces the cost of deploying and executing crowdsensing generated by nearby visitors or residents. They can also be tasks, and utilizes the data and computing resources traffic/environment monitoring data with labels generated by from ride-hailing vehicles to maximize the VCS task utilities. The government or any company that collaborates model (FM)-powered AI applications have revolutionized with the ride-hailing vehicle company has multiple types of numerous aspects of human lives, including healthcare, education, VSC tasks to fulfill, each of which needs certain locations industry, etc. FMs, e.g., BERT, GPT-4, ViT, serve of data for fine-tuning UFMs.
Review and Recommendations for using Artificial Intelligence in Intracoronary Optical Coherence Tomography Analysis
Chen, Xu, Huang, Yuan, Jessney, Benn, Sangha, Jason, Gu, Sophie, Schรถnlieb, Carola-Bibiane, Bennett, Martin, Roberts, Michael
Artificial intelligence (AI) methodologies hold great promise for the rapid and accurate diagnosis of coronary artery disease (CAD) from intravascular optical coherent tomography (IVOCT) images. Numerous papers have been published describing AI-based models for different diagnostic tasks, yet it remains unclear which models have potential clinical utility and have been properly validated. This systematic review considered published literature between January 2015 and February 2023 describing AI-based diagnosis of CAD using IVOCT. Our search identified 5,576 studies, with 513 included after initial screening and 35 studies included in the final systematic review after quality screening. Our findings indicate that most of the identified models are not currently suitable for clinical use, primarily due to methodological flaws and underlying biases. To address these issues, we provide recommendations to improve model quality and research practices to enhance the development of clinically useful AI products.
Active Sampling for Node Attribute Completion on Graphs
Liu, Benyuan, Chen, Xu, Wang, Yanfeng, Zhang, Ya, Cao, Zhi, Tsang, Ivor
Node attribute, a type of crucial information for graph analysis, may be partially or completely missing for certain nodes in real world applications. Restoring the missing attributes is expected to benefit downstream graph learning. Few attempts have been made on node attribute completion, but a novel framework called Structure-attribute Transformer (SAT) was recently proposed by using a decoupled scheme to leverage structures and attributes. SAT ignores the differences in contributing to the learning schedule and finding a practical way to model the different importance of nodes with observed attributes is challenging. This paper proposes a novel AcTive Sampling algorithm (ATS) to restore missing node attributes. The representativeness and uncertainty of each node's information are first measured based on graph structure, representation similarity and learning bias. To select nodes as train samples in the next optimization step, a weighting scheme controlled by Beta distribution is then introduced to linearly combine the two properties. Extensive experiments on four public benchmark datasets and two downstream tasks have shown the superiority of ATS in node attribute completion.
LLM-Based Multi-Agent Systems are Scalable Graph Generative Models
Ji, Jiarui, Lei, Runlin, Bi, Jialing, Wei, Zhewei, Chen, Xu, Lin, Yankai, Pan, Xuchen, Li, Yaliang, Ding, Bolin
The structural properties of naturally arising social graphs are extensively studied to understand their evolution. Prior approaches for modeling network dynamics typically rely on rule-based models, which lack realism and generalizability, or deep learning-based models, which require large-scale training datasets. Social graphs, as abstract graph representations of entity-wise interactions, present an opportunity to explore network evolution mechanisms through realistic simulations of human-item interactions. Leveraging the pre-trained social consensus knowledge embedded in large language models (LLMs), we present GraphAgent-Generator (GAG), a novel simulation-based framework for dynamic, text-attributed social graph generation. GAG simulates the temporal node and edge generation processes for zero-shot social graph generation. The resulting graphs exhibit adherence to seven key macroscopic network properties, achieving an 11% improvement in microscopic graph structure metrics. Through the node classification benchmarking task, we validate GAG effectively captures the intricate text-structure correlations in graph generation. Furthermore, GAG supports generating graphs with up to nearly 100,000 nodes or 10 million edges through large-scale LLM-based agent simulation with parallel acceleration, achieving a minimum speed-up of 90.4%. The source code is available at https://github.com/Ji-Cather/GraphAgent.
SVFR: A Unified Framework for Generalized Video Face Restoration
Wang, Zhiyao, Chen, Xu, Xu, Chengming, Zhu, Junwei, Hu, Xiaobin, Zhang, Jiangning, Wang, Chengjie, Liu, Yuqi, Zhou, Yiyi, Ji, Rongrong
Face Restoration (FR) is a crucial area within image and video processing, focusing on reconstructing high-quality portraits from degraded inputs. Despite advancements in image FR, video FR remains relatively under-explored, primarily due to challenges related to temporal consistency, motion artifacts, and the limited availability of high-quality video data. Moreover, traditional face restoration typically prioritizes enhancing resolution and may not give as much consideration to related tasks such as facial colorization and inpainting. In this paper, we propose a novel approach for the Generalized Video Face Restoration (GVFR) task, which integrates video BFR, inpainting, and colorization tasks that we empirically show to benefit each other. We present a unified framework, termed as stable video face restoration (SVFR), which leverages the generative and motion priors of Stable Video Diffusion (SVD) and incorporates task-specific information through a unified face restoration framework. A learnable task embedding is introduced to enhance task identification. Meanwhile, a novel Unified Latent Regularization (ULR) is employed to encourage the shared feature representation learning among different subtasks. To further enhance the restoration quality and temporal stability, we introduce the facial prior learning and the self-referred refinement as auxiliary strategies used for both training and inference. The proposed framework effectively combines the complementary strengths of these tasks, enhancing temporal coherence and achieving superior restoration quality. This work advances the state-of-the-art in video FR and establishes a new paradigm for generalized video face restoration. Code and video demo are available at https://github.com/wangzhiyaoo/SVFR.git.