Goto

Collaborating Authors

 Zhang, Zhicheng


M3HF: Multi-agent Reinforcement Learning from Multi-phase Human Feedback of Mixed Quality

arXiv.org Artificial Intelligence

Designing effective reward functions in multi-agent reinforcement learning (MARL) is a significant challenge, often leading to suboptimal or misaligned behaviors in complex, coordinated environments. We introduce Multi-agent Reinforcement Learning from Multi-phase Human Feedback of Mixed Quality (M3HF), a novel framework that integrates multi-phase human feedback of mixed quality into the MARL training process. By involving humans with diverse expertise levels to provide iterative guidance, M3HF leverages both expert and non-expert feedback to continuously refine agents' policies. During training, we strategically pause agent learning for human evaluation, parse feedback using large language models to assign it appropriately and update reward functions through predefined templates and adaptive weight by using weight decay and performance-based adjustments. Our approach enables the integration of nuanced human insights across various levels of quality, enhancing the interpretability and robustness of multi-agent cooperation. Empirical results in challenging environments demonstrate that M3HF significantly outperforms state-of-the-art methods, effectively addressing the complexities of reward design in MARL and enabling broader human participation in the training process.


DP and QP Based Decision-making and Planning for Autonomous Vehicle

arXiv.org Artificial Intelligence

Autonomous driving technology is rapidly evolving and becoming a pivotal element of modern automation systems. Effective decision-making and planning are essential to ensuring autonomous vehicles operate safely and efficiently in complex environments. This paper introduces a decision-making and planning framework for autonomous vehicles, leveraging dynamic programming (DP) for global path planning and quadratic programming (QP) for local trajectory optimization. The proposed approach utilizes S-T graphs to achieve both dynamic and static obstacle avoidance. A comprehensive vehicle dynamics model supports the control system, enabling precise path tracking and obstacle handling. Simulation studies are conducted to evaluate the system's performance in a variety of scenarios, including global path planning, static obstacle avoidance, and dynamic obstacle avoidance involving pedestrian interactions. The results confirm the effectiveness and robustness of the proposed decision-making and planning algorithms in navigating complex environments, demonstrating the feasibility of this approach for autonomous driving applications.


Perception Compressor:A training-free prompt compression method in long context scenarios

arXiv.org Artificial Intelligence

Large Language Models (LLMs) demonstrate exceptional capabilities in various scenarios. However, they suffer from much redundant information and are sensitive to the position of key information (relevant to the input question) in long context scenarios, leading to inferior performance. To address these challenges, we present Perception Compressor, a training-free prompt compression method. It includes a perception retriever that leverages guiding questions and instruction to retrieve the most relevant demonstrations, a dual-slope ratio allocator to dynamically allocate compression ratios and open-book ratios, and a semi-guided iterative compression that retains key information at the token level while removing tokens that distract the LLM. We conduct extensive experiments on long context benchmarks, i.e., NaturalQuestions, LongBench, and MuSiQue. Experiment results show that Perception Compressor outperforms existing methods by a large margin, achieving state-of-the-art performance.


Flaming-hot Initiation with Regular Execution Sampling for Large Language Models

arXiv.org Artificial Intelligence

Since the release of ChatGPT, large language models (LLMs) have demonstrated remarkable capabilities across various domains. A key challenge in developing these general capabilities is efficiently sourcing diverse, high-quality data. This becomes especially critical in reasoning-related tasks with sandbox checkers, such as math or code, where the goal is to generate correct solutions to specific problems with higher probability. In this work, we introduce Flaming-hot Initiation with Regular Execution (FIRE) sampling, a simple yet highly effective method to efficiently find good responses. Our empirical findings show that FIRE sampling enhances inference-time generation quality and also benefits training in the alignment stage. Furthermore, we explore how FIRE sampling improves performance by promoting diversity and analyze the impact of employing FIRE at different positions within a response.


MFF-EINV2: Multi-scale Feature Fusion across Spectral-Spatial-Temporal Domains for Sound Event Localization and Detection

arXiv.org Artificial Intelligence

Sound Event Localization and Detection (SELD) involves detecting and localizing sound events using multichannel sound recordings. Previously proposed Event-Independent Network V2 (EINV2) has achieved outstanding performance on SELD. However, it still faces challenges in effectively extracting features across spectral, spatial, and temporal domains. This paper proposes a three-stage network structure named Multi-scale Feature Fusion (MFF) module to fully extract multi-scale features across spectral, spatial, and temporal domains. The MFF module utilizes parallel subnetworks architecture to generate multi-scale spectral and spatial features. The TF-Convolution Module is employed to provide multi-scale temporal features. We incorporated MFF into EINV2 and term the proposed method as MFF-EINV2. Experimental results in 2022 and 2023 DCASE challenge task3 datasets show the effectiveness of our MFF-EINV2, which achieves state-of-the-art (SOTA) performance compared to published methods.


MESA: Cooperative Meta-Exploration in Multi-Agent Learning through Exploiting State-Action Space Structure

arXiv.org Artificial Intelligence

Multi-agent reinforcement learning (MARL) algorithms often struggle to find strategies close to Pareto optimal Nash Equilibrium, owing largely to the lack of efficient exploration. The problem is exacerbated in sparse-reward settings, caused by the larger variance exhibited in policy learning. This paper introduces MESA, a novel meta-exploration method for cooperative multi-agent learning. It learns to explore by first identifying the agents' high-rewarding joint state-action subspace from training tasks and then learning a set of diverse exploration policies to "cover" the subspace. These trained exploration policies can be integrated with any off-policy MARL algorithm for test-time tasks. We first showcase MESA's advantage in a multi-step matrix game. Furthermore, experiments show that with learned exploration policies, MESA achieves significantly better performance in sparse-reward tasks in several multi-agent particle environments and multi-agent MuJoCo environments, and exhibits the ability to generalize to more challenging tasks at test time.


AdvLoRA: Adversarial Low-Rank Adaptation of Vision-Language Models

arXiv.org Artificial Intelligence

Vision-Language Models (VLMs) are a significant technique for Artificial General Intelligence (AGI). With the fast growth of AGI, the security problem become one of the most important challenges for VLMs. In this paper, through extensive experiments, we demonstrate the vulnerability of the conventional adaptation methods for VLMs, which may bring significant security risks. In addition, as the size of the VLMs increases, performing conventional adversarial adaptation techniques on VLMs results in high computational costs. To solve these problems, we propose a parameter-efficient \underline{Adv}ersarial adaptation method named \underline{AdvLoRA} by \underline{Lo}w-\underline{R}ank \underline{A}daptation. At first, we investigate and reveal the intrinsic low-rank property during the adversarial adaptation for VLMs. Different from LoRA, we improve the efficiency and robustness of adversarial adaptation by designing a novel reparameterizing method based on parameter clustering and parameter alignment. In addition, an adaptive parameter update strategy is proposed to further improve the robustness. By these settings, our proposed AdvLoRA alleviates the model security and high resource waste problems. Extensive experiments demonstrate the effectiveness and efficiency of the AdvLoRA.


Autonomous Path Planning for Intercostal Robotic Ultrasound Imaging Using Reinforcement Learning

arXiv.org Artificial Intelligence

Ultrasound (US) has been widely used in daily clinical practice for screening internal organs and guiding interventions. However, due to the acoustic shadow cast by the subcutaneous rib cage, the US examination for thoracic application is still challenging. To fully cover and reconstruct the region of interest in US for diagnosis, an intercostal scanning path is necessary. To tackle this challenge, we present a reinforcement learning (RL) approach for planning scanning paths between ribs to monitor changes in lesions on internal organs, such as the liver and heart, which are covered by rib cages. Structured anatomical information of the human skeleton is crucial for planning these intercostal paths. To obtain such anatomical insight, an RL agent is trained in a virtual environment constructed using computational tomography (CT) templates with randomly initialized tumors of various shapes and locations. In addition, task-specific state representation and reward functions are introduced to ensure the convergence of the training process while minimizing the effects of acoustic attenuation and shadows during scanning. To validate the effectiveness of the proposed approach, experiments have been carried out on unseen CTs with randomly defined single or multiple scanning targets. The results demonstrate the efficiency of the proposed RL framework in planning non-shadowed US scanning trajectories in areas with limited acoustic access.


Understanding AI Cognition: A Neural Module for Inference Inspired by Human Memory Mechanisms

arXiv.org Artificial Intelligence

How humans and machines make sense of current inputs for relation reasoning and question-answering while putting the perceived information into context of our past memories, has been a challenging conundrum in cognitive science and artificial intelligence. Inspired by human brain's memory system and cognitive architectures, we propose a PMI framework that consists of perception, memory and inference components. Notably, the memory module comprises working and long-term memory, with the latter endowed with a higher-order structure to retain more accumulated knowledge and experiences. Through a differentiable competitive write access, current perceptions update working memory, which is later merged with long-term memory via outer product associations, averting memory overflow and minimizing information conflicts. In the inference module, relevant information is retrieved from two separate memory origins and associatively integrated to attain a more comprehensive and precise interpretation of current perceptions. We exploratively apply our PMI to improve prevailing Transformers and CNN models on question-answering tasks like bAbI-20k and Sort-of-CLEVR datasets, as well as relation calculation and image classification tasks, and in each case, our PMI enhancements consistently outshine their original counterparts significantly. Visualization analyses reveal that memory consolidation, along with the interaction and integration of information from diverse memory sources, substantially contributes to the model effectiveness on inference tasks.


ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models

arXiv.org Artificial Intelligence

Large language models (LLMs) have recently demonstrated remarkable capabilities to comprehend human intentions, engage in reasoning, and design planning-like behavior. To further unleash the power of LLMs to accomplish complex tasks, there is a growing trend to build agent framework that equips LLMs, such as ChatGPT, with tool-use abilities to connect with massive external APIs. In this work, we introduce ModelScope-Agent, a general and customizable agent framework for real-world applications, based on open-source LLMs as controllers. It provides a user-friendly system library, with customizable engine design to support model training on multiple open-source LLMs, while also enabling seamless integration with both model APIs and common APIs in a unified way. To equip the LLMs with tool-use abilities, a comprehensive framework has been proposed spanning over tool-use data collection, tool retrieval, tool registration, memory control, customized model training, and evaluation for practical real-world applications. Finally, we showcase ModelScopeGPT, a real-world intelligent assistant of ModelScope Community based on the ModelScope-Agent framework, which is able to connect open-source LLMs with more than 1000 public AI models and localized community knowledge in ModelScope. The ModelScope-Agent library\footnote{https://github.com/modelscope/modelscope-agent} and online demo\footnote{https://modelscope.cn/studios/damo/ModelScopeGPT/summary} are now publicly available.