Cui, Zhichao
SEAL: Speech Embedding Alignment Learning for Speech Large Language Model with Retrieval-Augmented Generation
Sun, Chunyu, Liu, Bingyu, Cui, Zhichao, Qi, Anbin, Zhang, Tian-hao, Zhou, Dinghao, Lu, Lewei
Embedding-based retrieval models have made significant strides in retrieval-augmented generation (RAG) techniques for text and multimodal large language models (LLMs) applications. However, when it comes to speech larage language models (SLLMs), these methods are limited to a two-stage process, where automatic speech recognition (ASR) is combined with text-based retrieval. This sequential architecture suffers from high latency and error propagation. To address these limitations, we propose a unified embedding framework that eliminates the need for intermediate text representations. Specifically, the framework includes separate speech and text encoders, followed by a shared scaling layer that maps both modalities into a common embedding space. Our model reduces pipeline latency by 50\% while achieving higher retrieval accuracy compared to traditional two-stage methods. We also provide a theoretical analysis of the challenges inherent in end-to-end speech retrieval and introduce architectural principles for effective speech-to-document matching. Extensive experiments demonstrate the robustness of our approach across diverse acoustic conditions and speaker variations, paving the way for a new paradigm in multimodal SLLMs retrieval systems.
Autonomous and Adaptive Role Selection for Multi-robot Collaborative Area Search Based on Deep Reinforcement Learning
Zhu, Lina, Cheng, Jiyu, Zhang, Hao, Cui, Zhichao, Zhang, Wei, Liu, Yuehu
In the tasks of multi-robot collaborative area search, we propose the unified approach for simultaneous mapping for sensing more targets (exploration) while searching and locating the targets (coverage). Specifically, we implement a hierarchical multi-agent reinforcement learning algorithm to decouple task planning from task execution. The role concept is integrated into the upper-level task planning for role selection, which enables robots to learn the role based on the state status from the upper-view. Besides, an intelligent role switching mechanism enables the role selection module to function between two timesteps, promoting both exploration and coverage interchangeably. Then the primitive policy learns how to plan based on their assigned roles and local observation for sub-task execution. The well-designed experiments show the scalability and generalization of our method compared with state-of-the-art approaches in the scenes with varying complexity and number of robots.
Artificial Intelligence Security Competition (AISC)
Dong, Yinpeng, Chen, Peng, Deng, Senyou, L, Lianji, Sun, Yi, Zhao, Hanyu, Li, Jiaxing, Tan, Yunteng, Liu, Xinyu, Dong, Yangyi, Xu, Enhui, Xu, Jincai, Xu, Shu, Fu, Xuelin, Sun, Changfeng, Han, Haoliang, Zhang, Xuchong, Chen, Shen, Sun, Zhimin, Cao, Junyi, Yao, Taiping, Ding, Shouhong, Wu, Yu, Lin, Jian, Wu, Tianpeng, Wang, Ye, Fu, Yu, Feng, Lin, Gao, Kangkang, Liu, Zeyu, Pang, Yuanzhe, Duan, Chengqi, Zhou, Huipeng, Wang, Yajie, Zhao, Yuhang, Wu, Shangbo, Lyu, Haoran, Lin, Zhiyu, Gao, Yifei, Li, Shuang, Wang, Haonan, Sang, Jitao, Ma, Chen, Zheng, Junhao, Li, Yijia, Shen, Chao, Lin, Chenhao, Cui, Zhichao, Liu, Guoshuai, Shi, Huafeng, Hu, Kun, Zhang, Mengxin
The security of artificial intelligence (AI) is an important research area towards safe, reliable, and trustworthy AI systems. To accelerate the research on AI security, the Artificial Intelligence Security Competition (AISC) was organized by the Zhongguancun Laboratory, China Industrial Control Systems Cyber Emergency Response Team, Institute for Artificial Intelligence, Tsinghua University, and RealAI as part of the Zhongguancun International Frontier Technology Innovation Competition (https://www.zgc-aisc.com/en). The competition consists of three tracks, including Deepfake Security Competition, Autonomous Driving Security Competition, and Face Recognition Security Competition. This report will introduce the competition rules of these three tracks and the solutions of top-ranking teams in each track.