Song, Liang
Baichuan-M1: Pushing the Medical Capability of Large Language Models
Wang, Bingning, Zhao, Haizhou, Zhou, Huozhi, Song, Liang, Xu, Mingyu, Cheng, Wei, Zeng, Xiangrong, Zhang, Yupeng, Huo, Yuqi, Wang, Zecheng, Zhao, Zhengyun, Pan, Da, Yang, Fan, Kou, Fei, Li, Fei, Chen, Fuzhong, Dong, Guosheng, Liu, Han, Zhang, Hongda, He, Jin, Yang, Jinjie, Wu, Kangxi, Wu, Kegeng, Su, Lei, Niu, Linlin, Sun, Linzhuang, Wang, Mang, Fan, Pengcheng, Shen, Qianli, Xin, Rihui, Dang, Shunya, Zhou, Songchi, Chen, Weipeng, Luo, Wenjing, Chen, Xin, Men, Xin, Lin, Xionghai, Dong, Xuezhen, Zhang, Yan, Duan, Yifei, Zhou, Yuyan, Ma, Zhi, Wu, Zhiying
The current generation of large language models (LLMs) is typically designed for broad, general-purpose applications, while domain-specific LLMs, especially in vertical fields like medicine, remain relatively scarce. In particular, the development of highly efficient and practical LLMs for the medical domain is challenging due to the complexity of medical knowledge and the limited availability of high-quality data. To bridge this gap, we introduce Baichuan-M1, a series of large language models specifically optimized for medical applications. Unlike traditional approaches that simply continue pretraining on existing models or apply post-training to a general base model, Baichuan-M1 is trained from scratch with a dedicated focus on enhancing medical capabilities. Our model is trained on 20 trillion tokens and incorporates a range of effective training methods that strike a balance between general capabilities and medical expertise. As a result, Baichuan-M1 not only performs strongly across general domains such as mathematics and coding but also excels in specialized medical fields. We have open-sourced Baichuan-M1-14B, a mini version of our model, which can be accessed through the following links.
A Survey on Diffusion Models for Anomaly Detection
Liu, Jing, Ma, Zhenchao, Wang, Zepu, Zou, Chenxuanyin, Ren, Jiayang, Wang, Zehua, Song, Liang, Hu, Bo, Liu, Yang, Leung, Victor C. M.
Diffusion models (DMs) have emerged as a powerful class of generative AI models, showing remarkable potential in anomaly detection (AD) tasks across various domains, such as cybersecurity, fraud detection, healthcare, and manufacturing. The intersection of these two fields, termed diffusion models for anomaly detection (DMAD), offers promising solutions for identifying deviations in increasingly complex and high-dimensional data. In this survey, we review recent advances in DMAD research. We begin by presenting the fundamental concepts of AD and DMs, followed by a comprehensive analysis of classic DM architectures including DDPMs, DDIMs, and Score SDEs. We further categorize existing DMAD methods into reconstruction-based, density-based, and hybrid approaches, providing detailed examinations of their methodological innovations. We also explore the diverse tasks across different data modalities, encompassing image, time series, video, and multimodal data analysis. Furthermore, we discuss critical challenges and emerging research directions, including computational efficiency, model interpretability, robustness enhancement, edge-cloud collaboration, and integration with large language models. The collection of DMAD research papers and resources is available at https://github.com/fdjingliu/DMAD.
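The reconstruction-based family surveyed above can be illustrated with a minimal sketch: perturb the input with a partial forward-diffusion step, reconstruct it with a denoiser trained only on normal data, and use the reconstruction error as the anomaly score. The denoiser below is a toy stand-in (a contraction toward the normal-data mean), not an actual trained diffusion model, and the noise schedule is an illustrative assumption.

```python
import numpy as np

def ddpm_style_anomaly_score(x, denoise_fn, t_noise=0.5, rng=None):
    """Reconstruction-based DMAD scoring sketch: partially diffuse the
    input with Gaussian noise, reconstruct it with a denoiser that only
    models normal data, and score by reconstruction error."""
    rng = np.random.default_rng(0) if rng is None else rng
    noisy = np.sqrt(1.0 - t_noise) * x + np.sqrt(t_noise) * rng.standard_normal(x.shape)
    recon = denoise_fn(noisy)
    return float(np.linalg.norm(x - recon))

# Toy denoiser: pulls any input back toward the mean of "normal" data,
# mimicking a model that can only reconstruct in-distribution samples.
normal_mean = np.zeros(8)
denoise = lambda z: 0.9 * normal_mean + 0.1 * z

normal_score = ddpm_style_anomaly_score(np.zeros(8), denoise)
anomalous_score = ddpm_style_anomaly_score(np.full(8, 5.0), denoise)
```

In-distribution inputs reconstruct well (low score), while anomalies cannot be recovered by the normal-data denoiser and receive a high score.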
Extracting and Transferring Abilities For Building Multi-lingual Ability-enhanced Large Language Models
Chen, Zhipeng, Song, Liang, Zhou, Kun, Zhao, Wayne Xin, Wang, Bingning, Chen, Weipeng, Wen, Ji-Rong
Multi-lingual ability transfer has become increasingly important for the broad application of large language models (LLMs). Existing work relies heavily on training with multi-lingual ability-related data, which may not be available for low-resource languages. To address this, we propose a Multi-lingual Ability Extraction and Transfer approach, named MAET. Our key idea is to decompose and extract language-agnostic, ability-related weights from LLMs and transfer them across languages through simple addition and subtraction operations, without any training. Specifically, MAET consists of an extraction stage and a transfer stage. In the extraction stage, we first locate key neurons that are highly related to specific abilities and then use them to extract the transferable ability-specific weights. In the transfer stage, we select the ability-related parameter tensors and design a merging strategy based on the linguistic and ability-specific weights to build the multi-lingual ability-enhanced LLM. To demonstrate the effectiveness of our approach, we conduct extensive experiments on mathematical and scientific tasks in both high-resource and low-resource language scenarios. Experimental results show that MAET can effectively and efficiently extract and transfer advanced abilities, outperforming training-based baseline methods. Our code and data are available at \url{https://github.com/RUCAIBox/MAET}.
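The addition-and-subtraction transfer described above can be sketched as plain weight arithmetic. The mask and the merging step below are illustrative placeholders; MAET's actual neuron-location and parameter-tensor-selection procedures are more involved than this sketch.

```python
import numpy as np

def extract_ability_vector(base, ability_tuned, key_mask):
    """Extraction stage sketch: keep the weight delta only on neurons
    flagged as ability-related (the neuron-location step is omitted)."""
    return {k: (ability_tuned[k] - base[k]) * key_mask[k] for k in base}

def transfer_ability(target_lang, ability_vec, alpha=1.0):
    """Transfer stage sketch: add the language-agnostic ability delta to
    a target-language model by simple weight addition, without training."""
    return {k: target_lang[k] + alpha * ability_vec[k] for k in target_lang}

# Toy one-tensor "models": the ability-tuned model differs from the base
# on two neurons, but only the first is marked as ability-related.
base = {"w": np.array([0.0, 0.0, 0.0])}
ability = {"w": np.array([1.0, 0.5, 0.0])}
mask = {"w": np.array([1.0, 0.0, 0.0])}
vec = extract_ability_vector(base, ability, mask)
merged = transfer_ability({"w": np.array([2.0, 2.0, 2.0])}, vec)
```

The masked extraction discards deltas on language-specific neurons, so only the (assumed) language-agnostic ability weights are transferred.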
End-Cloud Collaboration Framework for Advanced AI Customer Service in E-commerce
Teng, Liangyu, Liu, Yang, Liu, Jing, Song, Liang
In recent years, the e-commerce industry has seen a rapid increase in demand for advanced AI-driven customer service solutions. Traditional cloud-based models face limitations in terms of latency, personalized service, and privacy. Furthermore, end devices often lack the computational resources to deploy large AI models effectively. In this paper, we propose an innovative End-Cloud Collaboration (ECC) framework for advanced AI customer service in e-commerce. This framework integrates the advantages of large cloud models and mid/small-sized end models by deeply exploring the generalization potential of cloud models and effectively utilizing the computing resources of terminal chips, alleviating the strain on computing resources to some extent. Specifically, the large cloud model acts as a teacher, guiding and promoting the learning of the end model, which significantly reduces the end model's reliance on large-scale, high-quality data and thereby addresses the data bottleneck in traditional end-model training, offering a new paradigm for the rapid deployment of industry applications. Additionally, we introduce an online evolutive learning strategy that enables the end model to continuously iterate and upgrade based on guidance from the cloud model and real-time user feedback. This strategy ensures that the model can flexibly adapt to rapid changes in application scenarios while avoiding the uploading of sensitive information by performing local fine-tuning, achieving the dual goals of privacy protection and personalized service. To conclude, we implement in-depth corpus collection (e.g., data organization, cleaning, and preprocessing) and train an ECC-based industry-specific model for e-commerce customer service.
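The teacher-student guidance from the cloud model to the end model can be sketched with a standard knowledge-distillation objective (temperature-softened KL divergence between teacher and student outputs). This is a generic illustration of cloud-guided learning, not the paper's exact training objective.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax; higher T exposes more of the
    teacher's 'dark knowledge' in the soft targets."""
    e = np.exp(z / T - np.max(z / T))
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2 —
    the standard way a large cloud (teacher) model can guide a small
    end (student) model without sharing raw training data."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))) * T * T)

teacher = np.array([2.0, 1.0, 0.5])
student = np.array([0.5, 1.0, 2.0])
loss = distillation_loss(student, teacher)
```

The loss vanishes only when the student matches the teacher's softened distribution, which is what drives the end model toward cloud-level behavior.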
MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic
Zhou, Yuyan, Song, Liang, Wang, Bingning, Chen, Weipeng
The advent of large language models (LLMs) like GPT-4 has catalyzed the exploration of multi-task learning (MTL), in which a single model demonstrates proficiency across diverse tasks. Task arithmetic has emerged as a cost-effective approach for MTL. It enables performance enhancement across multiple tasks by adding the corresponding task vectors to a pre-trained model. However, the current lack of a method that can simultaneously achieve optimal performance, computational efficiency, and data privacy limits its application to LLMs. In this paper, we propose \textbf{M}odel \textbf{E}xclusive \textbf{T}ask \textbf{A}rithmetic for merging \textbf{GPT}-scale models, which formalizes the objective of model merging as a multi-task learning framework, aiming to minimize the average loss difference between the merged model and each individual task model. Since data privacy limits the use of multi-task training data, we leverage LLMs' local linearity and task vectors' orthogonality to separate the data term from the scaling-coefficient term and derive a model-exclusive task arithmetic method. Our proposed MetaGPT is data-agnostic and bypasses the heavy search process, making it cost-effective and easy to implement for LLMs. Extensive experiments demonstrate that MetaGPT improves task arithmetic and achieves state-of-the-art performance on multiple tasks.
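The basic task-arithmetic merge that MetaGPT builds on can be sketched as follows. Uniform coefficients are used here purely as placeholders; MetaGPT's contribution is deriving the scaling coefficients in closed form from the task vectors alone, without any task data.

```python
import numpy as np

def merge_task_vectors(pretrained, task_models, coeffs=None):
    """Task-arithmetic merge: theta_merged = theta_pre + sum_i lam_i * tau_i,
    where tau_i = theta_i - theta_pre is task i's task vector. Uniform
    lam_i is a placeholder; MetaGPT computes coefficients data-free."""
    taus = [m - pretrained for m in task_models]
    if coeffs is None:
        coeffs = [1.0 / len(taus)] * len(taus)
    merged = pretrained.copy()
    for lam, tau in zip(coeffs, taus):
        merged = merged + lam * tau
    return merged

# Toy 4-parameter models fine-tuned on two disjoint "tasks".
theta0 = np.zeros(4)
theta_a = np.array([2.0, 0.0, 0.0, 0.0])
theta_b = np.array([0.0, 2.0, 0.0, 0.0])
merged = merge_task_vectors(theta0, [theta_a, theta_b])
```

Because the toy task vectors are orthogonal, the uniform merge retains a scaled contribution from each task, which is the regime the paper's linearity and orthogonality assumptions target.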
Networking Systems for Video Anomaly Detection: A Tutorial and Survey
Liu, Jing, Liu, Yang, Lin, Jieyu, Li, Jielin, Sun, Peng, Hu, Bo, Song, Liang, Boukerche, Azzedine, Leung, Victor C. M.
With the widespread use of surveillance cameras in smart cities [104] and the boom of online video applications powered by 4G/5G communication technologies, traditional human inspection can no longer accurately monitor the video data generated around the clock: it is not only time-consuming and labor-intensive but also risks leaking important information (e.g., biometrics and sensitive speech). In contrast, VAD-empowered IoVT applications [54], such as Intelligent Video Surveillance Systems (IVSS) and automated content analysis platforms, can process massive video streams online and detect events of interest in real time, forwarding only the noteworthy anomalous parts for human review. This significantly reduces data storage and communication costs and helps address public concerns about data security and privacy protection. As a result, VAD has gained widespread attention in academia and industry over the last decade and has been applied in emerging fields such as information forensics [154] and industrial manufacturing [71] in smart cities, as well as online content analysis in mobile video applications [153]. VAD extends the data scope of conventional Anomaly Detection (AD) from time series, images, and graphs to video, which not only must cope with the endogenous complexity of the data but also must account for the computational and communication costs on resource-limited devices [55]. Specifically, the inherently high-dimensional structure of video data, its high information density and redundancy, the heterogeneity of temporal and spatial patterns, and the feature entanglement between foreground targets and background scenes make VAD more challenging than traditional AD tasks at the levels of representation learning and anomaly discrimination [89].
Existing studies [4, 60, 69, 76] have shown that high-performance VAD models need to explicitly model appearance and motion information, i.e., the differences between regular events and anomalous examples in both the spatial and temporal dimensions. In contrast to time series AD, which mainly measures periodic temporal patterns of variables, and image AD, which focuses only on spatial contextual deviations, VAD needs to extract discriminative spatial and temporal features from a large amount of redundant information (e.g., repetitive temporal contexts and label-independent data distributions), as well as to learn the differences between normal and anomalous events in terms of their local appearances and global motions [100]. However, video anomalies are ambiguous and subjective [48].
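A common scoring rule in prediction-based VAD, consistent with the appearance/motion modeling above, is to predict a frame from its temporal context and flag frames whose prediction quality (PSNR) is low as anomalous. The sketch below assumes frames normalized to [0, 1]; the predictor itself is left abstract.

```python
import numpy as np

def psnr_anomaly_score(pred_frame, true_frame, max_val=1.0):
    """Frame-level VAD scoring sketch: a model trained on normal video
    predicts the current frame from past context; large spatio-temporal
    deviations from learned normal patterns yield low PSNR, i.e., a
    high anomaly score."""
    mse = float(np.mean((pred_frame - true_frame) ** 2))
    psnr = 10.0 * np.log10(max_val ** 2 / (mse + 1e-12))
    return -psnr  # higher score = more anomalous

# Toy frames: a well-predicted (normal) frame vs. a poorly predicted one.
true = np.full((4, 4), 0.5)
normal_score = psnr_anomaly_score(true + 0.01, true)
anomalous_score = psnr_anomaly_score(true + 0.4, true)
```

Thresholding this score over time gives the frame-level anomaly labels that most VAD benchmarks evaluate.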
SNI-SLAM: Semantic Neural Implicit SLAM
Zhu, Siting, Wang, Guangming, Blum, Hermann, Liu, Jiuming, Song, Liang, Pollefeys, Marc, Wang, Hesheng
We propose SNI-SLAM, a semantic SLAM system utilizing neural implicit representation that simultaneously performs accurate semantic mapping, high-quality surface reconstruction, and robust camera tracking. In this system, we introduce a hierarchical semantic representation to allow multi-level semantic comprehension for top-down structured semantic mapping of the scene. In addition, to fully exploit the correlations between multiple attributes of the environment, we integrate appearance, geometry, and semantic features through cross-attention for feature collaboration. This strategy enables a more multifaceted understanding of the environment, allowing SNI-SLAM to remain robust even when a single attribute is defective. We then design an internal fusion-based decoder that obtains semantic, RGB, and Truncated Signed Distance Field (TSDF) values from multi-level features for accurate decoding. Furthermore, we propose a feature loss to update the scene representation at the feature level. Compared with low-level losses such as RGB loss and depth loss, our feature loss is capable of guiding the network optimization at a higher level. SNI-SLAM demonstrates superior mapping and tracking accuracy over all recent NeRF-based SLAM methods on the Replica and ScanNet datasets, while also showing excellent capabilities in accurate semantic segmentation and real-time semantic mapping.
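The cross-attention fusion of appearance, geometry, and semantic features can be sketched as a single-head attention step, where one attribute's features query another's. The single-head form, shapes, and shared feature dimension are assumptions for illustration, not the paper's exact architecture.

```python
import numpy as np

def cross_attention(query_feats, kv_feats):
    """Single-head cross-attention sketch: e.g., semantic features (queries)
    attend over appearance/geometry features (keys and values), so each
    attribute can borrow information from the others."""
    d_k = query_feats.shape[-1]
    scores = query_feats @ kv_feats.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)      # rows sum to 1
    return attn @ kv_feats

# Toy features: 3 semantic queries attending over 5 appearance features,
# all in a shared 4-dimensional feature space.
sem = np.arange(12.0).reshape(3, 4) / 10.0
app = np.arange(20.0).reshape(5, 4) / 10.0
fused = cross_attention(sem, app)
```

Because the attention weights form a convex combination, each fused feature stays inside the span of the key/value features, which is what lets a defective attribute be corrected by the others.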
Baichuan 2: Open Large-scale Language Models
Yang, Aiyuan, Xiao, Bin, Wang, Bingning, Zhang, Borong, Bian, Ce, Yin, Chao, Lv, Chenxu, Pan, Da, Wang, Dian, Yan, Dong, Yang, Fan, Deng, Fei, Wang, Feng, Liu, Feng, Ai, Guangwei, Dong, Guosheng, Zhao, Haizhou, Xu, Hang, Sun, Haoze, Zhang, Hongda, Liu, Hui, Ji, Jiaming, Xie, Jian, Dai, JunTao, Fang, Kun, Su, Lei, Song, Liang, Liu, Lifeng, Ru, Liyun, Ma, Luyao, Wang, Mang, Liu, Mickel, Lin, MingAn, Nie, Nuolan, Guo, Peidong, Sun, Ruiyang, Zhang, Tao, Li, Tianpeng, Li, Tianyu, Cheng, Wei, Chen, Weipeng, Zeng, Xiangrong, Wang, Xiaochuan, Chen, Xiaoxi, Men, Xin, Yu, Xin, Pan, Xuehai, Shen, Yanjun, Wang, Yiding, Li, Yiyu, Jiang, Youxin, Gao, Yuchen, Zhang, Yupeng, Zhou, Zenan, Wu, Zhiying
Large language models (LLMs) have demonstrated remarkable performance on a variety of natural language tasks based on just a few examples of natural language instructions, reducing the need for extensive feature engineering. However, most powerful LLMs are closed-source or limited in their capability for languages other than English. In this technical report, we present Baichuan 2, a series of large-scale multilingual language models containing 7 billion and 13 billion parameters, trained from scratch on 2.6 trillion tokens. Baichuan 2 matches or outperforms other open-source models of similar size on public benchmarks such as MMLU, CMMLU, GSM8K, and HumanEval. Furthermore, Baichuan 2 excels in vertical domains such as medicine and law. We will release all pre-training model checkpoints to benefit the research community in better understanding the training dynamics of Baichuan 2.