AITopics | Xu, Mingyu

Collaborating Authors

Xu, Mingyu

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Baichuan-M1: Pushing the Medical Capability of Large Language Models

Wang, Bingning, Zhao, Haizhou, Zhou, Huozhi, Song, Liang, Xu, Mingyu, Cheng, Wei, Zeng, Xiangrong, Zhang, Yupeng, Huo, Yuqi, Wang, Zecheng, Zhao, Zhengyun, Pan, Da, Yang, Fan, Kou, Fei, Li, Fei, Chen, Fuzhong, Dong, Guosheng, Liu, Han, Zhang, Hongda, He, Jin, Yang, Jinjie, Wu, Kangxi, Wu, Kegeng, Su, Lei, Niu, Linlin, Sun, Linzhuang, Wang, Mang, Fan, Pengcheng, Shen, Qianli, Xin, Rihui, Dang, Shunya, Zhou, Songchi, Chen, Weipeng, Luo, Wenjing, Chen, Xin, Men, Xin, Lin, Xionghai, Dong, Xuezhen, Zhang, Yan, Duan, Yifei, Zhou, Yuyan, Ma, Zhi, Wu, Zhiying

arXiv.org Artificial IntelligenceFeb-18-2025

The current generation of large language models (LLMs) is typically designed for broad, general-purpose applications, while domain-specific LLMs, especially in vertical fields like medicine, remain relatively scarce. In particular, the development of highly efficient and practical LLMs for the medical domain is challenging due to the complexity of medical knowledge and the limited availability of high-quality data. To bridge this gap, we introduce Baichuan-M1, a series of large language models specifically optimized for medical applications. Unlike traditional approaches that simply continue pretraining on existing models or apply post-training to a general base model, Baichuan-M1 is trained from scratch with a dedicated focus on enhancing medical capabilities. Our model is trained on 20 trillion tokens and incorporates a range of effective training methods that strike a balance between general capabilities and medical expertise. As a result, Baichuan-M1 not only performs strongly across general domains such as mathematics and coding but also excels in specialized medical fields. We have open-sourced Baichuan-M1-14B, a mini version of our model, which can be accessed through the following links.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2502.12671

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Consumer Health (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.93)
Health & Medicine > Health Care Technology > Medical Record (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

LongReD: Mitigating Short-Text Degradation of Long-Context Large Language Models via Restoration Distillation

Dong, Zican, Li, Junyi, Jiang, Jinhao, Xu, Mingyu, Zhao, Wayne Xin, Wang, Bingning, Chen, Weipeng

arXiv.org Artificial IntelligenceFeb-11-2025

Large language models (LLMs) have gained extended context windows through scaling positional encodings and lightweight continual pre-training. However, this often leads to degraded performance on short-text tasks, while the reasons for this degradation remain insufficiently explored. In this work, we identify two primary factors contributing to this issue: distribution drift in hidden states and attention scores, and catastrophic forgetting during continual pre-training. To address these challenges, we propose Long Context Pre-training with Restoration Distillation (LongReD), a novel approach designed to mitigate short-text performance degradation through minimizing the distribution discrepancy between the extended and original models. Besides training on long texts, LongReD distills the hidden state of selected layers from the original model on short texts. Additionally, LongReD also introduces a short-to-long distillation, aligning the output distribution on short texts with that on long texts by leveraging skipped positional indices. Experiments on common text benchmarks demonstrate that LongReD effectively preserves the model's short-text performance while maintaining comparable or even better capacity to handle long texts than baselines.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2502.07365

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Hawaii (0.14)
North America > Mexico > Mexico City (0.14)

Genre:

Research Report > New Finding (0.93)
Research Report > Promising Solution (0.75)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Add feedback

KV Shifting Attention Enhances Language Modeling

Xu, Mingyu, Cheng, Wei, Wang, Bingning, Chen, Weipeng

arXiv.org Artificial IntelligenceDec-5-2024

The current large language models are mainly based on decode-only structure transformers, which have great in-context learning (ICL) capabilities. It is generally believed that the important foundation of its ICL capability is the induction heads mechanism, which requires at least two layers attention. In order to more efficiently implement the ability of the model's induction, we revisit the induction heads mechanism and proposed a KV shifting attention. We theoretically prove that the KV shifting attention reducing the model's requirements for the depth and width of the induction heads mechanism. Our experimental results demonstrate that KV shifting attention is beneficial to learning induction heads and language modeling, which lead to better performance or faster convergence from toy models to the pre-training models with more than 10 B parameters.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2411.19574

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Base of RoPE Bounds Context Length

Men, Xin, Xu, Mingyu, Wang, Bingning, Zhang, Qingyu, Lin, Hongyu, Han, Xianpei, Chen, Weipeng

arXiv.org Artificial IntelligenceMay-23-2024

Position embedding is a core component of current Large Language Models (LLMs). Rotary position embedding (RoPE), a technique that encodes the position information with a rotation matrix, has been the de facto choice for position embedding in many LLMs, such as the Llama series. RoPE has been further utilized to extend long context capability, which is roughly based on adjusting the \textit{base} parameter of RoPE to mitigate out-of-distribution (OOD) problems in position embedding. However, in this paper, we find that LLMs may obtain a superficial long-context ability based on the OOD theory. We revisit the role of RoPE in LLMs and propose a novel property of long-term decay, we derive that the \textit{base of RoPE bounds context length}: there is an absolute lower bound for the base value to obtain certain context length capability. Our work reveals the relationship between context length and RoPE base both theoretically and empirically, which may shed light on future long context training.

context length, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2405.14591

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Multimodal Fusion with Pre-Trained Model Features in Affective Behaviour Analysis In-the-wild

Wen, Zhuofan, Zhang, Fengyu, Zhang, Siyuan, Sun, Haiyang, Xu, Mingyu, Sun, Licai, Lian, Zheng, Liu, Bin, Tao, Jianhua

arXiv.org Artificial IntelligenceMar-22-2024

Multimodal fusion is a significant method for most multimodal tasks. With the recent surge in the number of large pre-trained models, combining both multimodal fusion methods and pre-trained model features can achieve outstanding performance in many multimodal tasks. In this paper, we present our approach, which leverages both advantages for addressing the task of Expression (Expr) Recognition and Valence-Arousal (VA) Estimation. We evaluate the Aff-Wild2 database using pre-trained models, then extract the final hidden layers of the models as features. Following preprocessing and interpolation or convolution to align the extracted features, different models are employed for modal fusion. Our code is available at GitHub - FulgenceWen/ABAW6th.

artificial intelligence, dimitrio kollia, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2403.15044

Country: Asia > China (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

ShortGPT: Layers in Large Language Models are More Redundant Than You Expect

Men, Xin, Xu, Mingyu, Zhang, Qingyu, Wang, Bingning, Lin, Hongyu, Lu, Yaojie, Han, Xianpei, Chen, Weipeng

arXiv.org Artificial IntelligenceMar-7-2024

As Large Language Models (LLMs) continue to advance in performance, their size has escalated significantly, with current LLMs containing billions or even trillions of parameters. However, in this study, we discovered that many layers of LLMs exhibit high similarity, and some layers play a negligible role in network functionality. Based on this observation, we define a metric called Block Influence (BI) to gauge the significance of each layer in LLMs. We then propose a straightforward pruning approach: layer removal, in which we directly delete the redundant layers in LLMs based on their BI scores. Experiments demonstrate that our method, which we call ShortGPT, significantly outperforms previous state-of-the-art (SOTA) methods in model pruning. Moreover, ShortGPT is orthogonal to quantization-like methods, enabling further reduction in parameters and computation. The ability to achieve better results through simple layer removal, as opposed to more complex pruning techniques, suggests a high degree of redundancy in the model architecture.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2403.03853

Country:

North America > United States (0.14)
Asia (0.14)

Genre: Research Report > New Finding (0.88)

Industry: Education > Educational Setting > K-12 Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

MER 2023: Multi-label Learning, Modality Robustness, and Semi-Supervised Learning

Lian, Zheng, Sun, Haiyang, Sun, Licai, Chen, Kang, Xu, Mingyu, Wang, Kexin, Xu, Ke, He, Yu, Li, Ying, Zhao, Jinming, Liu, Ye, Liu, Bin, Yi, Jiangyan, Wang, Meng, Cambria, Erik, Zhao, Guoying, Schuller, Björn W., Tao, Jianhua

arXiv.org Artificial IntelligenceSep-14-2023

The first Multimodal Emotion Recognition Challenge (MER 2023) was successfully held at ACM Multimedia. The challenge focuses on system robustness and consists of three distinct tracks: (1) MER-MULTI, where participants are required to recognize both discrete and dimensional emotions; (2) MER-NOISE, in which noise is added to test videos for modality robustness evaluation; (3) MER-SEMI, which provides a large amount of unlabeled samples for semi-supervised learning. In this paper, we introduce the motivation behind this challenge, describe the benchmark dataset, and provide some statistics about participants. To continue using this dataset after MER 2023, please sign a new End User License Agreement and send it to our official email address merchallenge.contact@gmail.com. We believe this high-quality dataset can become a new benchmark in multimodal emotion recognition, especially for the Chinese research community.

artificial intelligence, machine learning, proceedings, (16 more...)

arXiv.org Artificial Intelligence

2304.08981

Country:

North America (0.70)
Asia > China (0.54)
Europe > United Kingdom > England (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science > Emotion (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

ALIM: Adjusting Label Importance Mechanism for Noisy Partial Label Learning

Xu, Mingyu, Lian, Zheng, Feng, Lei, Liu, Bin, Tao, Jianhua

arXiv.org Artificial IntelligenceMay-17-2023

Noisy partial label learning (noisy PLL) is an important branch of weakly supervised learning. Unlike PLL where the ground-truth label must conceal in the candidate label set, noisy PLL relaxes this constraint and allows the ground-truth label may not be in the candidate label set. To address this challenging problem, most of the existing works attempt to detect noisy samples and estimate the ground-truth label for each noisy sample. However, detection errors are unavoidable. These errors can accumulate during training and continuously affect model optimization. To this end, we propose a novel framework for noisy PLL with theoretical guarantees, called ``Adjusting Label Importance Mechanism (ALIM)''. It aims to reduce the negative impact of detection errors by trading off the initial candidate set and model outputs. ALIM is a plug-in strategy that can be integrated with existing PLL approaches. Experimental results on benchmark datasets demonstrate that our method can achieve state-of-the-art performance on noisy PLL. \textcolor[rgb]{0.93,0.0,0.47}{Our code can be found in Supplementary Material}.

artificial intelligence, ground-truth label, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2301.12077

Country: North America (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

VRA: Variational Rectified Activation for Out-of-distribution Detection

Xu, Mingyu, Lian, Zheng, Liu, Bin, Tao, Jianhua

arXiv.org Artificial IntelligenceMay-17-2023

Out-of-distribution (OOD) detection is critical to building reliable machine learning systems in the open world. Researchers have proposed various strategies to reduce model overconfidence on OOD data. Among them, ReAct is a typical and effective technique to deal with model overconfidence, which truncates high activations to increase the gap between in-distribution and OOD. Despite its promising results, is this technique the best choice? To answer this question, we leverage the variational method to find the optimal operation and verify the necessity of suppressing abnormally low and high activations and amplifying intermediate activations in OOD detection, rather than focusing only on high activations like ReAct. This motivates us to propose a novel technique called "Variational Rectified Activation (VRA)", which simulates these suppression and amplification operations using piecewise functions. Experimental results on multiple benchmark datasets demonstrate that our method outperforms existing post-hoc strategies. Meanwhile, VRA is compatible with different scoring functions and network architectures. Our code can be found in Supplementary Material.

artificial intelligence, machine learning, proceedings, (16 more...)

arXiv.org Artificial Intelligence

2302.11716

Genre: Research Report (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback

IRNet: Iterative Refinement Network for Noisy Partial Label Learning

Lian, Zheng, Xu, Mingyu, Chen, Lan, Sun, Licai, Liu, Bin, Tao, Jianhua

arXiv.org Artificial IntelligenceMar-8-2023

Partial label learning (PLL) is a typical weakly supervised learning, where each sample is associated with a set of candidate labels. The basic assumption of PLL is that the ground-truth label must reside in the candidate set. However, this assumption may not be satisfied due to the unprofessional judgment of the annotators, thus limiting the practical application of PLL. In this paper, we relax this assumption and focus on a more general problem, noisy PLL, where the ground-truth label may not exist in the candidate set. To address this challenging problem, we propose a novel framework called "Iterative Refinement Network (IRNet)". It aims to purify the noisy samples by two key modules, i.e., noisy sample detection and label correction. Ideally, we can convert noisy PLL into traditional PLL if all noisy samples are corrected. To guarantee the performance of these modules, we start with warm-up training and exploit data augmentation to reduce prediction errors. Through theoretical analysis, we prove that IRNet is able to reduce the noise level of the dataset and eventually approximate the Bayes optimal classifier. Experimental results on multiple benchmark datasets demonstrate the effectiveness of our method. IRNet is superior to existing state-of-the-art approaches on noisy PLL.

data mining, machine learning, noisy sample, (17 more...)

arXiv.org Artificial Intelligence

2211.04774

Country: Asia > China (0.30)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback