AITopics | Li, Zehan

Collaborating Authors

Li, Zehan

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Neural Networks Remember More: The Power of Parameter Isolation and Combination

Zeng, Biqing, Li, Zehan, Ayesh, Aladdin

arXiv.org Artificial IntelligenceFeb-15-2025

Catastrophic forgetting is a pervasive issue for pre-trained language models (PLMs) during continual learning, where models lose previously acquired knowledge when sequentially trained on a series of tasks. The model's ability to retain old tasks is referred to as stability, while its adaptability to new tasks is called plasticity. Therefore, the key to solving this problem is to find a trade-off between the plasticity and stability of the model. To address this issue, in this paper, we propose a novel method to achieve a balance between model stability and plasticity, thereby mitigating catastrophic forgetting. More specifically, our proposed approach leverages parameter isolation and a subsequent combination strategy. Initially, in the training stage, the model adapts to each downstream task via a parameter isolation method to prevent potential interference among different tasks. We then combine all trained parameters, which contain acquired knowledge, using the task arithmetic method and finally apply them to the backbone model. Empirical evaluations on continual language learning benchmarks substantiate the effectiveness of our approach, revealing a marked enhancement over existing state-of-the-art approaches.

computational linguistic, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2502.10966

Country:

Asia (0.94)
North America > United States (0.93)

Genre:

Research Report > New Finding (1.00)
Research Report > Promising Solution (0.69)

Industry: Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.40)

Add feedback

MiniMax-01: Scaling Foundation Models with Lightning Attention

MiniMax, null, Li, Aonian, Gong, Bangwei, Yang, Bo, Shan, Boji, Liu, Chang, Zhu, Cheng, Zhang, Chunhao, Guo, Congchao, Chen, Da, Li, Dong, Jiao, Enwei, Li, Gengxin, Zhang, Guojun, Sun, Haohai, Dong, Houze, Zhu, Jiadai, Zhuang, Jiaqi, Song, Jiayuan, Zhu, Jin, Han, Jingtao, Li, Jingyang, Xie, Junbin, Xu, Junhao, Yan, Junjie, Zhang, Kaishun, Xiao, Kecheng, Kang, Kexi, Han, Le, Wang, Leyang, Yu, Lianfei, Feng, Liheng, Zheng, Lin, Chai, Linbo, Xing, Long, Ju, Meizhi, Chi, Mingyuan, Zhang, Mozhi, Huang, Peikai, Niu, Pengcheng, Li, Pengfei, Zhao, Pengyu, Yang, Qi, Xu, Qidi, Wang, Qiexiang, Wang, Qin, Li, Qiuhui, Leng, Ruitao, Shi, Shengmin, Yu, Shuqi, Li, Sichen, Zhu, Songquan, Huang, Tao, Liang, Tianrun, Sun, Weigao, Sun, Weixuan, Cheng, Weiyu, Li, Wenkai, Song, Xiangjun, Su, Xiao, Han, Xiaodong, Zhang, Xinjie, Hou, Xinzhu, Min, Xu, Zou, Xun, Shen, Xuyang, Gong, Yan, Zhu, Yingjie, Zhou, Yipeng, Zhong, Yiran, Hu, Yongyi, Fan, Yuanxiang, Yu, Yue, Yang, Yufeng, Li, Yuhao, Huang, Yunan, Li, Yunji, Huang, Yunpeng, Xu, Yunzhi, Mao, Yuxin, Li, Zehan, Li, Zekang, Tao, Zewei, Ying, Zewen, Cong, Zhaoyang, Qin, Zhen, Fan, Zhenhua, Yu, Zhihang, Jiang, Zhuo, Wu, Zijia

arXiv.org Artificial IntelligenceJan-14-2025

We introduce MiniMax-01 series, including MiniMax-Text-01 and MiniMax-VL-01, which are comparable to top-tier models while offering superior capabilities in processing longer contexts. The core lies in lightning attention and its efficient scaling. To maximize computational capacity, we integrate it with Mixture of Experts (MoE), creating a model with 32 experts and 456 billion total parameters, of which 45.9 billion are activated for each token. We develop an optimized parallel strategy and highly efficient computation-communication overlap techniques for MoE and lightning attention. This approach enables us to conduct efficient training and inference on models with hundreds of billions of parameters across contexts spanning millions of tokens. The context window of MiniMax-Text-01 can reach up to 1 million tokens during training and extrapolate to 4 million tokens during inference at an affordable cost. Our vision-language model, MiniMax-VL-01 is built through continued training with 512 billion vision-language tokens. Experiments on both standard and in-house benchmarks show that our models match the performance of state-of-the-art models like GPT-4o and Claude-3.5-Sonnet while offering 20-32 times longer context window. We publicly release MiniMax-01 at https://github.com/MiniMax-AI.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2501.08313

Country:

Asia (0.92)
North America > United States > Massachusetts (0.14)
North America > United States > California (0.14)

Genre:

Research Report > New Finding (0.67)
Research Report > Promising Solution (0.65)

Industry:

Health & Medicine (1.00)
Information Technology (0.67)
Education > Educational Setting (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Suicide Phenotyping from Clinical Notes in Safety-Net Psychiatric Hospital Using Multi-Label Classification with Pre-Trained Language Models

Li, Zehan, Hu, Yan, Lane, Scott, Selek, Salih, Shahani, Lokesh, Machado-Vieira, Rodrigo, Soares, Jair, Xu, Hua, Liu, Hongfang, Huang, Ming

arXiv.org Artificial IntelligenceOct-3-2024

Accurate identification and categorization of suicidal events can yield better suicide precautions, reducing operational burden, and improving care quality in high-acuity psychiatric settings. Pre-trained language models offer promise for identifying suicidality from unstructured clinical narratives. We evaluated the performance of four BERT-based models using two fine-tuning strategies (multiple single-label and single multi-label) for detecting coexisting suicidal events from 500 annotated psychiatric evaluation notes. The notes were labeled for suicidal ideation (SI), suicide attempts (SA), exposure to suicide (ES), and non-suicidal self-injury (NSSI). RoBERTa outperformed other models using multiple single-label classification strategy (acc=0.86, F1=0.78). MentalBERT (acc=0.83, F1=0.74) also exceeded BioClinicalBERT (acc=0.82, F1=0.72) which outperformed BERT (acc=0.80, F1=0.70). RoBERTa fine-tuned with single multi-label classification further improved the model performance (acc=0.88, F1=0.81). The findings highlight that the model optimization, pretraining with domain-relevant data, and the single multi-label classification strategy enhance the model performance of suicide phenotyping. Keywords: EHR-based Phenotyping; Natural Language Processing; Secondary Use of EHR Data; Suicide Classification; BERT-based Model; Psychiatry; Mental Health

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2409.18878

Country: North America > United States (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.68)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (1.00)
Health & Medicine > Health Care Providers & Services (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

ProCQA: A Large-scale Community-based Programming Question Answering Dataset for Code Search

Li, Zehan, Zhang, Jianfei, Yin, Chuantao, Ouyang, Yuanxin, Rong, Wenge

arXiv.org Artificial IntelligenceMar-25-2024

Retrieval-based code question answering seeks to match user queries in natural language to relevant code snippets. Previous approaches typically rely on pretraining models using crafted bi-modal and uni-modal datasets to align text and code representations. In this paper, we introduce ProCQA, a large-scale programming question answering dataset extracted from the StackOverflow community, offering naturally structured mixed-modal QA pairs. To validate its effectiveness, we propose a modality-agnostic contrastive pre-training approach to improve the alignment of text and code representations of current code language models. Compared to previous models that primarily employ bimodal and unimodal pairs extracted from CodeSearchNet for pre-training, our model exhibits significant performance improvements across a wide range of code retrieval benchmarks.

machine learning, natural language, question answering, (17 more...)

arXiv.org Artificial Intelligence

2403.16702

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Advancing GenAI Assisted Programming--A Comparative Study on Prompt Efficiency and Code Quality Between GPT-4 and GLM-4

Yang, Angus, Li, Zehan, Li, Jie

arXiv.org Artificial IntelligenceFeb-20-2024

This study aims to explore the best practices for utilizing GenAI as a programming tool, through a comparative analysis between GPT-4 and GLM-4. By evaluating prompting strategies at different levels of complexity, we identify that simplest and straightforward prompting strategy yields best code generation results. Additionally, adding a CoT-like preliminary confirmation step would further increase the success rate. Our results reveal that while GPT-4 marginally outperforms GLM-4, the difference is minimal for average users. In our simplified evaluation model, we see a remarkable 30 to 100-fold increase in code generation efficiency over traditional coding norms. Our GenAI Coding Workshop highlights the effectiveness and accessibility of the prompting methodology developed in this study. We observe that GenAI-assisted coding would trigger a paradigm shift in programming landscape, which necessitates developers to take on new roles revolving around supervising and guiding GenAI, and to focus more on setting high-level objectives and engaging more towards innovation.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2402.12782

Country:

Europe (0.28)
Asia > China (0.14)
North America > United States (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment > Games (0.94)
Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

Add feedback

Large Language Models in Mental Health Care: a Scoping Review

Hua, Yining, Liu, Fenglin, Yang, Kailai, Li, Zehan, Sheu, Yi-han, Zhou, Peilin, Moran, Lauren V., Ananiadou, Sophia, Beam, Andrew

arXiv.org Artificial IntelligenceJan-1-2024

Objective: The growing use of large language models (LLMs) stimulates a need for a comprehensive review of their applications and outcomes in mental health care contexts. This scoping review aims to critically analyze the existing development and applications of LLMs in mental health care, highlighting their successes and identifying their challenges and limitations in these specialized fields. Materials and Methods: A broad literature search was conducted in November 2023 using six databases (PubMed, Web of Science, Google Scholar, arXiv, medRxiv, and PsyArXiv) following the 2020 version of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. A total of 313 publications were initially identified, and after applying the study inclusion criteria, 34 publications were selected for the final review. Results: We identified diverse applications of LLMs in mental health care, including diagnosis, therapy, patient engagement enhancement, etc. Key challenges include data availability and reliability, nuanced handling of mental states, and effective evaluation methods. Despite successes in accuracy and accessibility improvement, gaps in clinical applicability and ethical considerations were evident, pointing to the need for robust data, standardized evaluations, and interdisciplinary collaboration. Conclusion: LLMs show promising potential in advancing mental health care, with applications in diagnostics, and patient support. Continued advancements depend on collaborative, multidisciplinary efforts focused on framework enhancement, rigorous dataset development, technological refinement, and ethical integration to ensure the effective and safe application of LLMs in mental health care.

computational linguistic, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2401.02984

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)

Genre:

Research Report > Experimental Study (1.00)
Overview (1.00)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Text Representation Distillation via Information Bottleneck Principle

Zhang, Yanzhao, Long, Dingkun, Li, Zehan, Xie, Pengjun

arXiv.org Artificial IntelligenceNov-9-2023

Pre-trained language models (PLMs) have recently shown great success in text representation field. However, the high computational cost and high-dimensional representation of PLMs pose significant challenges for practical applications. To make models more accessible, an effective method is to distill large models into smaller representation models. In order to relieve the issue of performance degradation after distillation, we propose a novel Knowledge Distillation method called IBKD. This approach is motivated by the Information Bottleneck principle and aims to maximize the mutual information between the final representation of the teacher and student model, while simultaneously reducing the mutual information between the student model's representation and the input data. This enables the student model to preserve important learned information while avoiding unnecessary information, thus reducing the risk of over-fitting. Empirical studies on two main downstream applications of text representation (Semantic Textual Similarity and Dense Retrieval tasks) demonstrate the effectiveness of our proposed approach.

machine learning, natural language, student model, (20 more...)

arXiv.org Artificial Intelligence

2311.05472

Country:

Europe (1.00)
Asia (1.00)
North America > Canada (0.68)
North America > United States > California (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Language Models are Universal Embedders

Zhang, Xin, Li, Zehan, Zhang, Yanzhao, Long, Dingkun, Xie, Pengjun, Zhang, Meishan, Zhang, Min

arXiv.org Artificial IntelligenceOct-12-2023

In the large language model (LLM) revolution, embedding is a key component of various systems. For example, it is used to retrieve knowledge or memories for LLMs, to build content moderation filters, etc. As such cases span from English to other natural or programming languages, from retrieval to classification and beyond, it is desirable to build a unified embedding model rather than dedicated ones for each scenario. In this work, we make an initial step towards this goal, demonstrating that multiple languages (both natural and programming) pre-trained transformer decoders can embed universally when finetuned on limited English data. We provide a comprehensive practice with thorough evaluations. On English MTEB, our models achieve competitive performance on different embedding tasks by minimal training data. On other benchmarks, such as multilingual classification and code search, our models (without any supervision) perform comparably to, or even surpass heavily supervised baselines and/or APIs. These results provide evidence of a promising path towards building powerful unified embedders that can be applied across tasks and languages.

large language model, machine learning, proc, (19 more...)

arXiv.org Artificial Intelligence

2310.08232

Country:

Europe (1.00)
Asia (0.67)
North America > United States > Louisiana (0.14)
North America > United States > California (0.14)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Towards General Text Embeddings with Multi-stage Contrastive Learning

Li, Zehan, Zhang, Xin, Zhang, Yanzhao, Long, Dingkun, Xie, Pengjun, Zhang, Meishan

arXiv.org Artificial IntelligenceAug-6-2023

We present GTE, a general-purpose text embedding model trained with multi-stage contrastive learning. In line with recent advancements in unifying various NLP tasks into a single format, we train a unified text embedding model by employing contrastive learning over a diverse mixture of datasets from multiple sources. By significantly increasing the number of training data during both unsupervised pre-training and supervised fine-tuning stages, we achieve substantial performance gains over existing embedding models. Notably, even with a relatively modest parameter count of 110M, GTE$_\text{base}$ outperforms the black-box embedding API provided by OpenAI and even surpasses 10x larger text embedding models on the massive text embedding benchmark. Furthermore, without additional fine-tuning on each programming language individually, our model outperforms previous best code retrievers of similar size by treating code as text. In summary, our model achieves impressive results by effectively harnessing multi-stage contrastive learning, offering a powerful and efficient text embedding model with broad applicability across various NLP and code-related tasks.

computational linguistic, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2308.03281

Country:

Europe (1.00)
Asia (0.67)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Sports (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

Challenging Decoder helps in Masked Auto-Encoder Pre-training for Dense Passage Retrieval

Li, Zehan, Zhang, Yanzhao, Long, Dingkun, Xie, Pengjun

arXiv.org Artificial IntelligenceMay-22-2023

Recently, various studies have been directed towards exploring dense passage retrieval techniques employing pre-trained language models, among which the masked auto-encoder (MAE) pre-training architecture has emerged as the most promising. The conventional MAE framework relies on leveraging the passage reconstruction of decoder to bolster the text representation ability of encoder, thereby enhancing the performance of resulting dense retrieval systems. Within the context of building the representation ability of the encoder through passage reconstruction of decoder, it is reasonable to postulate that a ``more demanding'' decoder will necessitate a corresponding increase in the encoder's ability. To this end, we propose a novel token importance aware masking strategy based on pointwise mutual information to intensify the challenge of the decoder. Importantly, our approach can be implemented in an unsupervised manner, without adding additional expenses to the pre-training phase. Our experiments verify that the proposed method is both effective and robust on large-scale supervised passage retrieval datasets and out-of-domain zero-shot retrieval benchmarks.

artificial intelligence, machine learning, retrieval, (17 more...)

arXiv.org Artificial Intelligence

2305.13197

Country: North America > United States > Minnesota (0.28)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback