AITopics | Sun, Zengkui

Collaborating Authors

Sun, Zengkui

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Warmup-Distill: Bridge the Distribution Mismatch between Teacher and Student before Knowledge Distillation

Sun, Zengkui, Liu, Yijin, Meng, Fandong, Chen, Yufeng, Xu, Jinan, Zhou, Jie

arXiv.org Artificial IntelligenceFeb-17-2025

The widespread deployment of Large Language Models (LLMs) is hindered by the high computational demands, making knowledge distillation (KD) crucial for developing compact smaller ones. However, the conventional KD methods endure the distribution mismatch issue between the teacher and student models, leading to the poor performance of distillation. For instance, the widely-used KL-based methods suffer the mode-averaging and mode-collapsing problems, since the mismatched probabitliy distribution between both models. Previous studies mainly optimize this issue via different distance calculations towards the distribution of both models. Unfortunately, the distribution mismatch issue still exists in the early stage of the distillation. Hence, to reduce the impact of distribution mismatch, we propose a simple yet efficient method, named Warmup-Distill, which aligns the distillation of the student to that of the teacher in advance of distillation. Specifically, we first detect the distribution of the student model in practical scenarios with its internal knowledge, and then modify the knowledge with low probability via the teacher as the checker. Consequently, Warmup-Distill aligns the internal student's knowledge to that of the teacher, which expands the distribution of the student with the teacher's, and assists the student model to learn better in the subsequent distillation. Experiments on the seven benchmarks demonstrate that Warmup-Distill could provide a warmup student more suitable for distillation, which outperforms the vanilla student by as least +0.4 averaged score among all benchmarks. Noteably, with the assistance of Warmup-Distill, the distillation on the math task could yield a further improvement, at most +1.9% accuracy.

knowledge distillation, large language model, natural language, (5 more...)

arXiv.org Artificial Intelligence

2502.11766

Genre: Research Report (0.69)

Industry: Education (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.53)

Add feedback

Outdated Issue Aware Decoding for Reasoning Questions on Edited Knowledge

Sun, Zengkui, Liu, Yijin, Wang, Jiaan, Meng, Fandong, Xu, Jinan, Chen, Yufeng, Zhou, Jie

arXiv.org Artificial IntelligenceJun-16-2024

Recently, Knowledge Editing has received increasing attention, since it could update the specific knowledge from outdated ones in pretrained models without re-training. However, as pointed out by recent studies, existing related methods tend to merely memorize the superficial word composition of the edited knowledge, rather than truly learning and absorbing it. Consequently, on the reasoning questions, we discover that existing methods struggle to utilize the edited knowledge to reason the new answer, and tend to retain outdated responses, which are generated by the original models utilizing original knowledge. Nevertheless, the outdated responses are unexpected for the correct answers to reasoning questions, which we named as the outdated issue. To alleviate this issue, in this paper, we propose a simple yet effective decoding strategy, i.e., outDated ISsue aware deCOding (DISCO), to enhance the performance of edited models on reasoning questions. Specifically, we capture the difference in the probability distribution between the original and edited models. Further, we amplify the difference of the token prediction in the edited model to alleviate the outdated issue, and thus enhance the model performance w.r.t the edited knowledge. Experimental results suggest that applying DISCO could enhance edited models to reason, e.g., on reasoning questions, DISCO outperforms the prior SOTA method by 12.99 F1 scores, and reduces the ratio of the outdated issue to 5.78% on the zsRE dataset.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2406.02882

Country:

Asia > China (0.14)
Europe > Belgium (0.14)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.33)

Add feedback

LCS: A Language Converter Strategy for Zero-Shot Neural Machine Translation

Sun, Zengkui, Liu, Yijin, Meng, Fandong, Xu, Jinan, Chen, Yufeng, Zhou, Jie

arXiv.org Artificial IntelligenceJun-5-2024

Multilingual neural machine translation models generally distinguish translation directions by the language tag (LT) in front of the source or target sentences. However, current LT strategies cannot indicate the desired target language as expected on zero-shot translation, i.e., the off-target issue. Our analysis reveals that the indication of the target language is sensitive to the placement of the target LT. For example, when placing the target LT on the decoder side, the indication would rapidly degrade along with decoding steps, while placing the target LT on the encoder side would lead to copying or paraphrasing the source input. To address the above issues, we propose a simple yet effective strategy named Language Converter Strategy (LCS). By introducing the target language embedding into the top encoder layers, LCS mitigates confusion in the encoder and ensures stable language indication for the decoder. Experimental results on MultiUN, TED, and OPUS-100 datasets demonstrate that LCS could significantly mitigate the off-target issue, with language accuracy up to 95.28%, 96.21%, and 85.35% meanwhile outperforming the vanilla LT strategy by 3.07, 3,3, and 7.93 BLEU scores on zero-shot translation, respectively.

artificial intelligence, natural language, translation, (18 more...)

arXiv.org Artificial Intelligence

2406.02876

Country:

Europe (1.00)
North America > United States > Texas (0.14)
North America > United States > Louisiana (0.14)
North America > United States > California (0.14)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

Is ChatGPT a Good NLG Evaluator? A Preliminary Study

Wang, Jiaan, Liang, Yunlong, Meng, Fandong, Sun, Zengkui, Shi, Haoxiang, Li, Zhixu, Xu, Jinan, Qu, Jianfeng, Zhou, Jie

arXiv.org Artificial IntelligenceOct-24-2023

Recently, the emergence of ChatGPT has attracted wide attention from the computational linguistics community. Many prior studies have shown that ChatGPT achieves remarkable performance on various NLP tasks in terms of automatic evaluation metrics. However, the ability of ChatGPT to serve as an evaluation metric is still underexplored. Considering assessing the quality of natural language generation (NLG) models is an arduous task and NLG metrics notoriously show their poor correlation with human judgments, we wonder whether ChatGPT is a good NLG evaluation metric. In this report, we provide a preliminary meta-evaluation on ChatGPT to show its reliability as an NLG metric. In detail, we regard ChatGPT as a human evaluator and give task-specific (e.g., summarization) and aspect-specific (e.g., relevance) instruction to prompt ChatGPT to evaluate the generated results of NLG models. We conduct experiments on five NLG meta-evaluation datasets (including summarization, story generation and data-to-text tasks). Experimental results show that compared with previous automatic metrics, ChatGPT achieves state-of-the-art or competitive correlation with human judgments in most cases. In addition, we find that the effectiveness of the ChatGPT evaluator might be influenced by the creation method of the meta-evaluation datasets. For the meta-evaluation datasets which are created greatly depending on the reference and thus are biased, the ChatGPT evaluator might lose its effectiveness. We hope our preliminary study could prompt the emergence of a general-purposed reliable NLG metric.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2303.04048

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Cross-Lingual Knowledge Editing in Large Language Models

Wang, Jiaan, Liang, Yunlong, Sun, Zengkui, Cao, Yuxuan, Xu, Jiarong

arXiv.org Artificial IntelligenceSep-16-2023

Knowledge editing aims to change language models' performance on several special cases (i.e., editing scope) by infusing the corresponding expected knowledge into them. With the recent advancements in large language models (LLMs), knowledge editing has been shown as a promising technique to adapt LLMs to new knowledge without retraining from scratch. However, most of the previous studies neglect the multi-lingual nature of some main-stream LLMs (e.g., LLaMA, ChatGPT and GPT-4), and typically focus on monolingual scenarios, where LLMs are edited and evaluated in the same language. As a result, it is still unknown the effect of source language editing on a different target language. In this paper, we aim to figure out this cross-lingual effect in knowledge editing. Specifically, we first collect a large-scale cross-lingual synthetic dataset by translating ZsRE from English to Chinese. Then, we conduct English editing on various knowledge editing methods covering different paradigms, and evaluate their performance in Chinese, and vice versa. To give deeper analyses of the cross-lingual effect, the evaluation includes four aspects, i.e., reliability, generality, locality and portability. Furthermore, we analyze the inconsistent behaviors of the edited models and discuss their specific challenges.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2309.08952

Country:

North America > United States (0.14)
Europe > Belgium (0.14)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback