AITopics | Chen, Yijie

Collaborating Authors

Chen, Yijie

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Enhancing Cross-Tokenizer Knowledge Distillation with Contextual Dynamical Mapping

Chen, Yijie, Liu, Yijin, Meng, Fandong, Chen, Yufeng, Xu, Jinan, Zhou, Jie

arXiv.org Artificial IntelligenceFeb-16-2025

Knowledge Distillation (KD) has emerged as a prominent technique for model compression. However, conventional KD approaches primarily focus on homogeneous architectures with identical tokenizers, constraining their applicability in cross-architecture scenarios. As for the cross-tokenizer KD, the differences in the tokenizers give rise to two fundamental challenges: (1) sequence misalignment caused by divergent tokenization strategies, and (2) mismatched vocabulary size and composition. While existing probability-matching methods attempt to address these issues, their efficacy remains limited due to suboptimal alignment in both the sequence and vocabulary aspects. To overcome these limitations, we propose Contextual Dynamic Mapping (CDM), a novel cross-tokenizer distillation framework that employs contextual information to enhance sequence alignment precision and dynamically improves vocabulary mapping. We evaluated the effectiveness of our approach across five advanced and widely-used model families (i.e, LLama3, Phi3, Gemma2, OPT and Qwen2), which were configured into three distinct teacher-student pairs. Our method shows significant advantages over existing cross-tokenizer distillation baselines across diverse benchmarks, including instruction-following, code generation and math. Notably, our analysis reveals that combining conventional same-tokenizer distillation and cross-tokenizer distillation through CDM yields further performance improvements. The code is available at https://github.com/pppa2019/ContexualDynamicMapping

distillation, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2502.11104

Country: Asia (0.47)

Genre: Research Report (1.00)

Industry: Education (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Comments as Natural Logic Pivots: Improve Code Generation via Comment Perspective

Chen, Yijie, Liu, Yijin, Meng, Fandong, Chen, Yufeng, Xu, Jinan, Zhou, Jie

arXiv.org Artificial IntelligenceApr-11-2024

Code generation aims to understand the problem description and generate corresponding code snippets, where existing works generally decompose such complex tasks into intermediate steps by prompting strategies, such as Chain-of-Thought and its variants. While these studies have achieved some success, their effectiveness is highly dependent on the capabilities of advanced Large Language Models (LLMs) such as GPT-4, particularly in terms of API calls, which significantly limits their practical applicability. Consequently, how to enhance the code generation capabilities of small and medium-scale code LLMs without significantly increasing training costs is an appealing challenge. In this paper, we suggest that code comments are the natural logic pivot between natural language and code language and propose using comments to boost the code generation ability of code LLMs. Concretely, we propose MANGO (comMents As Natural loGic pivOts), including a comment contrastive training strategy and a corresponding logical comment decoding strategy. Experiments are performed on HumanEval and MBPP, utilizing StarCoder and WizardCoder as backbone models, and encompassing model parameter sizes between 3B and 7B. The results indicate that MANGO significantly improves the code pass rate based on the strong baselines. Meanwhile, the robustness of the logical comment decoding strategy is notably higher than the Chain-of-thoughts prompting. The code is publicly available at \url{https://github.com/pppa2019/Mango}.

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2404.07549

Country: Asia > China (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Improving Translation Faithfulness of Large Language Models via Augmenting Instructions

Chen, Yijie, Liu, Yijin, Meng, Fandong, Chen, Yufeng, Xu, Jinan, Zhou, Jie

arXiv.org Artificial IntelligenceAug-24-2023

Large Language Models (LLMs) present strong general capabilities, and a current compelling challenge is stimulating their specialized capabilities, such as machine translation, through low-cost instruction tuning. The standard instruction-following data is sequentially organized as the concatenation of an instruction, an input, and a response. As the attention mechanism of LLMs has limitations on local focus, LLMs tend to focus more on the words or sentences nearby at each position. This leads to a high risk of instruction forgetting during decoding. To alleviate the above issues, We propose SWIE (Segment-Weighted Instruction Embedding) and an instruction-following dataset OVERMISS. SWIE improves the model instruction understanding by adding a global instruction representation on the following input and response representations. OVERMISS improves model faithfulness by comparing over-translation and miss-translation results with the correct translation. We apply our methods to two main-stream open-source LLMs, BLOOM and LLaMA. The experimental results demonstrate significant improvements in translation performance with SWIE based on BLOOMZ-3b, particularly in zero-shot and long text translations due to reduced instruction forgetting risk. Additionally, OVERMISS outperforms the baseline in translation performance (e.g. an increase in BLEU scores from 0.69 to 3.12 and an average improvement of 0.48 percentage comet scores for LLaMA-7b) with further enhancements seen in models combining OVERMISS and SWIE (e.g. the BLUE scores increase up to 0.56 from English to German across three different backbones), and both exhibit improvements in the faithfulness metric based on word alignment.

artificial intelligence, machine translation, natural language, (15 more...)

arXiv.org Artificial Intelligence

2308.12674

Country: Asia (0.28)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback