AITopics | Meng, Yu

Collaborating Authors

Meng, Yu

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

PLPHP: Per-Layer Per-Head Vision Token Pruning for Efficient Large Vision-Language Models

Meng, Yu, Li, Kaiyuan, Huang, Chenran, Gao, Chen, Chen, Xinlei, Li, Yong, Zhang, Xiaoping

arXiv.org Artificial IntelligenceFeb-20-2025

Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across a range of multimodal tasks. However, their inference efficiency is constrained by the large number of visual tokens processed during decoding. To address this challenge, we propose Per-Layer Per-Head Vision Token Pruning (PLPHP), a two-level fine-grained pruning method including Layer-Level Retention Rate Allocation and Head-Level Vision Token Pruning. Motivated by the Vision Token Re-attention phenomenon across decoder layers, we dynamically adjust token retention rates layer by layer. Layers that exhibit stronger attention to visual information preserve more vision tokens, while layers with lower vision attention are aggressively pruned. Furthermore, PLPHP applies pruning at the attention head level, enabling different heads within the same layer to independently retain critical context. Experiments on multiple benchmarks demonstrate that PLPHP delivers an 18% faster decoding speed and reduces the Key-Value Cache (KV Cache) size by over 50%, all at the cost of 0.46% average performance drop, while also achieving notable performance improvements in multi-image tasks. These results highlight the effectiveness of fine-grained token pruning and contribute to advancing the efficiency and scalability of LVLMs. Our source code will be made publicly available.

artificial intelligence, natural language, vision token, (16 more...)

arXiv.org Artificial Intelligence

2502.14504

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback

LLM Alignment as Retriever Optimization: An Information Retrieval Perspective

Jin, Bowen, Yoon, Jinsung, Qin, Zhen, Wang, Ziqi, Xiong, Wei, Meng, Yu, Han, Jiawei, Arik, Sercan O.

arXiv.org Artificial IntelligenceFeb-5-2025

Large Language Models (LLMs) have revolutionized artificial intelligence with capabilities in reasoning, coding, and communication, driving innovation across industries. Their true potential depends on effective alignment to ensure correct, trustworthy and ethical behavior, addressing challenges like misinformation, hallucinations, bias and misuse. While existing Reinforcement Learning (RL)-based alignment methods are notoriously complex, direct optimization approaches offer a simpler alternative. In this work, we introduce a novel direct optimization approach for LLM alignment by drawing on established Information Retrieval (IR) principles. We present a systematic framework that bridges LLM alignment and IR methodologies, mapping LLM generation and reward models to IR's retriever-reranker paradigm. Building on this foundation, we propose LLM Alignment as Retriever Preference Optimization (LarPO), a new alignment method that enhances overall alignment quality. Extensive experiments validate LarPO's effectiveness with 38.9 % and 13.7 % averaged improvement on AlpacaEval2 and MixEval-Hard respectively. Our work opens new avenues for advancing LLM alignment by integrating IR foundations, offering a promising direction for future research.

arxiv preprint arxiv, large language model, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2502.03699

Country: North America > United States (0.14)

Genre:

Research Report (1.00)
Overview (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

MoNTA: Accelerating Mixture-of-Experts Training with Network-Traffc-Aware Parallel Optimization

Guo, Jingming, Liu, Yan, Meng, Yu, Tao, Zhiwei, Liu, Banglan, Chen, Gang, Li, Xiang

arXiv.org Artificial IntelligenceNov-1-2024

The Mixture of Experts (MoE) is an advanced model architecture in the industry that combines multiple specialized expert models from various domains into a single supermodel. This approach enables the model to scale without significantly increasing the computational costs of training and inference, while maximizing model performance. However, current distributed training frameworks do not consider the ultimate optimization of communication, especially for large base models. This paper proposes a network-traffic-aware parallel optimization method that selects the optimal parallel strategy based on the communication volume, and the training cluster's inter-node and intra-node network topologies. Compared to the DeepSpeed, MoNTA achieves an 8x increase in AllToAll communication performance under 8-card tensor parallelism. Compared to the baseline, training a 2x70B model using 16 A800 cards, with an 8K sequence, results in a 13% overall latency performance improvement. Project Page: https://github.com/EnflameTechnology/DeepSpeed.

communication, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2411.00662

Genre: Research Report (0.50)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Grasping the Essentials: Tailoring Large Language Models for Zero-Shot Relation Extraction

Zhou, Sizhe, Meng, Yu, Jin, Bowen, Han, Jiawei

arXiv.org Artificial IntelligenceOct-24-2024

Relation extraction (RE) aims to identify semantic relationships between entities within text. Despite considerable advancements, existing models predominantly require extensive annotated training data, which is both costly and labor-intensive to collect. Moreover, these models often struggle to adapt to new or unseen relations. Few-shot learning, aiming to lessen annotation demands, typically provides incomplete and biased supervision for target relations, leading to degraded and unstable performance. To accurately and explicitly describe relation semantics while minimizing annotation demands, we explore the definition only zero-shot RE setting where only relation definitions expressed in natural language are used to train a RE model. We introduce REPaL, comprising three stages: (1) We leverage large language models (LLMs) to generate initial seed instances from relation definitions and an unlabeled corpus. (2) We fine-tune a bidirectional Small Language Model (SLM) with initial seeds to learn relations for the target domain. (3) We expand pattern coverage and mitigate bias from initial seeds by integrating feedback from the SLM's predictions on the unlabeled corpus and the synthesis history. To accomplish this, we leverage the multi-turn conversation ability of LLMs to generate new instances in follow-up dialogues, informed by both the feedback and synthesis history. Studies reveal that definition-oriented seed synthesis enhances pattern coverage whereas indiscriminately increasing seed quantity leads to performance saturation. Experiments on two datasets show REPaL significantly improved cost-effective zero-shot performance by large margins.

large language model, machine learning, relation, (17 more...)

arXiv.org Artificial Intelligence

2402.11142

Country:

North America > United States (1.00)
Europe (1.00)
Asia (1.00)

Genre: Personal (0.93)

Industry:

Media > Film (1.00)
Leisure & Entertainment > Sports > Olympic Games (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Establishing Knowledge Preference in Language Models

Zhou, Sizhe, Li, Sha, Meng, Yu, Jiao, Yizhu, Ji, Heng, Han, Jiawei

arXiv.org Artificial IntelligenceJul-17-2024

Language models are known to encode a great amount of factual knowledge through pretraining. However, such knowledge might be insufficient to cater to user requests, requiring the model to integrate external knowledge sources and adhere to user-provided specifications. When answering questions about ongoing events, the model should use recent news articles to update its response; when asked to provide recommendations, the model should prioritize user specifications over retrieved product reviews; when some facts are edited in the model, the updated facts should override all prior knowledge learned by the model even if they are conflicting. In all of the cases above, the model faces a decision between its own parametric knowledge, (retrieved) contextual knowledge, and user instruction knowledge. In this paper, we (1) unify such settings into the problem of knowledge preference and define a three-level preference hierarchy over these knowledge sources; (2) compile a collection of existing datasets IfQA, MQuAKE, and MRQA covering a combination of settings (with/without user specifications, with/without context documents) to systematically evaluate how well models obey the intended knowledge preference; and (3) propose a dataset synthesis method that composes diverse question-answer pairs with user assumptions and related context to directly fine-tune LMs for instilling the hierarchy of knowledge. We demonstrate that a 7B model, fine-tuned on only a few thousand examples automatically generated by our proposed method, effectively achieves superior performance (more than 18% improvement across all evaluation benchmarks) in adhering to the desired knowledge preference hierarchy.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2407.13048

Country:

Europe (1.00)
North America > United States > Illinois (0.14)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment (0.93)
Media > Film (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

SimPO: Simple Preference Optimization with a Reference-Free Reward

Meng, Yu, Xia, Mengzhou, Chen, Danqi

arXiv.org Artificial IntelligenceJul-8-2024

Direct Preference Optimization (DPO) is a widely used offline preference optimization algorithm that reparameterizes reward functions in reinforcement learning from human feedback (RLHF) to enhance simplicity and training stability. In this work, we propose SimPO, a simpler yet more effective approach. The effectiveness of SimPO is attributed to a key design: using the average log probability of a sequence as the implicit reward. This reward formulation better aligns with model generation and eliminates the need for a reference model, making it more compute and memory efficient. Additionally, we introduce a target reward margin to the Bradley-Terry objective to encourage a larger margin between the winning and losing responses, further enhancing the algorithm's performance. We compare SimPO to DPO and its latest variants across various state-of-the-art training setups, including both base and instruction-tuned models like Mistral and Llama3. We evaluated on extensive instruction-following benchmarks, including AlpacaEval 2, MT-Bench, and the recent challenging Arena-Hard benchmark. Our results demonstrate that SimPO consistently and significantly outperforms existing approaches without substantially increasing response length. Specifically, SimPO outperforms DPO by up to 6.4 points on AlpacaEval 2 and by up to 7.5 points on Arena-Hard. Our top-performing model, built on Llama3-8B-Instruct, achieves a remarkable 53.7 length-controlled win rate on AlpacaEval 2 -- surpassing Claude 3 Opus on the leaderboard, and a 36.5 win rate on Arena-Hard -- making it the strongest 8B open-source model.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2405.14734

Country:

South America (0.46)
Europe > Italy (0.28)
North America > United States > Texas (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Two Tales of Persona in LLMs: A Survey of Role-Playing and Personalization

Tseng, Yu-Min, Huang, Yu-Chao, Hsiao, Teng-Yun, Chen, Wei-Lin, Huang, Chao-Wei, Meng, Yu, Chen, Yun-Nung

arXiv.org Artificial IntelligenceJun-26-2024

The concept of persona, originally adopted in dialogue literature, has re-surged as a promising framework for tailoring large language models (LLMs) to specific context (e.g., personalized search, LLM-as-a-judge). However, the growing research on leveraging persona in LLMs is relatively disorganized and lacks a systematic taxonomy. To close the gap, we present a comprehensive survey to categorize the current state of the field. We identify two lines of research, namely (1) LLM Role-Playing, where personas are assigned to LLMs, and (2) LLM Personalization, where LLMs take care of user personas. Additionally, we introduce existing methods for LLM personality evaluation. To the best of our knowledge, we present the first survey for role-playing and personalization in LLMs under the unified view of persona. We continuously maintain a paper collection to foster future endeavors: https://github.com/MiuLab/PersonaLLM-Survey

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2406.01171

Country:

North America > Canada (0.14)
Asia > Taiwan (0.14)
Oceania > Australia (0.14)
(5 more...)

Genre: Overview (1.00)

Industry:

Education (1.00)
Information Technology > Security & Privacy (0.93)
Leisure & Entertainment > Games > Computer Games (0.93)
Health & Medicine > Therapeutic Area (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

InstructRAG: Instructing Retrieval-Augmented Generation with Explicit Denoising

Wei, Zhepei, Chen, Wei-Lin, Meng, Yu

arXiv.org Artificial IntelligenceJun-19-2024

Retrieval-augmented generation (RAG) has shown promising potential to enhance the accuracy and factuality of language models (LMs). However, imperfect retrievers or noisy corpora can introduce misleading or even erroneous information to the retrieved contents, posing a significant challenge to the generation quality. Existing RAG methods typically address this challenge by directly predicting final answers despite potentially noisy inputs, resulting in an implicit denoising process that is difficult to interpret and verify. On the other hand, the acquisition of explicit denoising supervision is often costly, involving significant human efforts. In this work, we propose InstructRAG, where LMs explicitly learn the denoising process through self-synthesized rationales -- First, we instruct the LM to explain how the ground-truth answer is derived from retrieved documents. Then, these rationales can be used either as demonstrations for in-context learning of explicit denoising or as supervised fine-tuning data to train the model. Compared to standard RAG approaches, InstructRAG requires no additional supervision, allows for easier verification of the predicted answers, and effectively improves generation accuracy. Experiments show InstructRAG consistently outperforms existing RAG methods in both training-free and trainable scenarios, achieving a relative improvement of 8.3% over the best baseline method on average across five knowledge-intensive benchmarks. Extensive analysis indicates that InstructRAG scales well with increased numbers of retrieved documents and consistently exhibits robust denoising ability even in out-of-domain datasets, demonstrating strong generalizability.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2406.13629

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report (1.00)

Industry:

Media (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast

Shi, Chufan, Yang, Cheng, Zhu, Xinyu, Wang, Jiahao, Wu, Taiqiang, Li, Siheng, Cai, Deng, Yang, Yujiu, Meng, Yu

arXiv.org Artificial IntelligenceMay-23-2024

Mixture-of-Experts (MoE) has emerged as a prominent architecture for scaling model size while maintaining computational efficiency. In MoE, each token in the input sequence activates a different subset of experts determined by a routing mechanism. However, the unchosen experts in MoE models do not contribute to the output, potentially leading to underutilization of the model's capacity. In this work, we first conduct exploratory studies to demonstrate that increasing the number of activated experts does not necessarily improve and can even degrade the output quality. Then, we show that output distributions from an MoE model using different routing strategies substantially differ, indicating that different experts do not always act synergistically. Motivated by these findings, we propose Self-Contrast Mixture-of-Experts (SCMoE), a training-free strategy that utilizes unchosen experts in a self-contrast manner during inference. In SCMoE, the next-token probabilities are determined by contrasting the outputs from strong and weak activation using the same MoE model. Our method is conceptually simple and computationally lightweight, as it incurs minimal latency compared to greedy decoding. Experiments on several benchmarks (GSM8K, StrategyQA, MBPP and HumanEval) demonstrate that SCMoE can consistently enhance Mixtral 8x7B's reasoning capability across various domains. For example, it improves the accuracy on GSM8K from 61.79 to 66.94. Moreover, combining SCMoE with self-consistency yields additional gains, increasing major@20 accuracy from 75.59 to 78.31.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2405.14507

Country:

Asia (0.14)
Europe (0.14)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.66)

Add feedback

Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs

Jin, Bowen, Xie, Chulin, Zhang, Jiawei, Roy, Kashob Kumar, Zhang, Yu, Wang, Suhang, Meng, Yu, Han, Jiawei

arXiv.org Artificial IntelligenceApr-10-2024

Large language models (LLMs), while exhibiting exceptional performance, suffer from hallucinations, especially on knowledge-intensive tasks. Existing works propose to augment LLMs with individual text units retrieved from external knowledge corpora to alleviate the issue. However, in many domains, texts are interconnected (e.g., academic papers in a bibliographic graph are linked by citations and co-authorships) which form a (text-attributed) graph. The knowledge in such graphs is encoded not only in single texts/nodes but also in their associated connections. To facilitate the research of augmenting LLMs with graphs, we manually construct a Graph Reasoning Benchmark dataset called GRBench, containing 1,740 questions that can be answered with the knowledge from 10 domain graphs. Then, we propose a simple and effective framework called Graph Chain-of-thought (Graph-CoT) to augment LLMs with graphs by encouraging LLMs to reason on the graph iteratively. Each Graph-CoT iteration consists of three sub-steps: LLM reasoning, LLM-graph interaction, and graph execution. We conduct systematic experiments with three LLM backbones on GRBench, where Graph-CoT outperforms the baselines consistently. The code is available at https://github.com/PeterGriffinJin/Graph-CoT.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2404.07103

Country: North America > United States (0.93)

Genre:

Research Report (0.64)
Workflow (0.46)

Industry: Health & Medicine > Therapeutic Area (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback