AITopics | Yu, Zichun

Collaborating Authors

Yu, Zichun

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Data-Efficient Pretraining with Group-Level Data Influence Modeling

Yu, Zichun, Peng, Fei, Lei, Jie, Overwijk, Arnold, Yih, Wen-tau, Xiong, Chenyan

arXiv.org Artificial IntelligenceFeb-20-2025

Data-efficient pretraining has shown tremendous potential to elevate scaling laws. This paper argues that effective pretraining data should be curated at the group level, treating a set of data points as a whole rather than as independent contributors. To achieve that, we propose Group-Level Data Influence Modeling (Group-MATES), a novel data-efficient pretraining method that captures and optimizes group-level data utility. Specifically, Group-MATES collects oracle group-level influences by locally probing the pretraining model with data sets. It then fine-tunes a relational data influence model to approximate oracles as relationship-weighted aggregations of individual influences. The fine-tuned model selects the data subset by maximizing its group-level influence prediction, with influence-aware clustering to enable efficient inference. Experiments on the DCLM benchmark demonstrate that Group-MATES achieves a 10% relative core score improvement on 22 downstream tasks over DCLM-Baseline and 5% over individual-influence-based methods, establishing a new state-of-the-art. Further analyses highlight the effectiveness of relational data influence models in capturing intricate interactions between data points.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2502.14709

Country: Asia > Middle East (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Efficient Multi-Agent System Training with Data Influence-Oriented Tree Search

Shi, Wentao, Yu, Zichun, Feng, Fuli, He, Xiangnan, Xiong, Chenyan

arXiv.org Artificial IntelligenceFeb-2-2025

Monte Carlo Tree Search (MCTS) based methods provide promising approaches for generating synthetic data to enhance the self-training of Large Language Model (LLM) based multi-agent systems (MAS). These methods leverage Q-values to estimate individual agent contributions. However, relying solely on Q-values to identify informative data may misalign with the data synthesis objective, as the focus should be on selecting data that best enhances model training. To address this discrepancy, we propose Data Influence-oriented Tree Search (DITS), a novel framework that incorporates influence scores to guide both tree search and data selection. By leveraging influence scores, we effectively identify the most impactful data for system improvement, thereby enhancing model performance. Furthermore, we derive influence score estimation methods tailored for non-differentiable metrics, significantly reducing computational overhead by utilizing inference computations. Extensive experiments on eight multi-agent datasets demonstrate the robustness and effectiveness of the proposed methods. Notably, our findings reveal that allocating more inference resources to estimate influence scores, rather than Q-values, during data synthesis can more effectively and efficiently enhance model training.

artificial intelligence, influence score, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2502.00955

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning

Li, Xiaochuan, Yu, Zichun, Xiong, Chenyan

arXiv.org Artificial IntelligenceOct-18-2024

Synthetic data has been widely used to train large language models, but their generative nature inevitably introduces noisy, non-informative, and misleading learning signals. In this paper, we propose Montessori-Instruct, a novel data synthesis framework that tailors the data synthesis ability of the teacher language model toward the student language model's learning process. Specifically, we utilize local data influence of synthetic training data points on students to characterize students' learning preferences. Then, we train the teacher model with Direct Preference Optimization (DPO) to generate synthetic data tailored toward student learning preferences. Experiments with Llama3-8B-Instruct (teacher) and Llama3-8B (student) on Alpaca Eval and MT-Bench demonstrate that Montessori-Instruct significantly outperforms standard synthesis methods by 18.35\% and 46.24\% relatively. Our method also beats data synthesized by a stronger teacher model, GPT-4o. Further analysis confirms the benefits of teacher's learning to generate more influential training data in the student's improved learning, the advantages of local data influence in accurately measuring student preferences, and the robustness of Montessori-Instruct across different student models. Our code and data are open-sourced at https://github.com/cxcscmu/Montessori-Instruct.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2410.14208

Genre: Research Report > New Finding (0.46)

Industry: Education > Educational Setting > Preschool (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

An In-depth Look at Gemini's Language Abilities

Akter, Syeda Nahida, Yu, Zichun, Muhamed, Aashiq, Ou, Tianyue, Bäuerle, Alex, Cabrera, Ángel Alexander, Dholakia, Krish, Xiong, Chenyan, Neubig, Graham

arXiv.org Artificial IntelligenceDec-24-2023

The recently released Google Gemini class of models are the first to comprehensively report results that rival the OpenAI GPT series across a wide variety of tasks. In this paper, we do an in-depth exploration of Gemini's language abilities, making two contributions. First, we provide a third-party, objective comparison of the abilities of the OpenAI GPT and Google Gemini models with reproducible code and fully transparent results. Second, we take a closer look at the results, identifying areas where one of the two model classes excels. We perform this analysis over 10 datasets testing a variety of language abilities, including reasoning, answering knowledge-based questions, solving math problems, translating between languages, generating code, and acting as instruction-following agents. From this analysis, we find that Gemini Pro achieves accuracy that is close but slightly inferior to the corresponding GPT 3.5 Turbo on all English-language tasks that we benchmarked, but find that Gemini Pro excels in translation into other languages for the languages that it supports. We further provide explanations for some of the under-performing tasks, including failures in mathematical reasoning with many digits, sensitivity to multiple-choice answer ordering, and others. We also identify areas where Gemini Pro demonstrates comparably high performance, such as handling longer and more complex reasoning chains.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2312.11444

Country: North America > United States > California (0.14)

Genre: Research Report (0.64)

Industry: Education > Curriculum > Subject-Specific Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.55)

Add feedback

Augmentation-Adapted Retriever Improves Generalization of Language Models as Generic Plug-In

Yu, Zichun, Xiong, Chenyan, Yu, Shi, Liu, Zhiyuan

arXiv.org Artificial IntelligenceMay-26-2023

Retrieval augmentation can aid language models (LMs) in knowledge-intensive tasks by supplying them with external information. Prior works on retrieval augmentation usually jointly fine-tune the retriever and the LM, making them closely coupled. In this paper, we explore the scheme of generic retrieval plug-in: the retriever is to assist target LMs that may not be known beforehand or are unable to be fine-tuned together. To retrieve useful documents for unseen target LMs, we propose augmentation-adapted retriever (AAR), which learns LM's preferences obtained from a known source LM. Experiments on the MMLU and PopQA datasets demonstrate that our AAR trained with a small source LM is able to significantly improve the zero-shot generalization of larger target LMs ranging from 250M Flan-T5 to 175B InstructGPT. Further analysis indicates that the preferences of different LMs overlap, enabling AAR trained with a single source LM to serve as a generic plug-in for various target LMs. Our code is open-sourced at https://github.com/OpenMatch/Augmentation-Adapted-Retriever.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2305.17331

Country: Asia > China (0.14)

Genre: Research Report (0.50)

Industry:

Transportation (0.47)
Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)

Add feedback