
Collaborating Authors

Wang, Yidong


A Language Anchor-Guided Method for Robust Noisy Domain Generalization

arXiv.org Artificial Intelligence

Real-world machine learning applications are often hindered by two critical challenges: distribution shift and label noise. Networks inherently tend to overfit to redundant, uninformative features present in the training distribution, which undermines their ability to generalize effectively to the target domain's distribution. The presence of noisy data further exacerbates this issue by inducing additional overfitting to noise, causing existing domain generalization methods to fail in effectively distinguishing invariant features from spurious ones. To address this, we propose a language anchor-guided method that uses natural language (NLP) anchors as class-level references for learning domain-invariant features. We also introduce a weighted loss function that dynamically adjusts the contribution of each sample based on its distance to the corresponding NLP anchor, thereby improving the model's resilience to noisy labels.

Domain Generalization (DG) has emerged as a pivotal research direction in machine learning, aiming to develop models that maintain high performance on previously unseen environments--or domains. Traditional methods often assume that training and test data share the same distribution, yet in real-world scenarios there is frequently a substantial shift between these distributions. This phenomenon, widely referred to as domain shift, can cause severe performance degradation in tasks spanning computer vision, natural language processing, and medical image analysis [1]. As shown in Figure 1(a)(b), even within the same class label, the distribution of feature representations can vary considerably. This variation may stem from differences in image acquisition conditions--such as lighting variations, changes in pose, or complex background environments--and even from more subtle domain-specific factors like sensor noise or camera calibration differences. Such intra-class variability poses a significant challenge for developing accurate and adaptable models, which must learn to extract invariant features that capture the true semantic essence of the class while ignoring irrelevant variations.

Lin and Z. Zhang are with Worcester Polytechnic Institute, Worcester, MA 01890. L. Wang is with Carnegie Mellon University, Pittsburgh, PA 15213. Y. Wang is with Peking University, Beijing 100871, China. Z. Li and W. Lu are with Tsinghua University, Beijing 100190, China. K. Yamada is with Tohoku University, Sendai 980-8572, Japan.
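
The abstract does not give the exact form of the anchor-weighted loss, so the following is only a minimal sketch of the idea, assuming CLIP-style text embeddings serve as the per-class NLP anchors; the function name and the softmax weighting scheme are illustrative choices, not the paper's implementation.

    import torch
    import torch.nn.functional as F

    def anchor_weighted_loss(features, logits, labels, text_anchors, temperature=1.0):
        """Hypothetical sketch: down-weight samples whose visual features lie far
        from the text (NLP) anchor of their labelled class, as a proxy for label noise.

        features:     (B, D) image embeddings
        logits:       (B, C) classifier outputs
        labels:       (B,)   possibly noisy integer labels
        text_anchors: (C, D) one fixed language-derived anchor per class
        """
        # Cosine distance between each sample and the anchor of its (claimed) class
        feats = F.normalize(features, dim=-1)
        anchors = F.normalize(text_anchors, dim=-1)
        sim = (feats * anchors[labels]).sum(dim=-1)           # (B,) in [-1, 1]
        dist = 1.0 - sim                                      # small = trustworthy

        # Samples closer to their anchor get larger weights (mean weight is 1)
        weights = torch.softmax(-dist / temperature, dim=0) * dist.numel()

        per_sample_ce = F.cross_entropy(logits, labels, reduction="none")
        return (weights.detach() * per_sample_ce).mean()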


StepMathAgent: A Step-Wise Agent for Evaluating Mathematical Processes through Tree-of-Error

arXiv.org Artificial Intelligence

Evaluating mathematical capabilities is critical for assessing the overall performance of large language models (LLMs). However, existing evaluation methods often focus solely on final answers, which yields inaccurate and uninterpretable evaluation outcomes and makes them unable to assess proof-based or open-ended problems. To address these issues, we propose a novel mathematical process evaluation agent based on Tree-of-Error, called StepMathAgent. This agent incorporates four internal core operations: logical step segmentation, step scoring, score aggregation, and error tree generation, along with four external extension modules: difficulty calibration, simplicity evaluation, completeness validation, and format assessment. Furthermore, we introduce StepMathBench, a benchmark comprising 1,000 step-divided process evaluation instances, derived from 200 high-quality math problems grouped by problem type, subject category, and difficulty level. Experiments on StepMathBench show that our proposed StepMathAgent outperforms all state-of-the-art methods, demonstrating human-aligned evaluation preferences and broad applicability to various scenarios. Our data and code are available at https://github.com/SHU-XUN/StepMathAgent.
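
A rough skeleton of the four core operations named above is sketched below; the segmentation and scoring placeholders stand in for the LLM-judge calls, and the mean aggregation is only an assumption about how step scores might be combined, not the agent's actual rubric.

    from dataclasses import dataclass, field
    from typing import List, Optional, Tuple

    @dataclass
    class StepNode:
        text: str
        score: float                       # 0.0 (wrong) .. 1.0 (correct)
        error: Optional[str] = None        # node of the error tree when score < 1
        children: List["StepNode"] = field(default_factory=list)

    def segment_steps(solution: str) -> List[str]:
        # Placeholder: real segmentation is LLM-driven; here we split on blank lines.
        return [s.strip() for s in solution.split("\n\n") if s.strip()]

    def score_step(step: str) -> Tuple[float, Optional[str]]:
        # Placeholder for the LLM-based step scorer and error description.
        return 1.0, None

    def evaluate(solution: str) -> Tuple[float, StepNode]:
        root = StepNode(text="<solution>", score=1.0)
        scores = []
        for step in segment_steps(solution):
            score, error = score_step(step)
            root.children.append(StepNode(step, score, error))
            scores.append(score)
        # Score aggregation: mean of step scores (the paper may weight steps differently).
        final = sum(scores) / len(scores) if scores else 0.0
        return final, root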


Masked Autoencoders Are Effective Tokenizers for Diffusion Models

arXiv.org Artificial Intelligence

Recent advances in latent diffusion models have demonstrated their effectiveness for high-resolution image synthesis. However, the properties of the tokenizer's latent space that enable better learning and generation in diffusion models remain under-explored. Theoretically and empirically, we find that improved generation quality is closely tied to latent distributions with better structure, such as those with fewer Gaussian mixture modes and more discriminative features. Motivated by these insights, we propose MAETok, an autoencoder (AE) that leverages mask modeling to learn a semantically rich latent space while maintaining reconstruction fidelity. Extensive experiments validate our analysis, demonstrating that the variational form of autoencoders is not necessary and that a discriminative latent space from a plain AE alone enables state-of-the-art performance on ImageNet generation using only 128 tokens. MAETok achieves significant practical improvements, enabling a gFID of 1.69 with 76x faster training and 31x higher inference throughput for 512x512 generation. Our findings show that the structure of the latent space, rather than variational constraints, is crucial for effective diffusion models. Code and trained models are released.
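
As a rough illustration of the mask-modeling idea (not the released MAETok architecture), the sketch below trains a plain, non-variational autoencoder to reconstruct masked patch embeddings so that its compact latent tokens become more discriminative; all dimensions, layer counts, and the mask ratio are arbitrary assumptions.

    import torch
    import torch.nn as nn

    class MaskedAETokenizer(nn.Module):
        """Minimal sketch of a mask-modeling autoencoder tokenizer."""

        def __init__(self, patch_dim=768, latent_dim=16, mask_ratio=0.4):
            super().__init__()
            self.mask_ratio = mask_ratio
            self.mask_token = nn.Parameter(torch.zeros(1, 1, patch_dim))
            self.encoder = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(patch_dim, nhead=8, batch_first=True), num_layers=4)
            self.to_latent = nn.Linear(patch_dim, latent_dim)
            self.from_latent = nn.Linear(latent_dim, patch_dim)
            self.decoder = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(patch_dim, nhead=8, batch_first=True), num_layers=4)
            self.head = nn.Linear(patch_dim, patch_dim)

        def forward(self, patches):                      # patches: (B, N, patch_dim)
            B, N, _ = patches.shape
            mask = torch.rand(B, N, device=patches.device) < self.mask_ratio
            x = torch.where(mask.unsqueeze(-1), self.mask_token.expand(B, N, -1), patches)
            z = self.to_latent(self.encoder(x))          # compact latent tokens for diffusion
            recon = self.head(self.decoder(self.from_latent(z)))
            loss = ((recon - patches) ** 2)[mask].mean() # reconstruct only the masked patches
            return z, loss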


Outcome-Refining Process Supervision for Code Generation

arXiv.org Artificial Intelligence

Large Language Models have demonstrated remarkable capabilities in code generation, yet they often struggle with complex programming tasks that require deep algorithmic reasoning. While process supervision through learned reward models shows promise in guiding reasoning steps, it requires expensive training data and suffers from unreliable evaluation. We propose Outcome-Refining Process Supervision, a novel paradigm that treats outcome refinement itself as the process to be supervised. Our framework leverages concrete execution signals to ground the supervision of reasoning steps, while using tree-structured exploration to maintain multiple solution trajectories simultaneously. Experiments demonstrate that our approach enables even smaller models to achieve high success rates and strong performance on competitive programming tasks, and it provides more reliable verification than traditional reward models without requiring trained PRMs. Our approach achieves significant improvements across 5 models and 3 datasets: an average increase of 26.9% in correctness and 42.2% in efficiency. The results suggest that providing structured reasoning space with concrete verification signals is crucial for solving complex programming tasks. We open-source all our code and data at: https://github.com/zhuohaoyu/ORPS
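
The sketch below illustrates the general shape of outcome-refining search under execution feedback: a small beam of candidate programs is scored by running unit tests (the concrete execution signal) and expanded by an LLM refinement step. The refine placeholder and the beam parameters are assumptions for illustration, not the paper's exact procedure.

    import heapq
    import subprocess
    import sys

    def run_tests(program: str, tests: list) -> float:
        """Fraction of (stdin, expected_stdout) test pairs the program passes."""
        passed = 0
        for stdin, expected in tests:
            try:
                out = subprocess.run([sys.executable, "-c", program], input=stdin,
                                     capture_output=True, text=True, timeout=5)
                passed += out.stdout.strip() == expected.strip()
            except subprocess.TimeoutExpired:
                pass
        return passed / max(len(tests), 1)

    def refine(program: str) -> list:
        # Placeholder for the LLM call that proposes refined candidate programs.
        return [program]

    def tree_search(seed_programs, tests, beam_width=3, depth=2):
        frontier = [(run_tests(p, tests), p) for p in seed_programs]
        best = max(frontier, default=(0.0, ""))
        for _ in range(depth):
            frontier = heapq.nlargest(beam_width, frontier)   # keep the best trajectories
            children = [(run_tests(c, tests), c) for _, p in frontier for c in refine(p)]
            frontier += children
            best = max([best] + children)
        return best                                           # (score, program)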


Learning from "Silly" Questions Improves Large Language Models, But Only Slightly

arXiv.org Artificial Intelligence

Constructing high-quality Supervised Fine-Tuning (SFT) datasets is critical for training large language models (LLMs). Recent studies have shown that using data from a specific source, Ruozhiba, a Chinese website where users ask "silly" questions to better understand certain topics, can lead to better fine-tuning performance. This paper explores two hidden factors behind this result: potential explanations for its success and a large-scale evaluation of its performance. First, we leverage GPT-4 to analyze the successful cases of Ruozhiba questions from the perspectives of education, psychology, and cognitive science, deriving a set of explanatory rules. Then, we construct fine-tuning datasets by applying these rules to the MMLU training set. Surprisingly, our results indicate that rules can significantly improve model performance on certain tasks while potentially diminishing performance on others. For example, SFT data generated following the "Counterintuitive Thinking" rule achieves approximately a 5% improvement on the "Global Facts" task, whereas the "Blurring the Conceptual Boundaries" rule leads to a performance drop of 6.14% on the "Econometrics" task. In addition, for specific tasks, different rules tend to have a consistent impact on model performance, suggesting that the differences between the extracted rules are not substantial and that the effectiveness of the rules is relatively consistent across tasks. Our research highlights the importance of considering task diversity and rule applicability when constructing SFT datasets to achieve more comprehensive performance improvements.
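
The rule-application step can be pictured as follows; the two rule templates are paraphrased from the rule names mentioned above, and call_llm is a placeholder for the GPT-4 call the paper uses, so this is an illustrative sketch rather than the authors' pipeline.

    # Each extracted rule becomes a prompt template; MMLU training questions are
    # rewritten under each rule and paired with their answers to form SFT data.
    RULES = {
        "Counterintuitive Thinking": "Rewrite this question so it challenges a common intuition:\n{q}",
        "Blurring the Conceptual Boundaries": "Rewrite this question so two related concepts are easy to confuse:\n{q}",
    }

    def call_llm(prompt: str) -> str:
        raise NotImplementedError("plug in an LLM client (the paper uses GPT-4) here")

    def build_sft_dataset(mmlu_items):
        """mmlu_items: iterable of (question, answer) pairs from the MMLU training split."""
        dataset = []
        for question, answer in mmlu_items:
            for rule, template in RULES.items():
                rewritten = call_llm(template.format(q=question))
                dataset.append({"rule": rule, "instruction": rewritten, "output": answer})
        return dataset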


On the Diversity of Synthetic Data and its Impact on Training Large Language Models

arXiv.org Artificial Intelligence

The rise of Large Language Models (LLMs) has accentuated the need for diverse, high-quality pre-training data. Synthetic data emerges as a viable solution to the challenges of data scarcity and inaccessibility. While previous literature has focused predominantly on the quality and quantity of real data, our work enables the measurement of diversity in synthetic data and explores its impact on LLM performance. We study the downstream effects of synthetic data diversity during both the pre-training and fine-tuning stages by introducing a new diversity metric, LLM cluster-agent, designed to evaluate the diversity of synthetic datasets. Through a series of controlled experiments with models of 350M and 1.4B parameters, we demonstrate that the proposed cluster-based LLM scoring of diversity correlates positively with both pre-training and supervised fine-tuning performance. Our findings also reveal that synthetic data diversity in pre-training affects supervised fine-tuning more significantly than pre-training itself, even for smaller models. We hope this study advances our understanding of the optimal use of synthetic data in LLM training and opens new avenues for efficient data generation processes.
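
The paper's LLM cluster-agent asks an LLM to group samples by criteria it proposes itself; the sketch below only approximates the underlying intuition (diversity as how evenly samples spread over clusters) using embeddings and k-means, and is not the proposed metric.

    from collections import Counter

    import numpy as np
    from sklearn.cluster import KMeans

    def cluster_diversity(embeddings: np.ndarray, n_clusters: int = 50) -> float:
        """Normalized cluster entropy in (0, 1]: 1.0 means samples spread evenly over clusters."""
        labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(embeddings)
        counts = np.array(list(Counter(labels).values()), dtype=float)
        probs = counts / counts.sum()
        entropy = -(probs * np.log(probs)).sum()
        return float(entropy / np.log(n_clusters))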


Survey on Knowledge Distillation for Large Language Models: Methods, Evaluation, and Application

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have showcased exceptional capabilities in various domains, attracting significant interest from both academia and industry. Despite their impressive performance, the substantial size and computational demands of LLMs pose considerable challenges for practical deployment, particularly in environments with limited resources. The endeavor to compress language models while maintaining their accuracy has become a focal point of research. Among the various methods, knowledge distillation has emerged as an effective technique to enhance inference speed without greatly compromising performance. This paper presents a thorough survey of knowledge distillation techniques tailored specifically for LLMs, organized around three aspects: method, evaluation, and application. Specifically, we divide the methods into white-box KD and black-box KD to better illustrate their differences. Furthermore, we explore the evaluation tasks and the distillation effects of different distillation methods, and we propose directions for future research. By providing an in-depth understanding of the latest advancements and practical applications, this survey offers valuable resources for researchers, paving the way for sustained progress in this field.
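
For concreteness, one common white-box objective in this line of work combines a temperature-scaled KL term against the teacher's logits with the usual hard-label loss; the sketch below shows that standard form (black-box KD, by contrast, trains only on teacher-generated outputs). The weighting and temperature are illustrative defaults, not values from the survey.

    import torch
    import torch.nn.functional as F

    def white_box_kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        """Soft-label KL against the teacher plus hard-label cross-entropy."""
        soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                        F.softmax(teacher_logits / T, dim=-1),
                        reduction="batchmean") * (T * T)
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1 - alpha) * hard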


Apollo: A Lightweight Multilingual Medical LLM towards Democratizing Medical AI to 6B People

arXiv.org Artificial Intelligence

Despite the vast repository of global medical knowledge being predominantly in English, local languages are crucial for delivering tailored healthcare services, particularly in areas with limited medical resources. To extend the reach of medical AI advancements to a broader population, we aim to develop medical LLMs for the six most widely spoken languages, which together cover a global population of 6.1 billion. This effort culminates in the creation of the ApolloCorpora multilingual medical dataset and the XMedBench benchmark. On the multilingual medical benchmark, the released Apollo models, at various relatively small sizes (i.e., 0.5B, 1.8B, 2B, 6B, and 7B), achieve the best performance among models of equivalent size. Notably, Apollo-7B is the state-of-the-art multilingual medical LLM among models of up to 70B parameters. Additionally, these lightweight models can be used to improve the multilingual medical capabilities of larger models without fine-tuning them, in a proxy-tuning fashion.
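
Proxy tuning steers a large base model at decode time using the logit shift that fine-tuning induced in a small expert, without touching the large model's weights; the one-line sketch below shows the general recipe, with the interpolation weight alpha as an illustrative knob rather than Apollo's exact setting.

    import torch

    def proxy_tuned_logits(large_base_logits, small_tuned_logits, small_base_logits, alpha=1.0):
        """Add the small expert's fine-tuning logit shift to the untouched large model.

        All three tensors must share the same vocabulary and shape (batch, vocab_size).
        """
        return large_base_logits + alpha * (small_tuned_logits - small_base_logits)

    # At each decoding step, sample from softmax(proxy_tuned_logits(...)) instead of
    # the large model's own distribution; the large model's weights are never updated.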


Enhancing In-Context Learning via Implicit Demonstration Augmentation

arXiv.org Artificial Intelligence

The emergence of in-context learning (ICL) enables large pre-trained language models (PLMs) to make predictions for unseen inputs without updating parameters. Despite its potential, ICL's effectiveness heavily relies on the quality, quantity, and permutation of demonstrations, commonly leading to suboptimal and unstable performance. In this paper, we tackle this challenge for the first time from the perspective of demonstration augmentation. Specifically, we start with enriching representations of demonstrations by leveraging their deep feature distribution. We then theoretically reveal that when the number of augmented copies approaches infinity, the augmentation is approximately equal to a novel logit calibration mechanism integrated with specific statistical properties. This insight results in a simple yet highly efficient method that significantly improves the average and worst-case accuracy across diverse PLMs and tasks. Moreover, our method effectively reduces performance variance among varying demonstrations, permutations, and templates, and displays the capability to address imbalanced class distributions.
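
The paper derives a specific closed-form calibration; the sketch below is only a generic stand-in for the idea of logit calibration, shifting and rescaling logits by statistics estimated from the demonstrations' deep features so that predictions depend less on which demonstrations were sampled. The function name and arguments are hypothetical.

    import torch

    def calibrated_logits(logits, calib_mean, calib_var=None, eps=1e-6):
        """Shift logits by an estimated mean and optionally rescale by an estimated variance.

        logits:     (B, C) raw class logits from the PLM
        calib_mean: (C,)   mean logit offset estimated from demonstration representations
        calib_var:  (C,)   optional per-class variance estimate for rescaling
        """
        adjusted = logits - calib_mean
        if calib_var is not None:
            adjusted = adjusted / torch.sqrt(calib_var + eps)
        return adjusted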


AutoSurvey: Large Language Models Can Automatically Write Surveys

arXiv.org Artificial Intelligence

This paper introduces AutoSurvey, a speedy and well-organized methodology for automating the creation of comprehensive literature surveys in rapidly evolving fields like artificial intelligence. Traditional survey paper creation faces challenges due to the vast volume and complexity of information, prompting the need for efficient survey methods. While large language models (LLMs) offer promise in automating this process, challenges such as context window limitations, parametric knowledge constraints, and the lack of evaluation benchmarks remain. AutoSurvey addresses these challenges through a systematic approach that involves initial retrieval and outline generation, subsection drafting by specialized LLMs, integration and refinement, and rigorous evaluation and iteration. Our contributions include a comprehensive solution to the survey problem, a reliable evaluation method, and experimental validation demonstrating AutoSurvey's effectiveness.
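
The stages named above can be pictured as the skeleton below; retrieve, call_llm, and the critique prompts are placeholders for AutoSurvey's actual retrieval backend, specialized LLM writers, and evaluation rubric, so this is a sketch of the workflow rather than the released system.

    def retrieve(topic: str, k: int = 100) -> list:
        raise NotImplementedError("plug in a paper-retrieval backend here")

    def call_llm(prompt: str) -> str:
        raise NotImplementedError("plug in an LLM client here")

    def auto_survey(topic: str, rounds: int = 2) -> str:
        # 1) Initial retrieval and outline generation
        papers = retrieve(topic)
        outline = call_llm(f"Write a survey outline on '{topic}' given these papers:\n"
                           + "\n".join(papers[:50]))
        # 2) Subsection drafting by (specialized) LLM writers
        sections = [call_llm(f"Draft the survey section '{line}' citing the papers above.")
                    for line in outline.splitlines() if line.strip()]
        # 3) Integration and refinement
        draft = call_llm("Merge these sections into one coherent survey:\n" + "\n\n".join(sections))
        # 4) Evaluation and iteration
        for _ in range(rounds):
            feedback = call_llm("Critique this survey for coverage and citation quality:\n" + draft)
            draft = call_llm(f"Revise the survey using this feedback:\n{feedback}\n\n{draft}")
        return draft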