Chen, Wang
Full-Step-DPO: Self-Supervised Preference Optimization with Step-wise Rewards for Mathematical Reasoning
Xu, Huimin, Mao, Xin, Li, Feng-Lin, Wu, Xiaobao, Chen, Wang, Zhang, Wei, Luu, Anh Tuan
Direct Preference Optimization (DPO) often struggles with long-chain mathematical reasoning. Existing approaches, such as Step-DPO, typically improve this by focusing on the first erroneous step in the reasoning chain. However, they overlook all other steps and rely heavily on humans or GPT-4 to identify erroneous steps. To address these issues, we propose Full-Step-DPO, a novel DPO framework tailored for mathematical reasoning. Instead of optimizing only the first erroneous step, it leverages step-wise rewards from the entire reasoning chain. This is achieved by training a self-supervised process reward model that automatically scores each step, providing rewards without relying on external signals. Furthermore, we introduce a novel step-wise DPO loss that dynamically updates gradients based on these step-wise rewards, endowing language models with stronger reasoning capabilities. Extensive evaluations on both in-domain and out-of-domain mathematical reasoning benchmarks across various base language models demonstrate that Full-Step-DPO achieves superior performance compared to state-of-the-art baselines.
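The abstract does not spell out the loss; the PyTorch-style sketch below only illustrates one plausible step-wise weighted DPO objective, in which per-step log-probability ratios are weighted by process-reward-model scores. The argument names, shapes, and the softmax weighting scheme are assumptions, not the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    def step_wise_dpo_loss(chosen_step_logratios, rejected_step_logratios,
                           chosen_prm_scores, rejected_prm_scores, beta=0.1):
        # chosen/rejected_step_logratios: (batch, n_steps) per-step log pi_theta / pi_ref
        # chosen/rejected_prm_scores: (batch, n_steps) step-wise rewards from the PRM
        # Turn step-wise rewards into weights over the reasoning chain;
        # low-reward steps of the rejected answer receive larger weight.
        w_c = torch.softmax(chosen_prm_scores, dim=-1)
        w_r = torch.softmax(-rejected_prm_scores, dim=-1)
        chosen_term = (w_c * chosen_step_logratios).sum(dim=-1)
        rejected_term = (w_r * rejected_step_logratios).sum(dim=-1)
        # Standard DPO sigmoid objective applied to the reward-weighted margin.
        return -F.logsigmoid(beta * (chosen_term - rejected_term)).mean()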
Multi-Task Model Merging via Adaptive Weight Disentanglement
Xiong, Feng, Cheng, Runxi, Chen, Wang, Zhang, Zhanqiu, Guo, Yiwen, Yuan, Chun, Xu, Ruifeng
Model merging has recently gained attention as an economical and scalable approach to incorporating task-specific weights from various tasks into a unified multi-task model. For example, in Task Arithmetic (TA), adding the fine-tuned weights of different tasks can enhance the model's performance on those tasks, while subtracting them leads to task forgetting. Although TA is highly effective, interference among tasks still hampers the performance of the merged model. Existing methods for handling conflicts between tasks generally rely on empirical selection, resulting in suboptimal performance. In this paper, we introduce an Adaptive Weight Disentanglement method. We begin by theoretically proving that task vectors employed in model merging should be orthogonal to minimize interference among tasks. Guided by this insight, we initialize redundant vectors such that, when subtracted from the original task vectors, the resulting vectors exhibit increased orthogonality. Additionally, we impose a norm constraint on the redundant vectors to preserve the performance of the task-specific models. Experimental results demonstrate the effectiveness of our proposed technique: it successfully extracts redundant vectors, and after their subtraction, the task vectors not only retain robust performance but also achieve superior fusion outcomes. Our code is available at \href{https://github.com/FarisXiong/AWD.git}{https://github.com/FarisXiong/AWD.git}.
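As a rough illustration of the idea (not the authors' exact procedure), the sketch below learns redundant vectors that, once subtracted, make the remaining task vectors more mutually orthogonal, with a norm penalty keeping the redundant vectors small; the optimizer, step count, and penalty weight are assumed.

    import torch

    def adaptive_weight_disentanglement(task_vectors, steps=500, lr=1e-3, lam=0.1):
        # task_vectors: (T, D) tensor of flattened task vectors
        # (fine-tuned weights minus pretrained weights), one row per task.
        redundant = torch.zeros_like(task_vectors, requires_grad=True)
        opt = torch.optim.Adam([redundant], lr=lr)
        for _ in range(steps):
            disentangled = task_vectors - redundant
            gram = disentangled @ disentangled.T                # pairwise inner products
            off_diag = gram - torch.diag(torch.diag(gram))      # cross-task interference terms
            loss = off_diag.pow(2).sum() + lam * redundant.pow(2).sum()
            opt.zero_grad()
            loss.backward()
            opt.step()
        # Subtracting the learned redundant vectors yields more orthogonal task vectors,
        # which can then be merged onto the pretrained weights as in Task Arithmetic.
        return task_vectors - redundant.detach()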
As Simple as Fine-tuning: LLM Alignment via Bidirectional Negative Feedback Loss
Mao, Xin, Li, Feng-Lin, Xu, Huimin, Zhang, Wei, Chen, Wang, Luu, Anh Tuan
Direct Preference Optimization (DPO) has emerged as a more computationally efficient alternative to Reinforcement Learning from Human Feedback (RLHF) with Proximal Policy Optimization (PPO), eliminating the need for reward models and online sampling. Despite these benefits, DPO and its variants remain sensitive to hyper-parameters and prone to instability, particularly on mathematical datasets. We argue that these issues arise from the unidirectional likelihood-derivative negative feedback inherent in the log-likelihood loss function. To address this, we propose a novel LLM alignment loss that establishes a stable Bidirectional Negative Feedback (BNF) during optimization. Our proposed BNF loss eliminates the need for pairwise contrastive losses and does not require any extra tunable hyper-parameters or pairwise preference data, streamlining the alignment pipeline to be as simple as supervised fine-tuning. We conduct extensive experiments across two challenging QA benchmarks and four reasoning benchmarks. The experimental results show that BNF achieves comparable performance to the best methods on the QA benchmarks, while its performance drop on the four reasoning benchmarks is significantly smaller than that of the best methods, thus striking a better balance between value alignment and reasoning ability. In addition, we further validate the performance of BNF on non-pairwise datasets, and conduct an in-depth analysis of log-likelihood and logit shifts across different preference optimization methods.
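The exact BNF loss is defined in the paper; the snippet below only illustrates the motivating observation about unidirectional feedback, using the standard fact that the cross-entropy gradient with respect to the target logit is p_target - 1.

    import torch
    import torch.nn.functional as F

    # Illustration of the motivation (not the BNF loss itself): for the
    # log-likelihood loss, the gradient w.r.t. the target logit is p_target - 1,
    # which is negative whenever p_target < 1, so optimization keeps pushing the
    # target likelihood up (negative feedback acts in one direction only).
    logits = torch.tensor([[2.0, 0.5, -1.0]], requires_grad=True)
    target = torch.tensor([0])
    F.cross_entropy(logits, target).backward()
    print(logits.grad)  # equals softmax(logits) - one_hot(target); entry 0 is < 0
    # A bidirectional-feedback loss would let this push reverse once the target
    # likelihood overshoots, which is the stability property BNF aims to provide.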
MLOps Spanning Whole Machine Learning Life Cycle: A Survey
Fang, Zhengxin, Yuan, Yi, Zhang, Jingyu, Liu, Yue, Mu, Yuechen, Lu, Qinghua, Xu, Xiwei, Wang, Jeff, Wang, Chen, Zhang, Shuai, Chen, Shiping
Google AlphaGo's win has significantly motivated and accelerated machine learning (ML) research and development, leading to tremendous technical advances and wider adoption across various domains (e.g., Finance, Health, Defense, and Education). These advances have produced numerous new concepts and technologies, too many for people to keep up with and potentially confusing, especially for newcomers to the ML area. This paper aims to present a clear picture of the state of the art of existing ML technologies through a comprehensive survey. We organize this survey by viewing ML as an MLOps (ML Operations) process, in which the key concepts and activities are collected and elaborated with representative works and surveys. We hope that this paper can serve as a quick reference manual (a survey of surveys) for newcomers to ML (e.g., researchers, practitioners) to gain an overview of the MLOps process, a good understanding of the key technologies used in each step of the ML process, and pointers to further details.
Difficulty-controllable Question Generation for Reading Comprehension
Gao, Yifan, Bing, Lidong, Chen, Wang, Wang, Jianan, King, Irwin, Lyu, Michael R.
We investigate the difficulty levels of questions and propose a new setting called Difficulty-controllable Question Generation (DQG). Taking as input a reading comprehension paragraph and some text fragments (i.e., answers) in the paragraph that we want to ask questions about, a DQG method needs to generate questions each of which has a given text fragment as its answer, while the generation is controlled by specified difficulty labels---the output questions should satisfy the specified difficulty as much as possible. To solve this task, we propose an end-to-end framework to generate questions of designated difficulty levels. Specifically, we explore a few intuitions: (i) In the input sentences, the nearer a word is to the answer fragment, the more likely it is to be used in the question; (ii) The easier a question is, the nearer its words are to the answer fragment in the sentence; (iii) Difficulty control can be regarded as a problem of sentence generation towards a specified attribute or style, namely the difficulty level. For evaluation, we prepared the first dataset of reading comprehension questions with difficulty labels. The results show that our framework not only generates questions of better quality under metrics like BLEU, but also has the capability to generate questions complying with the specified difficulty labels.
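Intuition (i) is commonly realized as a proximity feature; the helper below is a hypothetical illustration (not the paper's implementation) that encodes each token's distance to the answer fragment so it can be embedded alongside word embeddings.

    def proximity_features(tokens, answer_start, answer_end, max_dist=10):
        # Distance of each token to the answer span, clipped to max_dist;
        # tokens inside the span get distance 0.
        feats = []
        for i in range(len(tokens)):
            if answer_start <= i <= answer_end:
                dist = 0
            elif i < answer_start:
                dist = answer_start - i
            else:
                dist = i - answer_end
            feats.append(min(dist, max_dist))
        return feats

    # Example: the answer "Paris" is the last token, so nearer words get smaller
    # distances, reflecting that they are more likely to appear in an easy question.
    print(proximity_features(["the", "capital", "of", "France", "is", "Paris"], 5, 5))
    # -> [5, 4, 3, 2, 1, 0]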