AITopics | Chen, Zhipeng

Collaborating Authors

Chen, Zhipeng

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models

Sun, Haoxiang, Min, Yingqian, Chen, Zhipeng, Zhao, Wayne Xin, Liu, Zheng, Wang, Zhongyuan, Fang, Lei, Wen, Ji-Rong

arXiv.org Artificial IntelligenceMar-27-2025

In recent years, the rapid development of large reasoning models has resulted in the saturation of existing benchmarks for evaluating mathematical reasoning, highlighting the urgent need for more challenging and rigorous evaluation frameworks. To address this gap, we introduce OlymMATH, a novel Olympiad-level mathematical benchmark, designed to rigorously test the complex reasoning capabilities of LLMs. OlymMATH features 200 meticulously curated problems, each manually verified and available in parallel English and Chinese versions. The problems are systematically organized into two distinct difficulty tiers: (1) AIME-level problems (easy) that establish a baseline for mathematical reasoning assessment, and (2) significantly more challenging problems (hard) designed to push the boundaries of current state-of-the-art models. In our benchmark, these problems span four core mathematical fields, each including a verifiable numerical solution to enable objective, rule-based evaluation. Empirical results underscore the significant challenge presented by OlymMATH, with state-of-the-art models including DeepSeek-R1 and OpenAI's o3-mini demonstrating notably limited accuracy on the hard subset. Furthermore, the benchmark facilitates comprehensive bilingual assessment of mathematical reasoning abilities-a critical dimension that remains largely unaddressed in mainstream mathematical reasoning benchmarks. We release the OlymMATH benchmark at the STILL project: https://github.com/RUCAIBox/Slow_Thinking_with_LLMs.

benchmark, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2503.2138

Country: Asia (0.28)

Genre:

Research Report > Promising Solution (0.54)
Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Add feedback

R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning

Song, Huatong, Jiang, Jinhao, Min, Yingqian, Chen, Jie, Chen, Zhipeng, Zhao, Wayne Xin, Fang, Lei, Wen, Ji-Rong

arXiv.org Artificial IntelligenceMar-18-2025

Existing Large Reasoning Models (LRMs) have shown the potential of reinforcement learning (RL) to enhance the complex reasoning capabilities of Large Language Models~(LLMs). While they achieve remarkable performance on challenging tasks such as mathematics and coding, they often rely on their internal knowledge to solve problems, which can be inadequate for time-sensitive or knowledge-intensive questions, leading to inaccuracies and hallucinations. To address this, we propose \textbf{R1-Searcher}, a novel two-stage outcome-based RL approach designed to enhance the search capabilities of LLMs. This method allows LLMs to autonomously invoke external search systems to access additional knowledge during the reasoning process. Our framework relies exclusively on RL, without requiring process rewards or distillation for a cold start. % effectively generalizing to out-of-domain datasets and supporting both Base and Instruct models. Our experiments demonstrate that our method significantly outperforms previous strong RAG methods, even when compared to the closed-source GPT-4o-mini.

dataset, retrieval, zhang, (17 more...)

arXiv.org Artificial Intelligence

2503.05592

Country:

North America > United States (0.69)
Asia (0.46)
Europe > Austria > Vienna (0.14)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

An Empirical Study on Eliciting and Improving R1-like Reasoning Models

Chen, Zhipeng, Min, Yingqian, Zhang, Beichen, Chen, Jie, Jiang, Jinhao, Cheng, Daixuan, Zhao, Wayne Xin, Liu, Zheng, Miao, Xu, Lu, Yang, Fang, Lei, Wang, Zhongyuan, Wen, Ji-Rong

arXiv.org Artificial IntelligenceMar-6-2025

In this report, we present the third technical report on the development of slow-thinking models as part of the STILL project. As the technical pathway becomes clearer, scaling RL training has become a central technique for implementing such reasoning models. We systematically experiment with and document the effects of various factors influencing RL training, conducting experiments on both base models and fine-tuned models. Specifically, we demonstrate that our RL training approach consistently improves the Qwen2.5-32B base models, enhancing both response length and test accuracy. Furthermore, we show that even when a model like DeepSeek-R1-Distill-Qwen-1.5B has already achieved a high performance level, it can be further refined through RL training, reaching an accuracy of 39.33% on AIME 2024. Beyond RL training, we also explore the use of tool manipulation, finding that it significantly boosts the reasoning performance of large reasoning models. This approach achieves a remarkable accuracy of 86.67% with greedy search on AIME 2024, underscoring its effectiveness in enhancing model capabilities. We release our resources at the STILL project website: https://github.com/RUCAIBox/Slow_Thinking_with_LLMs.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2503.04548

Genre: Research Report > New Finding (1.00)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Enhancing LLM Reasoning with Reward-guided Tree Search

Jiang, Jinhao, Chen, Zhipeng, Min, Yingqian, Chen, Jie, Cheng, Xiaoxue, Wang, Jiapeng, Tang, Yiru, Sun, Haoxiang, Deng, Jia, Zhao, Wayne Xin, Liu, Zheng, Yan, Dong, Xie, Jian, Wang, Zhongyuan, Wen, Ji-Rong

arXiv.org Artificial IntelligenceDec-30-2024

Recently, test-time scaling has garnered significant attention from the research community, largely due to the substantial advancements of the o1 model released by OpenAI. By allocating more computational resources during the inference phase, large language models (LLMs) can extensively explore the solution space by generating more thought tokens or diverse solutions, thereby producing more accurate responses. However, develop an o1-like reasoning approach is challenging, and researchers have been making various attempts to advance this open area of research. In this paper, we present a preliminary exploration into enhancing the reasoning abilities of LLMs through reward-guided tree search algorithms. This framework is implemented by integrating the policy model, reward model, and search algorithm. It is primarily constructed around a tree search algorithm, where the policy model navigates a dynamically expanding tree guided by a specially trained reward model. The implemented framework is denoted as STILL-1 (Slow Thinking with LLMs), marking the first model developed by our project, "Slow Thinking with LLMs". We thoroughly explore various design considerations necessary for implementing this framework and provide a detailed report of the technical aspects. To assess the effectiveness of our approach, we focus on mathematical reasoning tasks and conduct extensive evaluations on four challenging datasets, significantly enhancing the reasoning abilities of LLMs.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2411.11694

Country: Europe > Austria > Vienna (0.14)

Genre:

Workflow (1.00)
Research Report (1.00)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

Imitate, Explore, and Self-Improve: A Reproduction Report on Slow-thinking Reasoning Systems

Min, Yingqian, Chen, Zhipeng, Jiang, Jinhao, Chen, Jie, Deng, Jia, Hu, Yiwen, Tang, Yiru, Wang, Jiapeng, Cheng, Xiaoxue, Song, Huatong, Zhao, Wayne Xin, Liu, Zheng, Wang, Zhongyuan, Wen, Ji-Rong

arXiv.org Artificial IntelligenceDec-22-2024

Recently, slow-thinking reasoning systems, such as o1, have demonstrated remarkable capabilities in solving complex reasoning tasks. These systems typically engage in an extended thinking process before responding to a query, allowing them to generate more thorough, accurate, and well-reasoned solutions. These systems are primarily developed and maintained by industry, with their core techniques not publicly disclosed. In response, an increasing number of studies from the research community aim to explore the technical foundations underlying these powerful reasoning systems. Building on these prior efforts, this paper presents a reproduction report on implementing o1-like reasoning systems. We introduce an ``imitate, explore, and self-improve'' framework, denoted as \textbf{STILL-2}, as our primary technical approach to train the reasoning model. In the initial phase, we use distilled long-form thought data to fine-tune the reasoning model, enabling it to invoke a slow-thinking mode. The model is then encouraged to explore challenging problems by generating multiple rollouts, which can result in increasingly more high-quality trajectories that lead to correct answers. Furthermore, the model undergoes self-improvement by iteratively refining its training dataset. To verify the effectiveness of this approach, we conduct extensive experiments on three challenging benchmarks. The experimental results demonstrate that our approach achieves competitive performance compared to industry-level reasoning systems on these benchmarks.

artificial intelligence, reasoning model, reasoning system, (17 more...)

arXiv.org Artificial Intelligence

2412.09413

Genre: Research Report > New Finding (0.66)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Extracting and Transferring Abilities For Building Multi-lingual Ability-enhanced Large Language Models

Chen, Zhipeng, Song, Liang, Zhou, Kun, Zhao, Wayne Xin, Wang, Bingning, Chen, Weipeng, Wen, Ji-Rong

arXiv.org Artificial IntelligenceOct-10-2024

Multi-lingual ability transfer has become increasingly important for the broad application of large language models (LLMs). Existing work highly relies on training with the multi-lingual ability-related data, which may be not available for low-resource languages. To solve it, we propose a Multi-lingual Ability Extraction and Transfer approach, named as MAET. Our key idea is to decompose and extract language-agnostic ability-related weights from LLMs, and transfer them across different languages by simple addition and subtraction operations without training. Specially, our MAET consists of the extraction and transfer stages. In the extraction stage, we firstly locate key neurons that are highly related to specific abilities, and then employ them to extract the transferable ability-specific weights. In the transfer stage, we further select the ability-related parameter tensors, and design the merging strategy based on the linguistic and ability specific weights, to build the multi-lingual ability-enhanced LLM. To demonstrate the effectiveness of our proposed approach, we conduct extensive experiments on mathematical and scientific tasks in both high-resource lingual and low-resource lingual scenarios. Experiment results have shown that MAET can effectively and efficiently extract and transfer the advanced abilities, and outperform training-based baseline methods. Our code and data are available at \url{https://github.com/RUCAIBox/MAET}.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2410.07825

Country:

North America > United States (0.29)
Europe > Austria > Vienna (0.14)

Genre: Research Report (0.64)

Industry: Education > Curriculum > Subject-Specific Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Low-Redundant Optimization for Large Language Model Alignment

Chen, Zhipeng, Zhou, Kun, Zhao, Wayne Xin, Wang, Jingyuan, Wen, Ji-Rong

arXiv.org Artificial IntelligenceJun-18-2024

Large language models (LLMs) are still struggling in aligning with human preference in complex tasks and scenarios. They are prone to overfit into the unexpected patterns or superficial styles in the training data. We conduct an empirical study that only selects the top-10\% most updated parameters in LLMs for alignment training, and see improvements in the convergence process and final performance. It indicates the existence of redundant neurons in LLMs for alignment training. To reduce its influence, we propose a low-redundant alignment method named \textbf{ALLO}, focusing on optimizing the most related neurons with the most useful supervised signals. Concretely, we first identify the neurons that are related to the human preference data by a gradient-based strategy, then identify the alignment-related key tokens by reward models for computing loss. Besides, we also decompose the alignment process into the forgetting and learning stages, where we first forget the tokens with unaligned knowledge and then learn aligned knowledge, by updating different ratios of neurons, respectively. Experimental results on 10 datasets have shown the effectiveness of ALLO. Our code and data are available at \url{https://github.com/RUCAIBox/ALLO}.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2406.12606

Country:

North America > United States > California (0.28)
North America > United States > New York > New York County > New York City (0.14)

Genre:

Research Report (0.64)
Overview (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Dual-Pipeline with Low-Rank Adaptation for New Language Integration in Multilingual ASR

Khassanov, Yerbolat, Chen, Zhipeng, Chen, Tianfeng, Chong, Tze Yuang, Li, Wei, Zhang, Jun, Lu, Lu, Wang, Yuxuan

arXiv.org Artificial IntelligenceJun-11-2024

This paper addresses challenges in integrating new languages into a pre-trained multilingual automatic speech recognition (mASR) system, particularly in scenarios where training data for existing languages is limited or unavailable. The proposed method employs a dual-pipeline with low-rank adaptation (LoRA). It maintains two data flow pipelines--one for existing languages and another for new languages. The primary pipeline follows the standard flow through the pre-trained parameters of mASR, while the secondary pipeline additionally utilizes language-specific parameters represented by LoRA and a separate output decoder module. Importantly, the proposed approach minimizes the performance degradation of existing languages and enables a language-agnostic operation mode, facilitated by a decoder selection strategy. We validate the effectiveness of the proposed method by extending the pre-trained Whisper model to 19 new languages from the FLEURS dataset.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2406.07842

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.70)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Add feedback

JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models

Zhou, Kun, Zhang, Beichen, Wang, Jiapeng, Chen, Zhipeng, Zhao, Wayne Xin, Sha, Jing, Sheng, Zhichao, Wang, Shijin, Wen, Ji-Rong

arXiv.org Artificial IntelligenceMay-23-2024

Mathematical reasoning is an important capability of large language models~(LLMs) for real-world applications. To enhance this capability, existing work either collects large-scale math-related texts for pre-training, or relies on stronger LLMs (\eg GPT-4) to synthesize massive math problems. Both types of work generally lead to large costs in training or synthesis. To reduce the cost, based on open-source available texts, we propose an efficient way that trains a small LLM for math problem synthesis, to efficiently generate sufficient high-quality pre-training data. To achieve it, we create a dataset using GPT-4 to distill its data synthesis capability into the small LLM. Concretely, we craft a set of prompts based on human education stages to guide GPT-4, to synthesize problems covering diverse math knowledge and difficulty levels. Besides, we adopt the gradient-based influence estimation method to select the most valuable math-related texts. The both are fed into GPT-4 for creating the knowledge distillation dataset to train the small LLM. We leverage it to synthesize 6 million math problems for pre-training our JiuZhang3.0 model, which only needs to invoke GPT-4 API 9.3k times and pre-train on 4.6B data. Experimental results have shown that JiuZhang3.0 achieves state-of-the-art performance on several mathematical reasoning datasets, under both natural language reasoning and tool manipulation settings. Our code and data will be publicly released in \url{https://github.com/RUCAIBox/JiuZhang3.0}.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2405.14365

Country:

Asia > China (0.28)
North America > United States > California (0.28)

Genre: Research Report (1.00)

Industry: Education > Educational Setting > K-12 Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint

Chen, Zhipeng, Zhou, Kun, Zhao, Wayne Xin, Wan, Junchen, Zhang, Fuzheng, Zhang, Di, Wen, Ji-Rong

arXiv.org Artificial IntelligenceJan-11-2024

Reinforcement learning (RL) has been widely used in training large language models~(LLMs) for preventing unexpected outputs, \eg reducing harmfulness and errors. However, existing RL methods mostly adopt the instance-level reward, which is unable to provide fine-grained supervision for complex reasoning tasks, and can not focus on the few key tokens that lead to the incorrectness. To address it, we propose a new RL method named \textbf{RLMEC} that incorporates a generative model as the reward model, which is trained by the erroneous solution rewriting task under the minimum editing constraint, and can produce token-level rewards for RL training. Based on the generative reward model, we design the token-level RL objective for training and an imitation-based regularization for stabilizing RL process. And the both objectives focus on the learning of the key tokens for the erroneous solution, reducing the effect of other unimportant tokens. The experiment results on mathematical tasks and question-answering tasks have demonstrated the effectiveness of our approach. Our code and data are available at \url{https://github.com/RUCAIBox/RLMEC}.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2401.06081

Country:

North America > United States > New York (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Hawaii (0.14)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback