AITopics | Yang, Qingping

Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback

Shen, Wei, Liu, Guanlin, Wu, Zheng, Zhu, Ruofei, Yang, Qingping, Xin, Chao, Yue, Yu, Yan, Lin

arXiv.org Artificial Intelligence, Apr-2-2025

Reinforcement Learning from Human Feedback (RLHF) is crucial for aligning large language models with human preferences. While recent research has focused on algorithmic improvements, the importance of prompt-data construction has been overlooked. This paper addresses this gap by exploring data-driven bottlenecks in RLHF performance scaling, particularly reward hacking and decreasing response diversity. We introduce a hybrid reward system combining reasoning task verifiers (RTV) and a generative reward model (GenRM) to mitigate reward hacking. We also propose a novel prompt-selection method, Pre-PPO, to maintain response diversity and enhance learning effectiveness. Additionally, we find that prioritizing mathematical and coding tasks early in RLHF training significantly improves performance. Experiments across two model sizes validate our methods' effectiveness and scalability. Results show that RTV is most resistant to reward hacking, followed by GenRM with ground truth, and then GenRM with SFT Best-of-N responses. Our strategies enable rapid capture of subtle task-specific distinctions, leading to substantial improvements in overall RLHF performance. This work highlights the importance of careful data construction and provides practical methods to overcome performance barriers in RLHF.
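The abstract does not spell out how the two reward sources are combined. As a minimal sketch, assuming each response is routed to a task verifier when one is available and to a generative reward model otherwise, the dispatch might look like the following. The names `hybrid_reward`, `toy_verifier`, and `toy_genrm` are hypothetical stand-ins, not the paper's implementation.

```python
from typing import Callable, Optional

# Hypothetical type aliases: a verifier returns pass/fail, a GenRM returns a score.
Verifier = Callable[[str], bool]
GenRM = Callable[[str, Optional[str]], float]


def hybrid_reward(
    response: str,
    reference: Optional[str],
    verifier: Optional[Verifier],
    genrm: GenRM,
) -> float:
    """Use the verifier when the task has one, otherwise fall back to the GenRM."""
    if verifier is not None:
        # Verifiable tasks (e.g. math, code) get a binary reward.
        return 1.0 if verifier(response) else 0.0
    # Everything else is scored by the generative reward model.
    return genrm(response, reference)


def toy_verifier(response: str) -> bool:
    """Toy check: the response must end with the expected answer '42'."""
    return response.strip().endswith("42")


def toy_genrm(response: str, reference: Optional[str]) -> float:
    """Toy GenRM: exact match against a reference, 0.0 when no reference exists."""
    if reference is None:
        return 0.0
    return 1.0 if response.strip() == reference.strip() else 0.0


if __name__ == "__main__":
    print(hybrid_reward("x = 42", None, toy_verifier, toy_genrm))
    print(hybrid_reward("Paris", "Paris", None, toy_genrm))
```

A binary verifier leaves little room for a policy to exploit scoring quirks, which is consistent with the abstract's finding that RTV is the most resistant to reward hacking.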

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv: 2503.2223

Country: Asia > China (0.46)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Water & Waste Management > Solid Waste Management (0.94)
Education (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

UTMath: Math Evaluation with Unit Test via Reasoning-to-Coding Thoughts

Yang, Bo, Yang, Qingping, Ma, Yingwei, Liu, Runtao

arXiv.org Artificial Intelligence, Jan-14-2025

The evaluation of mathematical reasoning capabilities is essential for advancing Artificial General Intelligence (AGI). While Large Language Models (LLMs) have shown impressive performance in solving mathematical problems, existing benchmarks such as GSM8K and MATH present limitations, including narrow problem definitions with specific numbers and reliance on predetermined rules that hinder accurate assessments of reasoning and generality. This paper introduces the UTMath Benchmark, a robust evaluation framework designed to assess LLMs through extensive unit tests, with a focus on both the accuracy and generality of model responses. It comprises 1,053 cutting-edge problems spanning nine mathematical domains, with an average of 68 test cases per problem. UTMath is highly challenging, with the best-performing model, o1-mini, solving only 32.57% of the problems, followed by o1-preview at 27.16%, and GPT-4o at 26.93%. Furthermore, we present the Reasoning-to-Coding of Thoughts (RCoT) approach, which encourages LLMs to engage in explicit reasoning prior to code generation, thereby facilitating the production of more sophisticated solutions and enhancing overall performance and efficiency. We also release the UTMath-Train training dataset (more than 70k samples) to support the community in further exploring mathematical reasoning. Our benchmark can be accessed via the following link: https://github.com/UTMathGroup/UTMath
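To illustrate why unit tests probe generality rather than a single numeric answer, here is a minimal sketch, assuming a problem counts as solved only when the generated function passes every test case. The toy problem (triangular numbers), the test values, and the helper names `passes_all` and `solve_rate` are hypothetical and are not taken from the benchmark.

```python
from typing import Callable, Iterable, List, Tuple


def passes_all(solution: Callable[[int], int],
               cases: Iterable[Tuple[int, int]]) -> bool:
    """Return True only if the solution matches the expected value on every case."""
    for n, expected in cases:
        try:
            if solution(n) != expected:
                return False
        except Exception:
            # A crash on any input counts as a failed test case.
            return False
    return True


def solve_rate(results: List[bool]) -> float:
    """Fraction of problems whose solution passed all of its test cases."""
    return sum(results) / len(results) if results else 0.0


# Toy problem: n-th triangular number, checked on many inputs.
cases = [(n, n * (n + 1) // 2) for n in range(68)]


def general(n: int) -> int:
    """Closed-form solution that works for any n."""
    return n * (n + 1) // 2


def memorised(n: int) -> int:
    """Hard-coded answers for a few small inputs; raises KeyError elsewhere."""
    return {0: 0, 1: 1, 2: 3, 3: 6}[n]


if __name__ == "__main__":
    results = [passes_all(general, cases), passes_all(memorised, cases)]
    print(results, solve_rate(results))
```

A solution that only reproduces a handful of known answers fails as soon as the tests reach an unseen input, which is the kind of narrowness the abstract attributes to benchmarks built around specific numbers.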

large language model, machine learning, natural language, (18 more...)

arXiv: 2411.0724

Country:

North America > United States > Texas (0.14)
Asia > China (0.14)

Genre: Research Report > New Finding (0.93)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

How to Understand Whole Software Repository?

Ma, Yingwei, Yang, Qingping, Cao, Rongyu, Li, Binhua, Huang, Fei, Li, Yongbin

arXiv.org Artificial Intelligence, Jun-3-2024

Recently, Large Language Model (LLM) based agents have significantly advanced Automatic Software Engineering (ASE). Although their effectiveness has been verified, existing methods mainly focus on local code information, e.g., issues, classes, and functions, which limits their ability to capture the global context and interdependencies within the software system. Drawing on the practical experience of human software engineers, we argue that an excellent understanding of the whole repository is the critical path to ASE. However, understanding the whole repository raises various challenges, e.g., extremely long code input, noisy code information, and complex dependency relationships. To this end, we develop a novel ASE method named RepoUnderstander, which guides agents to comprehensively understand whole repositories. Specifically, we first condense the critical information of the whole repository into a repository knowledge graph in a top-down manner to reduce the complexity of the repository. Subsequently, we enable the agents to understand the whole repository by proposing a Monte Carlo tree search based repository exploration strategy. In addition, to better utilize the repository-level knowledge, we guide the agents to summarize, analyze, and plan. They can then use tools to dynamically acquire information and generate patches that solve real-world GitHub issues. Extensive experiments demonstrate the superiority and effectiveness of the proposed RepoUnderstander. It achieved an 18.5% relative improvement on the SWE-bench Lite benchmark compared to SWE-agent.
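The abstract does not say which selection rule the tree search uses. As a minimal sketch, assuming the standard UCT rule over nodes of a repository graph, the selection step of a Monte Carlo tree search could look like this. The `Node` class, the exploration constant `c`, and the example node names are hypothetical and not from the paper.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A node in the search tree, e.g. a file, class, or function in a repository."""
    name: str
    visits: int = 0
    total_reward: float = 0.0
    children: List["Node"] = field(default_factory=list)


def uct(parent_visits: int, child: Node, c: float = 1.4) -> float:
    """Standard UCT score: mean reward plus an exploration bonus."""
    if child.visits == 0:
        # Unvisited children are always tried first.
        return math.inf
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select_child(parent: Node) -> Node:
    """Pick the child with the highest UCT score."""
    return max(parent.children, key=lambda ch: uct(parent.visits, ch))


if __name__ == "__main__":
    root = Node("repo", visits=20, children=[
        Node("pkg/parser.py", visits=10, total_reward=9.0),
        Node("pkg/utils.py", visits=10, total_reward=1.0),
    ])
    print(select_child(root).name)
```

The exploration term keeps rarely visited parts of the repository from being ignored, while the mean-reward term steers the search toward code that has looked relevant to the issue so far.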

information, large language model, natural language, (15 more...)

arXiv: 2406.01422

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas > Upstream (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
