Collaborating Authors

 Zhang, Qiyuan


Crowd Comparative Reasoning: Unlocking Comprehensive Evaluations for LLM-as-a-Judge

arXiv.org Artificial Intelligence

LLM-as-a-Judge, which generates chain-of-thought (CoT) judgments, has become a widely adopted auto-evaluation method. However, its reliability is compromised because CoT reasoning often fails to capture comprehensive and deeper details, leading to incomplete judgments. Existing methods mainly rely on majority voting or criteria expansion, which is insufficient to address this limitation of CoT. We propose Crowd-based Comparative Evaluation, which introduces additional crowd responses to compare with the candidate responses, thereby exposing deeper and more comprehensive details within the candidate responses. This process effectively guides LLM-as-a-Judge to provide a more detailed CoT judgment. Extensive experiments demonstrate that our approach enhances evaluation reliability, achieving an average accuracy gain of 6.7% across five benchmarks. Moreover, our method produces higher-quality CoTs that facilitate judge distillation and exhibit superior performance in rejection sampling for supervised fine-tuning (SFT), referred to as crowd rejection sampling, thereby enabling more efficient SFT. Our analysis confirms that the CoTs generated by our method are more comprehensive and of higher quality, and that evaluation accuracy improves as inference scales.
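A minimal sketch of the crowd-comparison idea described above, assuming a hypothetical `call_llm` wrapper around any chat-completion API; the prompts and helper names are illustrative, not the paper's exact formulation.

```python
# Illustrative sketch of crowd-based comparative evaluation (not the paper's exact prompts).
# `call_llm` is a hypothetical wrapper around a chat-completion API that returns a string.
from typing import Callable, List

def crowd_comparative_judge(
    call_llm: Callable[[str], str],
    instruction: str,
    candidate_a: str,
    candidate_b: str,
    crowd_responses: List[str],
) -> str:
    # Step 1: compare each candidate against each crowd response to surface
    # details (strengths, omissions, errors) that a direct A-vs-B comparison may miss.
    observations = []
    for i, crowd in enumerate(crowd_responses):
        for label, cand in (("A", candidate_a), ("B", candidate_b)):
            prompt = (
                f"Instruction:\n{instruction}\n\n"
                f"Crowd response #{i}:\n{crowd}\n\n"
                f"Candidate {label}:\n{cand}\n\n"
                f"Compare Candidate {label} with the crowd response and list "
                "concrete strengths, omissions, or errors of the candidate."
            )
            observations.append(f"[Candidate {label} vs crowd #{i}] " + call_llm(prompt))

    # Step 2: feed the aggregated observations into the final pairwise judgment,
    # so the chain-of-thought is grounded in more comprehensive details.
    final_prompt = (
        f"Instruction:\n{instruction}\n\n"
        f"Candidate A:\n{candidate_a}\n\nCandidate B:\n{candidate_b}\n\n"
        "Observations from comparisons with crowd responses:\n"
        + "\n".join(observations)
        + "\n\nUsing these observations, reason step by step and answer "
        "'A' or 'B' for the better response."
    )
    return call_llm(final_prompt)
```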


HiLo: Learning Whole-Body Human-like Locomotion with Motion Tracking Controller

arXiv.org Artificial Intelligence

Deep Reinforcement Learning (RL) has emerged as a promising method to develop humanoid robot locomotion controllers. Despite the robust and stable locomotion demonstrated by previous RL controllers, their behavior often lacks the natural and agile motion patterns necessary for human-centric scenarios. In this work, we propose HiLo (human-like locomotion with motion tracking), an effective framework designed to learn RL policies that perform human-like locomotion. The primary challenges of human-like locomotion are complex reward engineering and domain randomization. HiLo overcomes these issues by developing an RL-based motion tracking controller and applying simple domain randomization through random force injection and action delay. Within the framework of HiLo, the whole-body control problem can be decomposed into two components: one part is solved using an open-loop control method, while the residual part is addressed with RL policies. A distributional value function is also implemented to stabilize the training process by improving the estimation of cumulative rewards under perturbed dynamics. Our experiments demonstrate that the motion tracking controller trained using HiLo can perform natural and agile human-like locomotion while exhibiting resilience to external disturbances in real-world systems. Furthermore, we show that the motion patterns of humanoid robots can be adapted through the residual mechanism without fine-tuning, allowing quick adjustments to task requirements.
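A minimal sketch of the residual-control decomposition and the two simple randomizations mentioned above. The `open_loop_reference`, `residual_policy`, and `sim` objects (including `sim.apply_external_force`) are hypothetical placeholders, not HiLo's actual interfaces.

```python
# Sketch of HiLo-style residual control with simple domain randomization.
import collections
import numpy as np

class ResidualLocomotionController:
    def __init__(self, open_loop_reference, residual_policy, max_delay_steps=3):
        self.reference = open_loop_reference      # open-loop joint targets from motion tracking
        self.policy = residual_policy             # RL policy that outputs residual corrections
        self.action_buffer = collections.deque(maxlen=max_delay_steps + 1)

    def act(self, obs, t):
        # Whole-body action = open-loop reference motion + learned residual.
        action = self.reference(t) + self.policy(obs)
        # Domain randomization 1: random action delay (replay a stale action).
        self.action_buffer.append(action)
        delay = np.random.randint(0, len(self.action_buffer))
        return self.action_buffer[-1 - delay]

def randomized_step(sim, action, force_scale=20.0):
    # Domain randomization 2: inject a random external force each step
    # (hypothetical simulator API).
    sim.apply_external_force(np.random.uniform(-force_scale, force_scale, size=3))
    return sim.step(action)
```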


NILE: Internal Consistency Alignment in Large Language Models

arXiv.org Artificial Intelligence

As a crucial step to enhance LLMs' alignment with human intentions, Instruction Fine-Tuning (IFT) places high demands on dataset quality. However, existing IFT datasets often contain knowledge that is inconsistent with the internal knowledge LLMs learned during pre-training, which can greatly affect the efficacy of IFT. To address this issue, we introduce the NILE (iNternal consIstency aLignmEnt) framework, aimed at optimizing IFT datasets to further unlock LLMs' capabilities. NILE operates by eliciting the target pre-trained LLM's internal knowledge corresponding to the instruction data. This internal knowledge is then leveraged to revise the answers in IFT datasets. Additionally, we propose a novel Internal Consistency Filtering (ICF) method to filter training samples, ensuring their high consistency with the LLM's internal knowledge. Our experiments demonstrate that NILE-aligned IFT datasets sharply boost LLM performance across multiple LLM ability evaluation datasets, achieving gains of up to 66.6% on Arena-Hard and 68.5% on Alpaca-Eval V2. Further analysis confirms that each component of the NILE framework contributes to these substantial performance improvements, and provides compelling evidence that dataset consistency with pre-trained internal knowledge is pivotal for maximizing LLM potential.
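A sketch of a NILE-style pipeline under stated assumptions: the prompts, the `call_llm` wrapper, and the `score_consistency` helper are illustrative stand-ins for the paper's knowledge elicitation, answer revision, and ICF steps.

```python
# Illustrative NILE-style pipeline: elicit internal knowledge, revise answers,
# then filter by internal consistency (ICF). Prompts and the consistency score
# are assumptions, not the paper's exact formulation.
from typing import Callable, List, Tuple

def revise_ift_dataset(
    call_llm: Callable[[str], str],
    score_consistency: Callable[[str, str], float],
    dataset: List[Tuple[str, str]],
    threshold: float = 0.5,
) -> List[Tuple[str, str]]:
    revised = []
    for instruction, answer in dataset:
        # Step 1: elicit the target LLM's internal knowledge for this instruction.
        knowledge = call_llm(
            f"Briefly state what you know that is relevant to answering:\n{instruction}"
        )
        # Step 2: revise the original answer so it agrees with that internal knowledge.
        new_answer = call_llm(
            f"Instruction:\n{instruction}\n\nOriginal answer:\n{answer}\n\n"
            f"Relevant internal knowledge:\n{knowledge}\n\n"
            "Rewrite the answer so it is consistent with this knowledge."
        )
        # Step 3 (ICF): keep only samples whose revised answer is sufficiently
        # consistent with the model's internal knowledge.
        if score_consistency(new_answer, knowledge) >= threshold:
            revised.append((instruction, new_answer))
    return revised
```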


RevisEval: Improving LLM-as-a-Judge via Response-Adapted References

arXiv.org Artificial Intelligence

With significant efforts in recent studies, LLM-as-a-Judge has become a cost-effective alternative to human evaluation for assessing text generation quality in a wide range of tasks. However, a reliability gap remains between LLM-as-a-Judge and human evaluation. One important reason is the lack of guided oracles in the evaluation process. Motivated by the role of references pervasively used in classic text evaluation, we introduce RevisEval, a novel text generation evaluation paradigm based on response-adapted references. RevisEval is driven by the key observation that an ideal reference should maintain the necessary relevance to the response being evaluated. Specifically, RevisEval leverages the text revision capabilities of large language models (LLMs) to adaptively revise the response, then treats the revised text as the reference (response-adapted reference) for the subsequent evaluation. Extensive experiments demonstrate that RevisEval outperforms traditional reference-free and reference-based evaluation paradigms that use LLM-as-a-Judge across NLG tasks and open-ended instruction-following tasks. More importantly, our response-adapted references can further boost classical text metrics, e.g., BLEU and BERTScore, compared to traditional references, and can even rival LLM-as-a-Judge. A detailed analysis is also conducted to confirm RevisEval's effectiveness in bias reduction, the impact of inference cost, and reference relevance.
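A minimal sketch of the response-adapted-reference idea: the prompt wording is an assumption, `call_llm` is a hypothetical chat-completion wrapper, and NLTK's sentence-level BLEU stands in for any reference-based metric such as BERTScore.

```python
# Sketch of RevisEval-style scoring: revise the response into a response-adapted
# reference, then score the original response against it with a classical metric.
from typing import Callable
from nltk.translate.bleu_score import sentence_bleu

def reviseval_score(call_llm: Callable[[str], str], task: str, response: str) -> float:
    # Step 1: adaptively revise the response itself and use the revision as a
    # reference that stays relevant to the response under evaluation.
    adapted_reference = call_llm(
        f"Task:\n{task}\n\nResponse:\n{response}\n\n"
        "Minimally revise this response to fix its flaws while preserving its intent. "
        "Return only the revised text."
    )
    # Step 2: score the original response against the response-adapted reference.
    return sentence_bleu([adapted_reference.split()], response.split())
```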


Collaborative Performance Prediction for Large Language Models

arXiv.org Artificial Intelligence

Comprehensively understanding and accurately predicting the performance of large language models across diverse downstream tasks has emerged as a pivotal challenge in NLP research. Pioneering work on downstream scaling laws demonstrated intrinsic similarities within model families and used such similarities for performance prediction. However, these approaches tend to overlook similarities across model families and only consider the design factors listed in the original scaling law. To overcome these limitations, we introduce a novel framework, Collaborative Performance Prediction (CPP), which significantly enhances prediction accuracy by leveraging the historical performance of various models on downstream tasks together with other design factors for both models and tasks. We also collect collaborative data sourced from online platforms containing both historical performance and additional design factors. With the support of this collaborative data, CPP not only surpasses traditional scaling laws in predicting the performance of scaled LLMs but also facilitates a detailed analysis of factor importance, an area previously overlooked.
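To make the collaborative-prediction idea concrete, here is a generic matrix-factorization stand-in: models and tasks get latent factors fit on observed scores, and missing (model, task) cells are predicted from their dot products. This is an assumption about the flavor of method, not CPP's exact model.

```python
# Generic collaborative-filtering sketch for model-x-task performance prediction.
import numpy as np

def fit_predict(scores: np.ndarray, observed: np.ndarray, rank: int = 4,
                lr: float = 0.01, reg: float = 0.1, epochs: int = 2000) -> np.ndarray:
    n_models, n_tasks = scores.shape
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(n_models, rank))   # model latent factors
    V = rng.normal(scale=0.1, size=(n_tasks, rank))    # task latent factors
    for _ in range(epochs):
        pred = U @ V.T
        err = (pred - scores) * observed               # fit only the observed entries
        U -= lr * (err @ V + reg * U)
        V -= lr * (err.T @ U + reg * V)
    return U @ V.T                                     # dense predictions, incl. missing cells

# Toy example: 3 models x 4 tasks, with one held-out cell predicted from the rest.
scores = np.array([[0.70, 0.55, 0.80, 0.60],
                   [0.65, 0.50, 0.75, 0.58],
                   [0.85, 0.70, 0.90, 0.00]])          # last cell unknown
observed = (scores > 0).astype(float)
print(fit_predict(scores, observed)[2, 3])             # predicted score for the missing cell
```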


Introducing RISK

arXiv.org Artificial Intelligence

This extended abstract introduces the initial steps taken to develop a system for Rapid Internal Simulation of Knowledge (RISK). RISK aims to enable more transparency in artificial intelligence systems, especially those created by deep learning networks, by allowing real-time simulation of what the system knows. By examining hypothetical situations based on these simulations, a system may make more informed decisions and present them so that non-expert observers can understand the reasoning behind a given action.


Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

Learning from datasets without interaction with environments (offline learning) is an essential step toward applying Reinforcement Learning (RL) algorithms in real-world scenarios. However, compared with its single-agent counterpart, offline multi-agent RL introduces more agents with larger state and action spaces, which is more challenging but has attracted little attention. We demonstrate that current offline RL algorithms are ineffective in multi-agent systems due to accumulated extrapolation error. In this paper, we propose a novel offline RL algorithm, named Implicit Constraint Q-learning (ICQ), which effectively alleviates extrapolation error by trusting only the state-action pairs given in the dataset for value estimation. Moreover, we extend ICQ to multi-agent tasks by decomposing the joint policy under the implicit constraint. Experimental results demonstrate that the extrapolation error is reduced to almost zero and is insensitive to the number of agents. We further show that ICQ achieves state-of-the-art performance on challenging multi-agent offline tasks (StarCraft II).
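A sketch of the core idea under stated assumptions: value targets are computed only from actions that actually appear in the offline dataset, weighting each sampled in-dataset action by a softmax over its Q-value rather than maximizing over possibly out-of-distribution actions. The `q_net` critic interface, the temperature `beta`, and the batching are illustrative, not the paper's exact algorithm.

```python
# Sketch of an implicit-constraint value target using only in-dataset actions.
import torch

def icq_value_target(q_net, rewards, next_states, next_dataset_actions,
                     dones, gamma=0.99, beta=10.0):
    # q_net: hypothetical critic taking (state, action) batches and returning Q-values.
    # next_dataset_actions: actions stored in the offline dataset for next_states,
    # shape (batch, n_sampled, action_dim).
    with torch.no_grad():
        b, n, _ = next_dataset_actions.shape
        s = next_states.unsqueeze(1).expand(b, n, -1)
        q = q_net(s.reshape(b * n, -1),
                  next_dataset_actions.reshape(b * n, -1)).reshape(b, n)
        # Softmax weights over in-dataset actions act as an implicit policy constraint,
        # so no out-of-distribution action ever enters the bootstrap target.
        w = torch.softmax(q / beta, dim=1)
        v_next = (w * q).sum(dim=1)
        return rewards + gamma * (1.0 - dones) * v_next
```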