AITopics | Ding, Yiwen

Collaborating Authors

Ding, Yiwen

Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision

Xi, Zhiheng, Yang, Dingwen, Huang, Jixuan, Tang, Jiafu, Li, Guanyu, Ding, Yiwen, He, Wei, Hong, Boyang, Do, Shihan, Zhan, Wenyu, Wang, Xiao, Zheng, Rui, Ji, Tao, Shi, Xiaowei, Zhai, Yitao, Weng, Rongxiang, Wang, Jingang, Cai, Xunliang, Gui, Tao, Wu, Zuxuan, Zhang, Qi, Qiu, Xipeng, Huang, Xuanjing, Jiang, Yu-Gang

arXiv.org Artificial Intelligence, Nov-25-2024

Training large language models (LLMs) to spend more time thinking and reflecting before responding is crucial for effectively solving complex reasoning tasks in fields such as science, coding, and mathematics. However, the effectiveness of mechanisms like self-reflection and self-correction depends on the model's capacity to accurately assess its own performance, which can be limited by factors such as initial accuracy, question difficulty, and the lack of external feedback. In this paper, we delve into a two-player paradigm that separates the roles of reasoning and critique models, where the critique model provides step-level feedback to supervise the reasoning (actor) model at both test time and training time. We first propose AutoMathCritique, an automated and scalable framework for collecting critique data, resulting in a dataset of 76,321 responses paired with step-level feedback. Fine-tuning language models with this dataset enables them to generate natural language feedback for mathematical reasoning. We demonstrate that the critique models consistently improve the actor's performance on difficult queries at test time, especially when scaling up inference-time computation. Motivated by these findings, we introduce critique-based supervision into the actor's self-training process and propose a critique-in-the-loop self-improvement method. Experiments show that the method improves the actor's exploration efficiency and solution diversity, especially on challenging queries, leading to a stronger reasoning model. Lastly, we take a preliminary step toward training self-talk reasoning models via critique supervision and showcase their potential.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2411.16579

Country:

Europe > Austria (0.29)
North America > United States (0.29)
Asia > Middle East (0.28)
North America > Mexico (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Mitigating Tail Narrowing in LLM Self-Improvement via Socratic-Guided Sampling

Ding, Yiwen, Xi, Zhiheng, He, Wei, Li, Zhuoyuan, Zhai, Yitao, Shi, Xiaowei, Cai, Xunliang, Gui, Tao, Zhang, Qi, Huang, Xuanjing

arXiv.org Artificial Intelligence, Nov-1-2024

Self-improvement methods enable large language models (LLMs) to generate solutions themselves and iteratively train on filtered, high-quality rationales. This process proves effective and reduces the reliance on human supervision in LLMs' reasoning, but the performance soon plateaus. We delve into the process and find that models tend to over-sample on easy queries and under-sample on queries they have yet to master. As iterations proceed, this imbalance in sampling is exacerbated, leading to a long-tail distribution where solutions to difficult queries almost diminish. This phenomenon limits the performance gain of self-improving models. A straightforward solution is brute-force sampling to balance the distribution, which significantly raises computational costs. In this paper, we introduce Guided Self-Improvement (GSI), a strategy aimed at improving the efficiency of sampling challenging heavy-tailed data. It leverages Socratic-style guidance signals to help LLM reasoning with complex queries, reducing the exploration effort and minimizing computational overhead. Experiments on four models across diverse mathematical tasks show that GSI strikes a balance between performance and efficiency, while also being effective on held-out tasks.
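The sampling imbalance described in this abstract can be made concrete with a little arithmetic. The sketch below is only an illustration, not the paper's method: the per-query pass rates, the query names, and both helper functions are hypothetical. It shows why a fixed sampling budget keeps many solutions for easy queries and almost none for hard ones, and why brute-force sampling to rebalance the tail is expensive.

```python
import math
from typing import Dict


def expected_kept(pass_rates: Dict[str, float], samples_per_query: int) -> Dict[str, float]:
    """Expected number of correct (kept) solutions per query under a fixed budget."""
    return {query: samples_per_query * rate for query, rate in pass_rates.items()}


def samples_needed(pass_rate: float, target_kept: int) -> int:
    """Expected number of samples required to keep `target_kept` correct solutions.

    The small epsilon guards against floating-point noise pushing an exact
    quotient just above an integer before rounding up.
    """
    return math.ceil(target_kept / pass_rate - 1e-9)


# Hypothetical pass rates: the fraction of sampled solutions that survive filtering.
pass_rates = {"easy": 0.9, "medium": 0.4, "hard": 0.02}

# With the same budget everywhere, the hard query contributes almost nothing.
for query, kept in expected_kept(pass_rates, samples_per_query=8).items():
    print(f"{query}: about {kept:.2f} kept solutions from 8 samples")

# Brute-force balancing: samples needed to keep 4 solutions for every query.
for query, rate in pass_rates.items():
    print(f"{query}: about {samples_needed(rate, 4)} samples to keep 4 solutions")
```

Because models are retrained on the kept solutions, this gap can widen with each iteration, which is the tail narrowing that the abstract describes.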

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2411.00750

Country:

North America > United States (1.00)
Europe (0.69)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Distill Visual Chart Reasoning Ability from LLMs to MLLMs

He, Wei, Xi, Zhiheng, Zhao, Wanxu, Fan, Xiaoran, Ding, Yiwen, Shan, Zifei, Gui, Tao, Zhang, Qi, Huang, Xuanjing

arXiv.org Artificial Intelligence, Oct-24-2024

Solving complex chart Q&A tasks requires advanced visual reasoning abilities in multimodal large language models (MLLMs). Recent studies highlight that these abilities consist of two main parts: recognizing key information from visual inputs and conducting reasoning over it. Thus, a promising approach to enhance MLLMs is to construct relevant training data focusing on the two aspects. However, collecting and annotating complex charts and questions is costly and time-consuming, and ensuring the quality of annotated answers remains a challenge. In this paper, we propose Code-as-Intermediary Translation (CIT), a cost-effective, efficient and easily scalable data synthesis method for distilling visual reasoning abilities from LLMs to MLLMs. The code serves as an intermediary that translates visual chart representations into textual representations, enabling LLMs to understand cross-modal information. Using CIT, we build a chart QA dataset containing 3k reasoning-intensive charts and 20k Q&A pairs to enhance both recognition and reasoning abilities. Experiments show that when fine-tuned with our data, models not only perform well on chart-related benchmarks, but also demonstrate improved multimodal reasoning abilities on general mathematical benchmarks like MathVista.

Multimodal large language models (MLLMs) have made significant achievements, particularly in visual recognition tasks (OpenAI, 2024a; Anthropic, 2024). While they can handle simple visual inputs well, there has been a growing emphasis on complex chart understanding, driven by the widespread use of charts in real-world contexts (Masry et al., 2022; Huang et al., 2024). However, addressing reasoning-intensive questions involving charts remains challenging for these models. Existing benchmarks underscore the need for more advanced and generalized visual reasoning abilities, which are still underdeveloped in current MLLMs (Wang et al., 2024c; Lu et al., 2024).

Our analysis of the error distribution in ChartQA (Figure 1) also highlights two main types of model failure: 62% of errors stem from misrecognition, while 36% arise from reasoning mistakes after correct recognition. This shows that even advanced MLLMs struggle with basic recognition and often make superficial reasoning errors. In contrast, humans excel at these tasks by purposefully identifying query-relevant information from images and engaging in step-by-step reasoning (Wang et al., 2024c;a). In light of these findings, enabling models to solve problems in a human-like manner becomes essential for advancing visual reasoning performance. One promising strategy is to distill the rationales of reasoning from experts, such as humans or stronger models (Han et al., 2023; Meng et al., 2024; Masry et al., 2024a;b). However, creating high-quality training data for chart-related tasks is costly and time-consuming.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2410.18798

Country:

Europe > Austria (0.28)
North America > United States (0.28)
North America > Mexico (0.28)
Asia > Middle East (0.28)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Industry: Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Self-Demos: Eliciting Out-of-Demonstration Generalizability in Large Language Models

He, Wei, Liu, Shichun, Zhao, Jun, Ding, Yiwen, Lu, Yi, Xi, Zhiheng, Gui, Tao, Zhang, Qi, Huang, Xuanjing

arXiv.org Artificial Intelligence, Mar-31-2024

Large language models (LLMs) have shown promising abilities of in-context learning (ICL), adapting swiftly to new tasks with only few-shot demonstrations. However, current few-shot methods heavily depend on high-quality, query-specific demos, which are often lacking. When faced with out-of-demonstration (OOD) queries, methods that rely on hand-crafted demos or external retrievers might fail. To bridge the gap between limited demos and OOD queries, we propose Self-Demos, a novel prompting method that elicits the inherent generalizability in LLMs by query-aware demo generation. The generated demos strategically interpolate between existing demos and the given query, transforming the query from OOD to ID. To evaluate the effectiveness of our approach, we manually constructed OOD-Toolset, a dataset in the tool-using scenario with over 300 real-world APIs and 1000 instances, each consisting of three tool-use cases as demos and an OOD query. Thorough experiments on our dataset and two public math benchmarks have shown that our method can outperform state-of-the-art baselines in the OOD setting. Moreover, we conduct a range of analyses to validate Self-Demos's generalization and provide more insights.

demo, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2404.00884

Country:

Asia > China (0.30)
North America > United States (0.28)
Asia > Middle East > UAE (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration

Zhao, Jun, Zu, Can, Xu, Hao, Lu, Yi, He, Wei, Ding, Yiwen, Gui, Tao, Zhang, Qi, Huang, Xuanjing

arXiv.org Artificial Intelligence, Mar-13-2024

Large language models (LLMs) have demonstrated impressive performance in understanding language and executing complex reasoning tasks. However, LLMs with long context windows have been notorious for their expensive training costs and high inference latency. Even the most advanced models such as GPT-4 and Claude2 often make mistakes when processing inputs of over 100k tokens, a phenomenon also known as "lost in the middle". In this paper, we propose LongAgent, a method based on multi-agent collaboration, which scales LLMs (e.g., LLaMA) to a context of 128K and demonstrates potential superiority in long-text processing compared to GPT-4. In LongAgent, a leader is responsible for understanding user intent and directing team members to acquire information from documents. Due to members' hallucinations, it is non-trivial for a leader to obtain accurate information from the responses of dozens to hundreds of members. To address this, we develop an inter-member communication mechanism to resolve response conflicts caused by hallucinations through information sharing. Our experimental results indicate that LongAgent offers a promising alternative for long-text processing. The agent team instantiated with LLaMA-7B achieves significant improvements over GPT-4 in tasks such as 128k-long text retrieval and multi-hop question answering.

large language model, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2402.11550

Country: North America > United States > Texas (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning

Xi, Zhiheng, Chen, Wenxiang, Hong, Boyang, Jin, Senjie, Zheng, Rui, He, Wei, Ding, Yiwen, Liu, Shichun, Guo, Xin, Wang, Junzhe, Guo, Honglin, Shen, Wei, Fan, Xiaoran, Zhou, Yuhao, Dou, Shihan, Wang, Xiao, Zhang, Xinbo, Sun, Peng, Gui, Tao, Zhang, Qi, Huang, Xuanjing

arXiv.org Artificial Intelligence, Feb-8-2024

In this paper, we propose R³: Learning Reasoning through Reverse Curriculum Reinforcement Learning (RL), a novel method that employs only outcome supervision to achieve the benefits of process supervision for large language models. The core challenge in applying RL to complex reasoning is to identify a sequence of actions that results in positive rewards and to provide appropriate supervision for optimization. Outcome supervision provides sparse rewards for final results without identifying error locations, whereas process supervision offers step-wise rewards but requires extensive manual annotation. R³ overcomes these limitations by learning from correct demonstrations. Specifically, R³ progressively slides the start state of reasoning from a demonstration's end to its beginning, facilitating easier model exploration at all stages. Thus, R³ establishes a step-wise curriculum, allowing outcome supervision to offer step-level signals and precisely pinpoint errors. Using Llama2-7B, our method surpasses the RL baseline on eight reasoning tasks by 4.1 points on average. Notably, in program-based reasoning on GSM8K, it exceeds the baseline by 4.2 points across three backbone models, and without any extra data, Codellama-7B + R³ performs comparably to larger models or closed-source models.
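The sliding start state described in this abstract can be sketched as a list of prompts, ordered from easiest to hardest. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `reverse_curriculum`, the newline-joined prompt format, and the toy demonstration are all hypothetical, and the RL update with its outcome reward is omitted.

```python
from typing import List, Tuple


def reverse_curriculum(question: str, demo_steps: List[str]) -> List[Tuple[int, str]]:
    """Build start states that slide from the demonstration's end to its beginning.

    Each stage is a pair (number of demonstration steps given, prompt).
    The first stage hands the model all but the last step, so only one step
    remains to be explored; the last stage hands it the bare question.
    """
    stages = []
    for given in range(len(demo_steps) - 1, -1, -1):
        prefix = demo_steps[:given]  # correct steps taken from the demonstration
        prompt = "\n".join([question, *prefix])
        stages.append((given, prompt))
    return stages


# Hypothetical demonstration for a toy arithmetic question.
demo = ["Step 1: 3 + 4 = 7", "Step 2: 7 * 2 = 14", "Answer: 14"]
for given, prompt in reverse_curriculum("Q: What is (3 + 4) * 2?", demo):
    print(f"--- start state with {given} demonstration step(s) ---")
    print(prompt)
```

At each stage the model would complete the reasoning from the given prefix and receive only a final-answer reward. Since the prefix is known to be correct, a failure at a given stage points to the steps the model generated itself, which is how outcome supervision can act as a step-level signal.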

large language model, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2402.05808

Country:

Europe (1.00)
North America > United States > California > Santa Clara County (0.14)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

The Rise and Potential of Large Language Model Based Agents: A Survey

Xi, Zhiheng, Chen, Wenxiang, Guo, Xin, He, Wei, Ding, Yiwen, Hong, Boyang, Zhang, Ming, Wang, Junzhe, Jin, Senjie, Zhou, Enyu, Zheng, Rui, Fan, Xiaoran, Wang, Xiao, Xiong, Limao, Zhou, Yuhao, Wang, Weiran, Jiang, Changhao, Zou, Yicheng, Liu, Xiangyang, Yin, Zhangyue, Dou, Shihan, Weng, Rongxiang, Cheng, Wensen, Zhang, Qi, Qin, Wenjuan, Zheng, Yongyan, Qiu, Xipeng, Huang, Xuanjing, Gui, Tao

arXiv.org Artificial Intelligence, Sep-19-2023

For a long time, humanity has pursued artificial intelligence (AI) equivalent to or surpassing the human level, with AI agents considered a promising vehicle for this pursuit. AI agents are artificial entities that sense their environment, make decisions, and take actions. Many efforts have been made to develop intelligent agents, but they mainly focus on advancement in algorithms or training strategies to enhance specific capabilities or performance on particular tasks. Actually, what the community lacks is a general and powerful model to serve as a starting point for designing AI agents that can adapt to diverse scenarios. Due to the versatile capabilities they demonstrate, large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI), offering hope for building general AI agents. Many researchers have leveraged LLMs as the foundation to build AI agents and have achieved significant progress. In this paper, we perform a comprehensive survey on LLM-based agents. We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for agents. Building upon this, we present a general framework for LLM-based agents, comprising three main components: brain, perception, and action; the framework can be tailored for different applications. Subsequently, we explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation. Following this, we delve into agent societies, exploring the behavior and personality of LLM-based agents, the social phenomena that emerge from an agent society, and the insights they offer for human society. Finally, we discuss several key topics and open problems within the field. A repository of related papers is available at https://github.com/WooooDyy/LLM-Agent-Paper-List.

computer science, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2309.07864

Country:

North America > Canada (1.00)
Oceania (0.92)
Asia > Middle East (0.67)
(4 more...)

Genre:

Overview (1.00)
Instructional Material (0.92)
Research Report > New Finding (0.92)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Causal Kripke Models

Ding, Yiwen, Manoorkar, Krishna, Tzimoulis, Apostolos, Wang, Ruoding, Wang, Xiaolong

arXiv.org Artificial Intelligence, Jul-11-2023

Causality is crucial in human reasoning and knowledge. Defining and formalizing causality has been a significant area of research in philosophy and formal methods [12, 21, 24, 11]. In recent years, with the rise of machine learning and AI, there has been growing interest in formalizing causal reasoning. One of the key areas of AI research is designing algorithms capable of comprehending causal information and performing causal reasoning [5, 29, 30]. Causal reasoning can be instrumental in formally modeling notions such as responsibility, blame, harm, and explanation, which are important aspects in designing ethical and responsible AI systems [3]. In this article we focus on the kind of causality known as "actual causality" (a.k.a.

artificial intelligence, causality, logic & formal reasoning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.4204/EPTCS.379.16

2307.05631

Country:

Europe > Netherlands (0.29)
Europe > United Kingdom > England (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)

Voices of Her: Analyzing Gender Differences in the AI Publication World

Ding, Yiwen, Liu, Jiarui, Lyu, Zhiheng, Zhang, Kun, Schoelkopf, Bernhard, Jin, Zhijing, Mihalcea, Rada

arXiv.org Artificial Intelligence, May-23-2023

While several previous studies have analyzed gender bias in research, we are still missing a comprehensive analysis of gender differences in the AI community, covering diverse topics and different development trends. Using the AI Scholar dataset of 78K researchers in the field of AI, we identify several gender differences: (1) Although female researchers tend to have fewer overall citations than males, this citation difference does not hold for all academic-age groups; (2) There exists large gender homophily in co-authorship on AI papers; (3) Female first-authored papers show distinct linguistic styles, such as longer text, more positive emotion words, and more catchy titles than male first-authored papers. Our analysis provides a window into the current demographic trends in our AI community, and encourages more gender equality and diversity in the future. Our code and data are at https://github.com/causalNLP/ai-scholar-gender.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2305.14597

Country:

North America > United States (0.93)
Europe (0.93)
Asia (0.93)

Genre: Research Report > New Finding (0.67)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.93)
Health & Medicine > Therapeutic Area > Immunology (0.67)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

Logical Fallacy Detection

Jin, Zhijing, Lalwani, Abhinav, Vaidhya, Tejas, Shen, Xiaoyu, Ding, Yiwen, Lyu, Zhiheng, Sachan, Mrinmaya, Mihalcea, Rada, Schölkopf, Bernhard

arXiv.org Artificial Intelligence, Dec-11-2022

Reasoning is central to human intelligence. However, fallacious arguments are common, and some exacerbate problems such as spreading misinformation about climate change. In this paper, we propose the task of logical fallacy detection, and provide a new dataset (Logic) of logical fallacies generally found in text, together with an additional challenge set for detecting logical fallacies in climate change claims (LogicClimate). Detecting logical fallacies is a hard problem as the model must understand the underlying logical structure of the argument. We find that existing pretrained large language models perform poorly on this task. In contrast, we show that a simple structure-aware classifier outperforms the best language model by 5.46% on Logic and 4.51% on LogicClimate. We encourage future work to explore this task as (a) it can serve as a new reasoning challenge for language models, and (b) it can have potential applications in tackling the spread of misinformation. Our dataset and code are available at https://github.com/causalNLP/logical-fallacy

artificial intelligence, fallacy, natural language, (16 more...)

arXiv.org Artificial Intelligence

2202.13758

Country:

North America > United States (1.00)
Europe (1.00)

Genre: Research Report (0.40)

Industry:

Media > News (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Government (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
