Wang, Danqing
Focus Directions Make Your Language Models Pay More Attention to Relevant Contexts
Zhu, Youxiang, Li, Ruochen, Wang, Danqing, Haehn, Daniel, Liang, Xiaohui
Long-context large language models (LLMs) are prone to being distracted by irrelevant contexts, and the reasons for this distraction remain poorly understood. In this paper, we first identify the contextual heads, a special group of attention heads that control the overall attention of the LLM. We then demonstrate that distraction arises when contextual heads fail to allocate sufficient attention to relevant contexts, and that it can be mitigated by increasing attention to these contexts. We further identify focus directions, located at the key and query activations of these heads, which enable them to allocate more attention to relevant contexts without explicitly specifying which context is relevant. We comprehensively evaluate the effect of focus directions on various long-context tasks and find that they help mitigate the poor task alignment of long-context LLMs. We believe our findings can promote further research on long-context LLM alignment.
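As a rough illustration of the kind of intervention described above, the sketch below adds a direction vector to the query and key activations of one head in a toy single-layer attention module and prints how the attention distribution shifts. The head index, the (random) direction, and the strength are purely illustrative placeholders, not the paper's identified contextual heads or focus directions.

```python
import torch

torch.manual_seed(0)

n_heads, d_head, seq_len = 4, 16, 8
d_model = n_heads * d_head

# Toy projections standing in for one attention layer of an LLM.
W_q = torch.randn(d_model, d_model) / d_model**0.5
W_k = torch.randn(d_model, d_model) / d_model**0.5
x = torch.randn(seq_len, d_model)

def attention_weights(q, k):
    scores = q @ k.transpose(-1, -2) / d_head**0.5
    return scores.softmax(dim=-1)

q = (x @ W_q).view(seq_len, n_heads, d_head).transpose(0, 1)  # (heads, seq, d_head)
k = (x @ W_k).view(seq_len, n_heads, d_head).transpose(0, 1)

head = 2                      # hypothetical "contextual head"
focus = torch.randn(d_head)   # placeholder direction; the paper identifies these, not random ones
alpha = 2.0                   # intervention strength (illustrative)

baseline = attention_weights(q[head], k[head])
steered = attention_weights(q[head] + alpha * focus,   # add the direction to the head's queries
                            k[head] + alpha * focus)   # ...and to its keys

# Attention mass shifts toward positions whose keys align with the direction.
print("attention from last token, before:", baseline[-1].round(decimals=3))
print("attention from last token, after: ", steered[-1].round(decimals=3))
```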
Scaling LLM Inference with Optimized Sample Compute Allocation
Zhang, Kexun, Zhou, Shang, Wang, Danqing, Wang, William Yang, Li, Lei
Sampling is a basic operation in many inference-time algorithms for large language models (LLMs). To scale up inference efficiently under a limited compute budget, it is crucial to find an optimal allocation of sample compute: Which sampling configurations (model, temperature, language, etc.) do we use? How many samples do we generate in each configuration? We formulate these choices as a learning problem and propose OSCA, an algorithm that Optimizes Sample Compute Allocation by finding an optimal mix of different inference configurations. Our experiments show that with our learned mixed allocation, we achieve better accuracy than the best single configuration with 128x less compute on code generation and 25x less compute on 4 reasoning tasks. OSCA is also effective in agentic workflows beyond single-turn tasks, achieving better accuracy on SWE-Bench with 3x less compute than the default configuration. Our code and generations are released at https://github.com/LeiLiLab/OSCA.
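A minimal sketch of the underlying allocation idea (not the OSCA algorithm itself): given per-configuration, per-problem success estimates, greedily assign a fixed sample budget so as to maximize the estimated fraction of problems solved by at least one sample. The probabilities below are random stand-ins for estimates learned from held-out data.

```python
import numpy as np

rng = np.random.default_rng(0)

# p[c, q]: estimated probability that one sample from configuration c solves
# training problem q. Random numbers stand in for estimates from held-out data.
n_configs, n_problems = 4, 50
p = rng.beta(0.5, 2.0, size=(n_configs, n_problems))

def est_pass_rate(alloc, p):
    """Estimated fraction of problems solved by at least one sample under `alloc`."""
    fail = np.prod((1.0 - p) ** alloc[:, None], axis=0)
    return float(np.mean(1.0 - fail))

budget = 16
alloc = np.zeros(n_configs, dtype=int)
for _ in range(budget):
    # Give the next sample to the configuration with the largest marginal gain.
    gains = []
    for c in range(n_configs):
        trial = alloc.copy()
        trial[c] += 1
        gains.append(est_pass_rate(trial, p))
    alloc[int(np.argmax(gains))] += 1

single = [est_pass_rate(np.eye(n_configs, dtype=int)[c] * budget, p) for c in range(n_configs)]
print("learned allocation:", alloc.tolist())
print("mixed pass rate:   ", round(est_pass_rate(alloc, p), 3))
print("best single config:", round(max(single), 3))
```

Because different configurations tend to solve different problems, the greedy mixed allocation typically beats spending the whole budget on the single best configuration, which is the effect the paper exploits.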
Cooperative Strategic Planning Enhances Reasoning Capabilities in Large Language Models
Wang, Danqing, Ye, Zhuorui, Fang, Fei, Li, Lei
Enhancing the reasoning capabilities of large language models (LLMs) is crucial for enabling them to tackle complex, multi-step problems. Multi-agent frameworks have shown great potential in this regard. However, the lack of effective cooperation between LLM agents hinders their performance, especially on multi-step reasoning tasks. This paper proposes a novel cooperative multi-agent reasoning framework (CoPlanner) that separates reasoning steps and assigns distinct duties to different agents. CoPlanner consists of two LLM agents: a planning agent and a reasoning agent. The planning agent provides high-level strategic hints, while the reasoning agent follows these hints and infers answers. By training the planning agent's policy through the interactive reasoning process via Proximal Policy Optimization (PPO), the LLaMA-3-8B-based CoPlanner outperforms the previous best method by 9.94\% on LogiQA and 3.09\% on BBH. Our results demonstrate that the guidance from the planning agent and the effective cooperation between the agents contribute to the superior performance of CoPlanner in tackling multi-step reasoning problems.
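The skeleton below sketches the planner-reasoner interaction loop with stubbed agents; the strategy inventory and the random hint selection are illustrative placeholders for the PPO-trained planning policy and the LLM-based reasoning agent, both of which are omitted here.

```python
import random

# Hypothetical strategy inventory; the real planner chooses hints with a policy
# trained by PPO, which is not shown in this sketch.
STRATEGIES = ["decompose the problem", "eliminate impossible options",
              "work backwards from the goal", "check each premise"]

def planning_agent(question, history):
    # Stub policy: pick an untried strategy (a trained policy would score them).
    untried = [s for s in STRATEGIES if s not in history]
    return random.choice(untried or STRATEGIES)

def reasoning_agent(question, hint):
    # Stub for an LLM call that reasons under the given hint and returns an answer.
    return f"answer derived by applying '{hint}' to: {question}"

def coplanner(question, max_rounds=3):
    history = []
    for _ in range(max_rounds):
        hint = planning_agent(question, history)
        history.append(hint)
        answer = reasoning_agent(question, hint)
        # A real system would score or verify the answer; we stop after one pass.
        if answer:
            return answer, history
    return None, history

print(coplanner("If all A are B and some B are C, must some A be C?"))
```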
TypedThinker: Typed Thinking Improves Large Language Model Reasoning
Wang, Danqing, Ma, Jianxin, Fang, Fei, Li, Lei
Despite significant advancements in the reasoning capabilities of Large Language Models (LLMs), the lack of diverse reasoning solutions often traps them in a limited solution search space. In this paper, we propose TypedThinker, a novel framework that enhances LLMs' problem-solving abilities by incorporating multiple reasoning types (deductive, inductive, abductive, and analogical). Our analysis across four benchmarks reveals that different reasoning types uniquely solve distinct sets of problems, highlighting the importance of diverse thinking approaches. TypedThinker addresses two key challenges: selecting appropriate reasoning types for given problems and effectively implementing specific reasoning types. Through self-training on successful experiences, TypedThinker learns an implicit policy for reasoning type selection and application. Experimental results demonstrate significant improvements over baseline models, with accuracy increases of 3.4% for Mistral 7B and 16.7% for LLaMA3 8B across four reasoning benchmarks. Notably, TypedThinker generalizes effectively to new benchmarks and can further enhance the reasoning capability of powerful models like GPT-4o. The code is released at https://github.com/dqwang122/ThinkHub.
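A toy sketch of the reasoning-type selection step, assuming illustrative prompt templates and a stub selector in place of the learned policy and the retrieved successful experiences described above.

```python
REASONING_TYPES = ["deductive", "inductive", "abductive", "analogical"]

# Hypothetical prompt templates; the real framework also conditions on
# successful experiences of the chosen type gathered during self-training.
TEMPLATES = {
    "deductive":  "Derive the answer step by step from the given premises.\n{q}",
    "inductive":  "Look for a pattern across the given cases, then generalize.\n{q}",
    "abductive":  "Propose the most plausible explanation for the observation.\n{q}",
    "analogical": "Recall a similar solved problem and map its solution here.\n{q}",
}

def select_type(question, policy_scores=None):
    """Stub selector: a learned policy would output a score per reasoning type."""
    scores = policy_scores or {t: 1.0 / len(REASONING_TYPES) for t in REASONING_TYPES}
    return max(scores, key=scores.get)

def build_prompt(question):
    rtype = select_type(question)
    return rtype, TEMPLATES[rtype].format(q=question)

rtype, prompt = build_prompt("2, 4, 8, 16, ... what is the next number?")
print(rtype)
print(prompt)
```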
Global Human-guided Counterfactual Explanations for Molecular Properties via Reinforcement Learning
Wang, Danqing, Antoniades, Antonis, Luong, Kha-Dinh, Zhang, Edwin, Kosan, Mert, Li, Jiachen, Singh, Ambuj, Wang, William Yang, Li, Lei
Counterfactual explanations of Graph Neural Networks (GNNs) offer a powerful way to understand data that can naturally be represented by a graph structure. Furthermore, in many domains it is highly desirable to derive data-driven global explanations or rules that can better explain the high-level properties of the models and data in question. However, evaluating global counterfactual explanations on real-world datasets is hard due to the lack of human-annotated ground truth, which limits their use in areas like the molecular sciences. Additionally, the increasing scale of these datasets poses a challenge for random-search-based methods. In this paper, we develop RLHEX, a novel global explanation model for molecular property prediction. It aligns the counterfactual explanations with human-defined principles, making the explanations more interpretable and easier for experts to evaluate. RLHEX includes a VAE-based graph generator to generate global explanations and an adapter that adjusts the latent representation space to human-defined principles. Optimized by Proximal Policy Optimization (PPO), the global explanations produced by RLHEX cover 4.12% more input graphs and reduce the distance between the counterfactual explanation set and the input set by 0.47% on average across three molecular datasets. RLHEX provides a flexible framework for incorporating different human-designed principles into the counterfactual explanation generation process, aligning these explanations with domain expertise. The code and data are released at https://github.com/dqwang122/RLHEX.
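A simplified sketch of an objective in this spirit: coverage of the input graphs by a counterfactual set plus an average-distance term, computed here on random bit vectors standing in for molecular fingerprints. The similarity measure, threshold, and the omitted principle terms are assumptions for illustration, not RLHEX's actual reward.

```python
import numpy as np

rng = np.random.default_rng(0)

def tanimoto(a, b):
    """Tanimoto similarity between two binary fingerprint vectors."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

# Random bit vectors stand in for fingerprints of the input graphs and of a
# candidate set of counterfactual explanations.
inputs = rng.integers(0, 2, size=(100, 64))
counterfactuals = rng.integers(0, 2, size=(5, 64))

def coverage_and_distance(cfs, inputs, threshold=0.4):
    sims = np.array([[tanimoto(c, x) for x in inputs] for c in cfs])
    covered = (sims.max(axis=0) >= threshold).mean()   # inputs "explained" by some CF
    distance = 1.0 - sims.max(axis=0).mean()           # avg distance to the nearest CF
    return covered, distance

cov, dist = coverage_and_distance(counterfactuals, inputs)
# A PPO-style objective could reward high coverage and low distance, plus terms
# for human-defined principles (e.g., synthesizability), which are omitted here.
print(f"coverage={cov:.2f}, distance={dist:.2f}")
```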
ALGO: Synthesizing Algorithmic Programs with LLM-Generated Oracle Verifiers
Zhang, Kexun, Wang, Danqing, Xia, Jingtao, Wang, William Yang, Li, Lei
Large language models (LLMs) excel at implementing code from functionality descriptions but struggle with algorithmic problems that require not only implementation but also identification of a suitable algorithm. Moreover, LLM-generated programs lack guaranteed correctness and require human verification. To address these challenges, we propose ALGO, a framework that synthesizes Algorithmic programs with LLM-Generated Oracles to guide generation and verify correctness. ALGO first generates a reference oracle by prompting an LLM to exhaustively enumerate all combinations of the relevant variables. This oracle is then used to guide an arbitrary search strategy in exploring the algorithm space and to verify the synthesized algorithms. Our study shows that the LLM-generated oracles are correct in 88% of cases. With the oracles as verifiers, ALGO can be integrated with any existing code generation model in a model-agnostic manner to enhance its performance. Experiments show that, when equipped with ALGO, we achieve an 8x better one-submission pass rate than the Codex model and a 2.6x better one-submission pass rate than CodeT, the current state-of-the-art on CodeContests. ALGO also achieves a 1.3x better pass rate than the ChatGPT Code Interpreter on unseen problems. The problem set we used for testing, the prompts we used, the verifier and solution programs, and the test cases generated by ALGO are available at https://github.com/zkx06111/ALGO.
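The snippet below illustrates the oracle-verification loop on a toy problem (maximum subarray sum): a brute-force reference oracle checks a candidate efficient solution on random tests. In ALGO the oracle itself is generated by an LLM; here both functions are hand-written purely for illustration.

```python
import random

def oracle_max_subarray(nums):
    """Brute-force reference oracle: exhaustively enumerate every subarray."""
    return max(sum(nums[i:j]) for i in range(len(nums)) for j in range(i + 1, len(nums) + 1))

def candidate_max_subarray(nums):
    """Candidate efficient solution (Kadane's algorithm) whose correctness we check."""
    best = cur = nums[0]
    for x in nums[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best

def verify(candidate, oracle, n_tests=200):
    random.seed(0)
    for _ in range(n_tests):
        nums = [random.randint(-10, 10) for _ in range(random.randint(1, 12))]
        if candidate(nums) != oracle(nums):
            return False, nums           # counterexample found
    return True, None

print(verify(candidate_max_subarray, oracle_max_subarray))   # -> (True, None)
```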
INSTRUCTSCORE: Explainable Text Generation Evaluation with Finegrained Feedback
Xu, Wenda, Wang, Danqing, Pan, Liangming, Song, Zhenqiao, Freitag, Markus, Wang, William Yang, Li, Lei
Automatically evaluating the quality of language generation is critical. Although recent learned metrics show high correlation with human judgement, they cannot explain their verdicts or associate their scores with defects in the generated text. To address this limitation, we present InstructScore, an explainable evaluation metric for text generation. By harnessing both explicit human instruction and the implicit knowledge of GPT-4, we fine-tune a text evaluation metric based on LLaMA, producing both a score for the generated text and a human-readable diagnostic report. We evaluate InstructScore on a variety of generation tasks, including translation, captioning, data-to-text, and commonsense generation. Experiments show that our 7B model surpasses all other unsupervised metrics, including those based on 175B GPT-3 and GPT-4. Surprisingly, InstructScore, even without direct supervision from human-rated data, achieves performance on par with state-of-the-art metrics like COMET22, which were fine-tuned on human ratings.
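A minimal sketch of how a generated diagnostic report could be mapped to a score, assuming an illustrative report format and MQM-style severity weights (major = -5, minor = -1); InstructScore's actual output schema and scoring may differ.

```python
import re

# Example diagnostic report in an assumed format; the real InstructScore model
# generates such reports itself, and its fields may be organized differently.
report = """\
Error 1: type=accuracy/mistranslation, severity=major, span="bank of the river"
Error 2: type=fluency/grammar, severity=minor, span="they was"
"""

WEIGHTS = {"major": -5, "minor": -1}   # assumed penalty weights

def score_from_report(report, base=0):
    """Count severities in the report and turn them into a single scalar score."""
    severities = re.findall(r"severity=(major|minor)", report)
    return base + sum(WEIGHTS[s] for s in severities), severities

score, severities = score_from_report(report)
print(score, severities)   # -> -6 ['major', 'minor']
```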
Learning from Mistakes via Cooperative Study Assistant for Large Language Models
Wang, Danqing, Li, Lei
Large language models (LLMs) have demonstrated the potential to refine their generations based on their own feedback. However, feedback from the LLM itself is often inaccurate, thereby limiting its benefits. In this paper, we propose the Study Assistant for Large LAnguage Model (SALAM), a novel framework with an auxiliary agent that assists the main LLM in learning from mistakes through interactive cooperation. In the gathering phase, the study assistant agent probes the main LLM, analyzes its errors, and collects the interactions in a mistake memory. During the examination phase, the study assistant provides guidelines by retrieving relevant cases to help the main LLM anticipate and avoid similar errors. We first investigate the effectiveness of a general study assistant and then customize it to provide LLM-specific guidance through imitation learning from successful guidance experiences. Our experiments on three LLMs using two challenging frameworks demonstrate that SALAM can significantly boost LLMs by an accuracy margin of up to 6.6 on BBH and 12.6 on BBQ.
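A toy sketch of the mistake-memory component, assuming a simple token-overlap retriever; the real study assistant analyzes errors with an LLM and is further trained by imitation learning, both of which are omitted here.

```python
def jaccard(a, b):
    """Token-overlap similarity used here as a stand-in retriever."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

class MistakeMemory:
    """Toy mistake memory: stores (query, wrong answer, guideline) triples."""
    def __init__(self):
        self.entries = []

    def add(self, query, wrong_answer, guideline):
        self.entries.append({"query": query, "wrong": wrong_answer, "guideline": guideline})

    def retrieve(self, query, k=2):
        ranked = sorted(self.entries, key=lambda e: jaccard(e["query"], query), reverse=True)
        return [e["guideline"] for e in ranked[:k]]

memory = MistakeMemory()
memory.add("Is a whale a fish?", "yes", "Check taxonomy: mammals are not fish.")
memory.add("Negate 'all birds fly'", "no birds fly", "Negation of 'all' is 'not all', not 'none'.")

# At examination time, guidelines from similar past mistakes are prepended to the prompt.
print(memory.retrieve("Is a dolphin a fish?"))
```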
End-to-end Story Plot Generator
Zhu, Hanlin, Cohen, Andrew, Wang, Danqing, Yang, Kevin, Yang, Xiaomeng, Jiao, Jiantao, Tian, Yuandong
Story plots, while short, carry most of the essential information of a full story that may contain tens of thousands of words. We study the problem of automatic generation of story plots, which includes the story premise, character descriptions, plot outlines, etc. To generate a single engaging plot, existing plot generators (e.g., DOC (Yang et al., 2022a)) require hundreds to thousands of calls to LLMs (e.g., the OpenAI API) in the planning stage of the story plot, which is costly and takes at least several minutes. Moreover, the hard-wired nature of the method makes the pipeline non-differentiable, blocking fast specialization and personalization of the plot generator. In this paper, we propose three models, $\texttt{OpenPlot}$, $\texttt{E2EPlot}$ and $\texttt{RLPlot}$, to address these challenges. $\texttt{OpenPlot}$ replaces expensive OpenAI API calls with LLaMA2 (Touvron et al., 2023) calls via careful prompt designs, which leads to inexpensive generation of high-quality training datasets of story plots. We then train an end-to-end story plot generator, $\texttt{E2EPlot}$, by supervised fine-tuning (SFT) on approximately 13000 story plots generated by $\texttt{OpenPlot}$. $\texttt{E2EPlot}$ generates story plots of comparable quality to $\texttt{OpenPlot}$ and is > 10$\times$ faster (1k tokens in only 30 seconds on average). Finally, we obtain $\texttt{RLPlot}$, which is further fine-tuned with RLHF on several reward models for different aspects of story quality, yielding a 60.0$\%$ winning rate against $\texttt{E2EPlot}$ on the aspect of suspense and surprise.
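A small sketch of the supervised fine-tuning data-preparation step implied above, assuming illustrative field names and prompt format rather than the released data schema.

```python
import json

# Toy examples standing in for the ~13000 OpenPlot-generated plots; the field
# names and prompt wording here are hypothetical.
raw = [
    {"premise": "A lighthouse keeper finds a message in a bottle.",
     "plot": "Setting: a remote island... Characters: ... Outline: ..."},
]

def to_sft_example(item):
    """Turn a (premise, plot) pair into a prompt/completion record for SFT."""
    return {"prompt": f"Write a story plot for the premise:\n{item['premise']}\n",
            "completion": item["plot"]}

with open("plots_sft.jsonl", "w") as f:
    for item in raw:
        f.write(json.dumps(to_sft_example(item)) + "\n")
```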
Learning Personalized Story Evaluation
Wang, Danqing, Yang, Kevin, Zhu, Hanlin, Yang, Xiaomeng, Cohen, Andrew, Li, Lei, Tian, Yuandong
While large language models (LLMs) have shown impressive results on more objective tasks such as QA and retrieval, it remains nontrivial to evaluate their performance on open-ended text generation for reasons including (1) data contamination; (2) multi-dimensional evaluation criteria; and (3) subjectiveness stemming from reviewers' personal preferences. To address such issues, we propose to model personalization in an uncontaminated open-ended generation assessment. We create two new datasets, Per-MPST and Per-DOC, for personalized story evaluation by re-purposing existing datasets with proper anonymization and new personalized labels. We further develop a personalized story evaluation model, PerSE, to infer reviewer preferences and provide a personalized evaluation. For a given reviewer, PerSE predicts either a detailed review or a fine-grained comparison in several aspects (such as interestingness and surprise) for a new text input. PerSE outperforms GPT-4 by 15.8% on Kendall correlation of story ratings and by 13.7% on pairwise preference prediction accuracy. Both datasets and code will be released.
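A small sketch of the two reported evaluation metrics, Kendall correlation of ratings and pairwise preference accuracy, computed on toy gold and predicted ratings for a single reviewer; the numbers are made up for illustration.

```python
from scipy.stats import kendalltau

# Toy gold and predicted story ratings for one reviewer.
gold = [5, 3, 4, 1, 2]
pred = [4, 3, 5, 2, 1]

tau, _ = kendalltau(gold, pred)

# Pairwise preference accuracy: fraction of story pairs ranked in the same order.
pairs = [(i, j) for i in range(len(gold)) for j in range(i + 1, len(gold))]
agree = sum((gold[i] - gold[j]) * (pred[i] - pred[j]) > 0 for i, j in pairs)

print(f"Kendall tau = {tau:.2f}, pairwise preference accuracy = {agree / len(pairs):.2f}")
```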