Collaborating Authors

Guo, Yue


PlainQAFact: Automatic Factuality Evaluation Metric for Biomedical Plain Language Summaries Generation

arXiv.org Artificial Intelligence

Hallucinated outputs from language models pose risks in the medical domain, especially for lay audiences making health-related decisions. Existing factuality evaluation methods, such as entailment- and question-answering-based (QA) approaches, struggle with plain language summary (PLS) generation due to the elaborative explanation phenomenon, which introduces external content (e.g., definitions, background, examples) absent from the source document to enhance comprehension. To address this, we introduce PlainQAFact, a framework trained on a fine-grained, human-annotated dataset, PlainFact, to evaluate the factuality of both source-simplified and elaboratively explained sentences. PlainQAFact first classifies the factuality type of each sentence and then assesses its factuality using a retrieval-augmented QA-based scoring method. Our approach is lightweight and computationally efficient. Empirical results show that existing factuality metrics fail to effectively evaluate factuality in PLS, especially for elaborative explanations, whereas PlainQAFact achieves state-of-the-art performance. We further analyze its effectiveness across external knowledge sources, answer extraction strategies, overlap measures, and document granularity levels, refining its overall factuality assessment.
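
A minimal sketch of how such a two-stage pipeline could be wired together, assuming toy stand-ins for the trained classifier, retriever, and QA scorer (every function here is a hypothetical placeholder, not the released implementation):

```python
# Two-stage PlainQAFact-style scoring: classify each summary sentence,
# retrieve extra evidence only for elaborations, then score against evidence.

def classify_factuality_type(sentence: str, source: str) -> str:
    """Stand-in classifier: sentences sharing several content words with
    the source are treated as simplifications, the rest as elaborations."""
    overlap = set(sentence.lower().split()) & set(source.lower().split())
    return "simplification" if len(overlap) >= 3 else "elaboration"

def retrieve_evidence(sentence: str, source: str, kb: list[str]) -> str:
    """Toy lexical retriever standing in for a real retrieval module."""
    def score(doc: str) -> int:
        return len(set(sentence.lower().split()) & set(doc.lower().split()))
    best = max(kb, key=score, default="")
    return source + " " + best

def qa_overlap_score(sentence: str, evidence: str) -> float:
    """Token-overlap proxy for the QA step: the real metric asks generated
    questions against the evidence and compares the answers."""
    tokens = sentence.lower().split()
    return sum(t in evidence.lower() for t in tokens) / max(len(tokens), 1)

def plainqafact_score(summary_sents: list[str], source: str, kb: list[str]) -> float:
    scores = []
    for sent in summary_sents:
        ftype = classify_factuality_type(sent, source)
        evidence = retrieve_evidence(sent, source, kb) if ftype == "elaboration" else source
        scores.append(qa_overlap_score(sent, evidence))
    return sum(scores) / max(len(scores), 1)
```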


Speaking the Language of Teamwork: LLM-Guided Credit Assignment in Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

Credit assignment, the process of attributing credit or blame to individual agents for their contributions to a team's success or failure, remains a fundamental challenge in multi-agent reinforcement learning (MARL), particularly in environments with sparse rewards. Commonly used approaches such as value decomposition often lead to suboptimal policies in these settings, and designing dense reward functions that align with human intuition can be complex and labor-intensive. In this work, we propose a novel framework where a large language model (LLM) generates dense, agent-specific rewards based on a natural language description of the task and the overall team goal. By learning a potential-based reward function over multiple queries, our method reduces the impact of ranking errors while allowing the LLM to evaluate each agent's contribution to the overall task. Through extensive experiments, we demonstrate that our approach achieves faster convergence and higher policy returns compared to state-of-the-art MARL baselines.
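
A compact sketch of the core loop, assuming a small enumerable state space; `llm_rank_states` is a hypothetical stand-in for the actual natural-language LLM query, simulated here as a noisy oracle:

```python
import random

def llm_rank_states(states: list[int]) -> dict[int, int]:
    # Hypothetical stand-in: an LLM ranks states by progress toward the
    # team goal; simulated as a noisy ordering of a ground-truth score.
    noisy = sorted(states, key=lambda s: s + random.gauss(0, 0.5))
    return {s: rank for rank, s in enumerate(noisy)}

def learn_potential(states: list[int], n_queries: int = 10) -> dict[int, float]:
    # Average ranks over multiple queries to damp individual ranking errors.
    acc = {s: 0.0 for s in states}
    for _ in range(n_queries):
        ranks = llm_rank_states(states)
        for s in states:
            acc[s] += ranks[s]
    return {s: acc[s] / n_queries for s in states}

def shaped_reward(env_reward: float, phi: dict[int, float],
                  s: int, s_next: int, gamma: float = 0.99) -> float:
    # Potential-based shaping, which is known to preserve optimal policies.
    return env_reward + gamma * phi[s_next] - phi[s]

phi = learn_potential(list(range(5)))
print(shaped_reward(0.0, phi, s=1, s_next=2))
```

In the multi-agent setting, one potential per agent would let each agent receive dense, agent-specific feedback even when the environment reward is sparse.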


Navigating Noisy Feedback: Enhancing Reinforcement Learning with Error-Prone Language Models

arXiv.org Artificial Intelligence

The correct specification of reward models is a well-known challenge in reinforcement learning. Hand-crafted reward functions often lead to inefficient or suboptimal policies and may not be aligned with user values. Reinforcement learning from human feedback is a successful technique that can mitigate such issues; however, collecting human feedback can be laborious. Recent works have solicited feedback from pre-trained large language models rather than humans to reduce or eliminate human effort; however, these approaches yield poor performance in the presence of hallucination and other errors. This paper studies the advantages and limitations of reinforcement learning from large language model feedback and proposes a simple yet effective method for soliciting and applying feedback as a potential-based shaping function. We theoretically show that inconsistent rankings, which approximate ranking errors, lead to uninformative rewards with our approach. Our method empirically improves convergence speed and policy returns over commonly used baselines even with significant ranking errors, and eliminates the need for complex post-processing of reward functions.
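
A toy numerical check of that claim, under the deliberately extreme assumption that "inconsistent" means fully random rankings:

```python
import random

states = list(range(5))

def random_ranking() -> dict[int, int]:
    # Fully inconsistent feedback: a fresh random ranking on every query.
    order = states[:]
    random.shuffle(order)
    return {s: rank for rank, s in enumerate(order)}

n_queries = 10_000
acc = {s: 0.0 for s in states}
for _ in range(n_queries):
    ranking = random_ranking()
    for s in states:
        acc[s] += ranking[s]
phi = {s: acc[s] / n_queries for s in states}

print(phi)  # every potential approaches the mean rank (2.0 here)
# With a near-constant potential c, the shaping term gamma*phi[s'] - phi[s]
# collapses to roughly (gamma - 1)*c for every transition: uninformative
# rather than misleading, so the learned policy is not actively harmed.
```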


EconNLI: Evaluating Large Language Models on Economics Reasoning

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are widely used for writing economic analysis reports or providing financial advice, but their ability to understand economic knowledge and reason about the potential results of specific economic events lacks systematic evaluation. To address this gap, we propose a new dataset, natural language inference on economic events (EconNLI), to evaluate LLMs' knowledge and reasoning abilities in the economic domain. We evaluate LLMs on (1) their ability to correctly classify whether a premise event will cause a hypothesis event and (2) their ability to generate reasonable events resulting from a given premise. Our experiments reveal that LLMs lack sophistication in economic reasoning and may generate wrong or hallucinated answers. Our study raises awareness of the limitations of using LLMs for critical decision-making involving economic reasoning and analysis. The dataset and code are available at https://github.com/Irenehere/EconNLI.
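
A hedged sketch of how the first evaluation task could be scored; the prompt wording, example fields, and the `query_llm` stub are illustrative assumptions rather than the dataset's actual protocol:

```python
# Task (1): binary classification of whether a premise event causes a
# hypothesis event, scored by accuracy against gold labels.

examples = [
    {"premise": "The central bank raises interest rates.",
     "hypothesis": "Borrowing costs for firms increase.", "label": 1},
    {"premise": "The central bank raises interest rates.",
     "hypothesis": "Consumer loans become cheaper.", "label": 0},
]

def query_llm(prompt: str) -> str:
    return "yes"  # placeholder: swap in a real model call

def predict(premise: str, hypothesis: str) -> int:
    prompt = (f"Premise: {premise}\nHypothesis: {hypothesis}\n"
              "Does the premise event cause the hypothesis event? "
              "Answer yes or no.")
    return 1 if query_llm(prompt).strip().lower().startswith("yes") else 0

accuracy = sum(predict(e["premise"], e["hypothesis"]) == e["label"]
               for e in examples) / len(examples)
print(f"accuracy: {accuracy:.2f}")
```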


Improving Weak-to-Strong Generalization with Reliability-Aware Alignment

arXiv.org Artificial Intelligence

Large language models (LLMs) are now rapidly advancing and surpassing human abilities on many natural language tasks. However, aligning these super-human LLMs with human knowledge remains challenging because the supervision signals from human annotators may be wrong. This issue, known as the "super-alignment" problem, requires enhancing weak-to-strong generalization, where a strong LLM must generalize from imperfect supervision provided by a weaker source. To address this issue, we propose an approach to improve weak-to-strong generalization by involving the reliability of weak supervision signals in the alignment process. In our method, we query the weak supervisor for multiple answers, estimate the answer reliability, and enhance the alignment process by filtering out uncertain data or re-weighting reliable data. Experiments on four datasets demonstrate that our methods effectively identify the quality of weak labels and significantly enhance weak-to-strong generalization. Our work presents effective techniques for error-robust model alignment, reducing error propagation from noisy supervision and enhancing the accuracy and reliability of LLMs. Code is publicly available at http://github.com/Irenehere/ReliableAlignment.
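
One plausible reading of the pipeline in code, with a simulated noisy weak supervisor; the sampling count, agreement threshold, and function names are assumptions for illustration:

```python
import random
from collections import Counter

def weak_supervisor(question: str) -> str:
    # Simulated weak model: answers correctly 70% of the time (toy).
    truth = "A" if "alpha" in question else "B"
    return truth if random.random() < 0.7 else random.choice(["A", "B"])

def reliable_subset(questions: list[str], n_samples: int = 8,
                    threshold: float = 0.75) -> list[tuple[str, str, float]]:
    kept = []
    for q in questions:
        answers = [weak_supervisor(q) for _ in range(n_samples)]
        label, count = Counter(answers).most_common(1)[0]
        agreement = count / n_samples           # reliability estimate
        if agreement >= threshold:              # filter out uncertain items
            kept.append((q, label, agreement))  # agreement can re-weight loss
    return kept

print(reliable_subset(["a question about alpha", "a question about beta"]))
```

The retained agreement score also supports the re-weighting variant: instead of hard filtering, it can serve as a per-example weight in the strong model's training loss.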


Advancing Abductive Reasoning in Knowledge Graphs through Complex Logical Hypothesis Generation

arXiv.org Artificial Intelligence

Abductive reasoning is the process of making educated guesses to provide explanations for observations. Although many applications require the use of knowledge for explanations, the utilization of abductive reasoning in conjunction with structured knowledge, such as a knowledge graph (KG), remains largely unexplored. To fill this gap, this paper introduces the task of complex logical hypothesis generation as an initial step towards abductive logical reasoning with KGs. In this task, we aim to generate a complex logical hypothesis that can explain a set of observations. We find that a generative model trained with supervised learning can generate logical hypotheses that are structurally closer to the reference hypothesis. However, when generalized to unseen observations, this training objective does not guarantee better hypothesis generation. To address this, we introduce the Reinforcement Learning from Knowledge Graph (RLF-KG) method, which minimizes the differences between observations and the conclusions drawn from generated hypotheses according to the KG. Experiments show that, with RLF-KG's assistance, the generated hypotheses provide better explanations and achieve state-of-the-art results on three widely used KGs.
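
A toy illustration of a reward signal in this spirit, restricted to one-hop hypotheses over a tiny KG and using Jaccard similarity; the paper's actual hypothesis space and objective are richer:

```python
# Toy KG as (head, relation, tail) triples.
kg = {("paris", "capital_of", "france"),
      ("lyon", "city_in", "france"),
      ("berlin", "capital_of", "germany")}

def conclusion(hypothesis: tuple, kg: set) -> set:
    # One-hop hypothesis ("?x", relation, object): all x with (x, r, o) in KG.
    _, rel, obj = hypothesis
    return {h for (h, r, t) in kg if r == rel and t == obj}

def rlf_kg_reward(hypothesis: tuple, observations: set, kg: set) -> float:
    # Reward the hypothesis by how closely its conclusion on the KG
    # matches the observed entities (Jaccard similarity).
    concl = conclusion(hypothesis, kg)
    union = len(concl | observations)
    return len(concl & observations) / union if union else 0.0

obs = {"paris"}
print(rlf_kg_reward(("?x", "capital_of", "france"), obs, kg))  # 1.0
print(rlf_kg_reward(("?x", "city_in", "france"), obs, kg))     # 0.0
```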


Understanding Your Agent: Leveraging Large Language Models for Behavior Explanation

arXiv.org Artificial Intelligence

Intelligent agents such as robots are increasingly deployed in real-world, safety-critical settings. It is vital that these agents are able to explain the reasoning behind their decisions to human counterparts; however, their behavior is often produced by uninterpretable models such as deep neural networks. We propose an approach to generate natural language explanations for an agent's behavior based only on observations of states and actions, thus making our method independent of the underlying model's representation. For such models, we first learn a behavior representation and subsequently use it with a pre-trained large language model to produce plausible explanations with minimal hallucination while affording user interaction. We evaluate our method in a multi-agent search-and-rescue environment and demonstrate the effectiveness of our explanations for agents executing various behaviors. Through user studies and empirical experiments, we show that our approach generates explanations as helpful as those produced by a human domain expert while enabling beneficial interactions such as clarification and counterfactual queries.
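
A minimal sketch of the model-agnostic front end, assuming a toy trajectory format; the summary statistics and prompt template are illustrative stand-ins, not the paper's learned behavior representation:

```python
from collections import Counter

def behavior_representation(trajectory: list[tuple[str, str]]) -> dict:
    # Toy representation built only from observed (state, action) pairs:
    # action frequencies plus the number of distinct states visited.
    actions = Counter(action for _, action in trajectory)
    states = {state for state, _ in trajectory}
    return {"action_freq": dict(actions), "states_visited": len(states)}

def explanation_prompt(rep: dict) -> str:
    # Grounding the LLM in the representation limits hallucinated details.
    return ("An agent in a search-and-rescue task behaved as follows: "
            f"{rep}. Explain in plain language what the agent was likely "
            "doing, using only the information above.")

trajectory = [("room1", "move"), ("room2", "scan"), ("room2", "rescue")]
print(explanation_prompt(behavior_representation(trajectory)))
```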


Personalized Jargon Identification for Enhanced Interdisciplinary Communication

arXiv.org Artificial Intelligence

Scientific jargon can impede researchers when they read materials from other domains. Current methods of jargon identification mainly use corpus-level familiarity indicators (e.g., Simple Wikipedia represents plain language). However, researchers' familiarity with a term can vary greatly based on their own background. We collect a dataset of over 10K term familiarity annotations from 11 computer science researchers for terms drawn from 100 paper abstracts. Analysis of this data reveals that jargon familiarity and information needs vary widely across annotators, even within the same sub-domain (e.g., NLP). We investigate features representing individual, sub-domain, and domain knowledge to predict individual jargon familiarity. We compare supervised and prompt-based approaches, finding that prompt-based methods that incorporate personal publications yield the highest accuracy, though zero-shot prompting provides a strong baseline. This research offers insight into the features and methods needed to integrate personal data into scientific jargon identification.
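
The supervised variant might look like the following sketch; the three features are illustrative stand-ins for the paper's individual, sub-domain, and domain-level feature groups:

```python
from sklearn.linear_model import LogisticRegression

# Each row: [term appears in annotator's own publications,
#            term frequency in their sub-domain,
#            corpus-level term frequency]
X = [
    [1, 0.80, 0.50],
    [0, 0.10, 0.05],
    [1, 0.60, 0.40],
    [0, 0.05, 0.30],
]
y = [1, 0, 1, 0]  # 1 = annotator marked the term as familiar

clf = LogisticRegression().fit(X, y)
print(clf.predict([[0, 0.70, 0.20]]))  # familiarity prediction for a new term
```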


Predict the Future from the Past? On the Temporal Data Distribution Shift in Financial Sentiment Classifications

arXiv.org Artificial Intelligence

Temporal data distribution shift is prevalent in financial text. How can a financial sentiment analysis system be trained, in a volatile market environment, to accurately infer sentiment and remain robust to temporal data distribution shifts? In this paper, we conduct an empirical study of financial sentiment analysis systems under temporal data distribution shifts using a real-world financial social media dataset that spans three years. We find that fine-tuned models suffer from general performance degradation in the presence of temporal distribution shifts. Furthermore, motivated by the unique temporal nature of financial text, we propose a novel method that combines out-of-distribution detection with time series modeling for temporal financial sentiment analysis. Experimental results show that the proposed method enhances the model's capability to adapt to evolving temporal shifts in a volatile financial market.
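
A toy sketch of how the two components could combine; the embedding-distance detector, blending weights, and function names are assumptions, not the paper's actual method:

```python
import numpy as np

def is_ood(batch_emb: np.ndarray, train_emb: np.ndarray,
           threshold: float = 1.0) -> bool:
    # Toy detector: distance between batch and training mean embeddings.
    return np.linalg.norm(batch_emb.mean(axis=0) - train_emb.mean(axis=0)) > threshold

def temporal_sentiment(model_score: float, history: list[float], ood: bool) -> float:
    if not ood:
        return model_score
    # Fall back toward a time-series prior: extrapolate the recent trend.
    slope = np.polyfit(np.arange(len(history)), history, 1)[0]
    return 0.5 * model_score + 0.5 * (history[-1] + slope)

train_emb = np.zeros((100, 8))
batch_emb = np.ones((32, 8)) * 0.5     # drifted incoming batch
history = [0.1, 0.0, -0.2, -0.3]       # recent average sentiment scores
ood = is_ood(batch_emb, train_emb)
print(temporal_sentiment(0.4, history, ood))
```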


Is ChatGPT a Financial Expert? Evaluating Language Models on Financial Natural Language Processing

arXiv.org Artificial Intelligence

The emergence of Large Language Models (LLMs), such as ChatGPT, has revolutionized general natural language processing (NLP) tasks. However, their expertise in the financial domain lacks a comprehensive evaluation. To assess the ability of LLMs to solve financial NLP tasks, we present FinLMEval, a framework for Financial Language Model Evaluation, comprising nine datasets designed to evaluate the performance of language models. This study compares the performance of encoder-only and decoder-only language models. Our findings reveal that while some decoder-only LLMs demonstrate notable performance across most financial tasks via zero-shot prompting, they generally lag behind fine-tuned expert models, especially when dealing with proprietary datasets. We hope this study provides foundational evaluations for continuing efforts to build more advanced LLMs in the financial domain.
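
A minimal harness in this spirit; the model callables, dataset contents, and names below are placeholders, not FinLMEval's actual datasets or models:

```python
# Run each model over each financial classification dataset and
# tabulate accuracy, mirroring a model-vs-dataset evaluation grid.

def evaluate(model, dataset) -> float:
    correct = sum(model(text) == label for text, label in dataset)
    return correct / len(dataset)

datasets = {
    "toy_sentiment": [("Shares surged after strong earnings.", "positive"),
                      ("The firm missed revenue estimates.", "negative")],
}
models = {
    "zero_shot_llm": lambda x: "positive" if "surged" in x else "negative",
    "fine_tuned_encoder": lambda x: "negative" if "missed" in x else "positive",
}

for model_name, model in models.items():
    for data_name, data in datasets.items():
        print(f"{model_name} on {data_name}: {evaluate(model, data):.2f}")
```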