AITopics | partial answer

AA-Omniscience: Evaluating Cross-Domain Knowledge Reliability in Large Language Models

Jackson, Declan, Keating, William, Cameron, George, Hill-Smith, Micah

arXiv.org Artificial Intelligence, Nov-18-2025

We introduce AA-Omniscience, a benchmark designed to measure both factual recall and knowledge calibration across 6,000 questions. Questions are derived from authoritative academic and industry sources, and cover 42 economically relevant topics within six different domains. The evaluation measures a model's Omniscience Index, a bounded metric (-100 to 100) measuring factual recall that jointly penalizes hallucinations and rewards abstention when uncertain, with 0 corresponding to a model that answers questions correctly as often as it answers them incorrectly. Among evaluated models, Claude 4.1 Opus attains the highest score (4.8), making it one of only three models to score above zero. These results reveal persistent factuality and calibration weaknesses across frontier models. Performance also varies by domain, with models from three different research labs leading across the six domains. This variability suggests that, for tasks where knowledge is important, models should be chosen according to the demands of the use case rather than general performance.
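The abstract does not state the formula for the Omniscience Index, so the following is only a minimal sketch of one scoring rule that matches the properties it describes: bounded between -100 and 100, zero when correct and incorrect answers balance, and no penalty for abstaining. The function name `omniscience_index` and its three count parameters are assumptions made for illustration; the paper's exact definition may differ, for example in how partially correct answers are handled.

```python
def omniscience_index(correct: int, incorrect: int, abstained: int) -> float:
    """Illustrative score in [-100, 100]; NOT confirmed as the paper's formula.

    Each correct answer counts +1, each incorrect answer (hallucination)
    counts -1, and each abstention counts 0.
    """
    total = correct + incorrect + abstained
    if total == 0:
        raise ValueError("at least one question must be scored")
    return 100.0 * (correct - incorrect) / total


# A model that guesses on every question it is unsure about ...
always_guess = omniscience_index(correct=30, incorrect=70, abstained=0)
# ... versus one that abstains on the same 70 questions.
abstains = omniscience_index(correct=30, incorrect=0, abstained=70)
print(always_guess, abstains)
```

Under this assumed rule, both models recall the same 30 facts, but the one that abstains scores higher because wrong answers subtract from the total while abstentions do not.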

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2511.13029

Country:

North America > United States > California (0.28)
North America > United States > Arkansas (0.28)

Genre: Research Report (0.52)

Industry:

Law (0.93)
Health & Medicine (0.93)
Government > Regional Government > North America Government > United States Government (0.46)
Materials > Metals & Mining (0.31)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

StepChain GraphRAG: Reasoning Over Knowledge Graphs for Multi-Hop Question Answering

Ni, Tengjun, Yuan, Xin, Li, Shenghong, Wu, Kai, Liu, Ren Ping, Ni, Wei, Zhang, Wenjie

arXiv.org Artificial Intelligence, Oct-6-2025

Recent progress in retrieval-augmented generation (RAG) has led to more accurate and interpretable multi-hop question answering (QA). Yet, challenges persist in integrating iterative reasoning steps with external knowledge retrieval. To address this, we introduce StepChain GraphRAG, a framework that unites question decomposition with a Breadth-First Search (BFS) Reasoning Flow for enhanced multi-hop QA. Our approach first builds a global index over the corpus; at inference time, only retrieved passages are parsed on-the-fly into a knowledge graph, and the complex query is split into sub-questions. For each sub-question, a BFS-based traversal dynamically expands along relevant edges, assembling explicit evidence chains without overwhelming the language model with superfluous context. Experiments on MuSiQue, 2WikiMultiHopQA, and HotpotQA show that StepChain GraphRAG achieves state-of-the-art Exact Match and F1 scores. StepChain GraphRAG lifts average EM by 2.57% and F1 by 2.13% over the SOTA method, achieving the largest gain on HotpotQA (+4.70% EM, +3.44% F1). StepChain GraphRAG also fosters enhanced explainability by preserving the chain-of-thought across intermediate retrieval steps. We conclude by discussing how future work can mitigate the computational overhead and address potential hallucinations from large language models to refine efficiency and reliability in multi-hop QA.
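The BFS-based expansion described in the abstract can be pictured with a small sketch. This is not the authors' implementation: the graph layout (a dict mapping an entity to `(relation, neighbor)` pairs), the function name `bfs_evidence_chains`, the `is_relevant` filter, and the `max_hops` limit are all assumptions chosen for illustration, and the toy graph stands in for a graph parsed from retrieved passages.

```python
from collections import deque


def bfs_evidence_chains(graph, start, is_relevant, max_hops=2):
    """Breadth-first expansion from `start`, following only relevant edges.

    graph: dict mapping entity -> list of (relation, neighbor) pairs.
    Returns a list of evidence chains; each chain is a list of
    (head, relation, tail) triples leading away from `start`.
    """
    chains = []
    visited = {start}
    queue = deque([(start, [])])  # (current entity, chain that reached it)
    while queue:
        node, chain = queue.popleft()
        if len(chain) == max_hops:
            continue  # hop budget used up; do not expand further
        for relation, neighbor in graph.get(node, []):
            if neighbor in visited or not is_relevant(relation):
                continue  # skip revisits and edges judged irrelevant
            visited.add(neighbor)
            new_chain = chain + [(node, relation, neighbor)]
            chains.append(new_chain)
            queue.append((neighbor, new_chain))
    return chains


# Toy graph (hypothetical data for the sketch).
graph = {
    "Inception": [("directed_by", "Christopher Nolan"), ("released_in", "2010")],
    "Christopher Nolan": [("born_in", "London")],
}
# Sub-question: "Where was the director of Inception born?"
wanted = {"directed_by", "born_in"}
chains = bfs_evidence_chains(graph, "Inception", lambda rel: rel in wanted)
for chain in chains:
    print(" -> ".join(f"{h} [{r}] {t}" for h, r, t in chain))
```

Filtering edges before they enter the queue is what keeps the assembled context small: the `released_in` edge is never expanded, so only the two-hop chain needed for the sub-question reaches the language model.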

arxiv preprint arxiv, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2510.02827

Country: Oceania > Australia (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Mind the Generation Process: Fine-Grained Confidence Estimation During LLM Generation

Han, Jinyi, Li, Tingyun, Chen, Shisong, Shi, Jie, Wang, Xinyi, Yue, Guanglei, Liang, Jiaqing, Lin, Xin, Wen, Liqian, Chen, Zulong, Xiao, Yanghua

arXiv.org Artificial Intelligence, Aug-19-2025

While large language models (LLMs) have demonstrated remarkable performance across diverse tasks, they fundamentally lack self-awareness and frequently exhibit overconfidence, assigning high confidence scores to incorrect predictions. Accurate confidence estimation is therefore critical for enhancing the trustworthiness and reliability of LLM-generated outputs. However, existing approaches suffer from coarse-grained scoring mechanisms that fail to provide fine-grained, continuous confidence estimates throughout the generation process. To address these limitations, we introduce FineCE, a novel confidence estimation method that delivers accurate, fine-grained confidence scores during text generation. Specifically, we first develop a comprehensive pipeline for constructing training data that effectively captures the underlying probabilistic distribution of LLM responses, and then train a model to predict confidence scores for arbitrary text sequences in a supervised manner. Furthermore, we propose a Backward Confidence Integration (BCI) strategy that leverages information from the subsequent text to enhance confidence estimation for the current sequence during inference. We also introduce three strategies for identifying optimal positions to perform confidence estimation within the generation process. Extensive experiments on multiple benchmark datasets demonstrate that FineCE consistently outperforms existing classical confidence estimation methods. Our code and all baselines used in the paper are available on GitHub.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2508.1204

Country:

Asia (0.67)
North America > United States (0.28)

Genre: Research Report > Promising Solution (0.46)

Industry: Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

Partial Answer of How Transformers Learn Automata

Zhang, Tiantian

arXiv.org Artificial Intelligence, May-13-2025

We introduce a novel framework for simulating finite automata using representation-theoretic semidirect products and Fourier modules, achieving more efficient Transformer-based implementations.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2504.20395

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

ReAgent: Reversible Multi-Agent Reasoning for Knowledge-Enhanced Multi-Hop QA

Xinjie, Zhao, Gao, Fan, Yang, Rui, Chen, Yingjian, Wang, Yuyang, Zhu, Ying, Tang, Jiacheng, Li, Irene

arXiv.org Artificial Intelligence, Mar-10-2025

Recent advances in large language models (LLMs) have significantly improved multi-hop question answering (QA) through direct Chain-of-Thought (CoT) reasoning. However, the irreversible nature of CoT leads to error accumulation, making it challenging to correct mistakes in multi-hop reasoning. This paper introduces ReAgent: a Reversible multi-Agent collaborative framework augmented with explicit backtracking mechanisms, enabling reversible multi-hop reasoning. By incorporating text-based retrieval, information aggregation and validation, our system can detect and correct errors mid-reasoning, leading to more robust and interpretable QA outcomes. The framework and experiments serve as a foundation for future work on error-tolerant QA systems. Empirical evaluations across three benchmarks indicate ReAgent's efficacy, yielding an average improvement of about 6% over baseline models.

agent, california, conflict, (14 more...)

arXiv.org Artificial Intelligence

2503.06951

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.32)
North America > United States > California > Sacramento County > Sacramento (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
(8 more...)

Genre: Research Report (0.50)

Industry:

Health & Medicine (0.67)
Leisure & Entertainment > Sports > Olympic Games (0.31)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

DebateQA: Evaluating Question Answering on Debatable Knowledge

Xu, Rongwu, Qi, Xuan, Qi, Zehan, Xu, Wei, Guo, Zhijiang

arXiv.org Artificial Intelligence, Aug-2-2024

The rise of large language models (LLMs) has enabled us to seek answers to inherently debatable questions on LLM chatbots, necessitating a reliable way to evaluate their ability. However, traditional QA benchmarks that assume fixed answers are inadequate for this purpose. To address this, we introduce DebateQA, a dataset of 2,941 debatable questions, each accompanied by multiple human-annotated partial answers that capture a variety of perspectives. We develop two metrics: Perspective Diversity, which evaluates the comprehensiveness of perspectives, and Dispute Awareness, which assesses whether the LLM acknowledges the question's debatable nature. Experiments demonstrate that both metrics align with human preferences and are stable across different underlying models. Using DebateQA with these two metrics, we assess 12 popular LLMs and retrieval-augmented generation methods. Our findings reveal that while LLMs generally excel at recognizing debatable issues, their ability to provide comprehensive answers encompassing diverse perspectives varies considerably.

computational linguistic, debatable question, partial answer, (16 more...)

arXiv.org Artificial Intelligence

2408.01419

Country:

Asia > Singapore (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > Canada > Ontario > Toronto (0.04)
(15 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Education (1.00)
Health & Medicine (0.67)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

DiscASP: A Graph-based ASP System for Finding Relevant Consistent Concepts with Applications to Conversational Socialbots

Li, Fang, Wang, Huaduo, Basu, Kinjal, Salazar, Elmer, Gupta, Gopal

arXiv.org Artificial Intelligence, Sep-16-2021

We consider the problem of finding relevant consistent concepts in a conversational AI system, particularly, for realizing a conversational socialbot. Commonsense knowledge about various topics can be represented as an answer set program. However, to advance the conversation, we need to solve the problem of finding relevant consistent concepts, i.e., find consistent knowledge in the "neighborhood" of the current topic being discussed that can be used to advance the conversation. Traditional ASP solvers will generate the whole answer set which is stripped of all the associations between the various atoms (concepts) and thus cannot be used to find relevant consistent concepts. Similarly, goal-directed implementations of ASP will only find concepts directly relevant to a query. We present the DiscASP system that will find the partial consistent model that is relevant to a given topic in a manner similar to how a human will find it. DiscASP is based on a novel graph-based algorithm for finding stable models of an answer set program. We present the DiscASP algorithm, its implementation, and its application to developing a conversational socialbot.

constraint, dependency graph, movie, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.4204/EPTCS.345.35

2109.08297

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Texas (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(3 more...)

Evaluating Conversational Characters Created through Question Generation

Chen, Grace (California State University Long Beach) | Tosch, Emma (Brandeis University) | Artstein, Ron (USC Institute for Creative Technologies) | Leuski, Anton (USC Institute for Creative Technologies) | Traum, David (USC Institute for Creative Technologies)

AAAI Conferences, May-18-2011

Question generation tools can be used to extract a question-answer database from text articles. We investigate how suitable this technique is for giving domain-specific knowledge to conversational characters. We tested these characters by collecting questions and answers from naive participants, running the questions through the character, and comparing the system responses to the participant answers. Characters gave a full or partial answer to 53% of the user questions which had an answer available in the source text, and 43% of all questions asked. Performance was better for questions asked after the user had read the source text, and also varied by question type: the best results were answers to who questions, while answers to yes/no questions were among the poorer performers. The results show that question generation is a promising method for creating a question answering conversational character from an existing text.

participant, question type, source text, (14 more...)

AAAI Conferences

Twenty-Fourth International FLAIRS Conference

Country:

North America > United States > California > San Diego County > Vista (0.05)
North America > United States > California > Los Angeles County > Beverly Hills (0.05)

Genre: Research Report > New Finding (0.72)

Industry: Government (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.83)
