AITopics | factuality

factuality

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Benchmarking Foundation Models with Language-Model-as-an-Examiner

Yushi Bai

Neural Information Processing Systems Feb-18-2026, 00:20:50 GMT

Our data and benchmarking results are available at: http://lmexam.xlore.cn.

examiner, large language model, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Asia > China > Beijing > Beijing (0.04)
Asia > Singapore (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Public Health (0.97)
Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

df438caa36714f69277daa92d608dd63-Paper-Conference.pdf

Neural Information Processing Systems Feb-12-2026, 09:31:42 GMT

arxiv preprint arxiv, factuality, knowledge, (13 more...)

Neural Information Processing Systems

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Illinois (0.04)
(3 more...)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.47)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)

The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning

Neural Information Processing Systems Feb-11-2026, 19:26:33 GMT

However, text-davinci-002 is able to benefit more substantially. We further show that explanations generated by the LLMs may not entail the models' predictions nor be factually grounded in the input, even on simple tasks with extractive explanations. However, these flawed explanations can still be useful as a way to verify LLMs' predictions post-hoc.

explanation, large language model, machine learning, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Louisiana (0.04)
North America > Canada > Alberta > Census Division No. 15 > Improvement District No. 9 > Banff (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.92)

e4d2b6e6fdeca3e60e0f1a62fee3d9dd-Paper.pdf

Neural Information Processing Systems Feb-11-2026, 15:36:12 GMT

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
Asia > China (0.05)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)
(12 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

39c5871aa13be86ab978cba7069cbcec-Paper-Conference.pdf

Neural Information Processing Systems Feb-11-2026, 08:56:28 GMT

large language model, machine learning, uncertainty estimation, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Europe > Croatia > Dubrovnik-Neretva County > Dubrovnik (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology (0.46)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
(3 more...)

1e89c12621c0315373f20f0aeabe5dbe-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems Feb-9-2026, 04:01:29 GMT

There are two updating strategies: 1) a mimicking strategy to generate similar samples based on original data, preserving stylistic and contextual essence, and 2) an extending strategy that further expands existing samples at varying cognitive levels by adapting Bloom's taxonomy of educational objectives. Extensive experiments on updated MMLU and BIG-Bench demonstrate the stability of the proposed strategies and find that the mimicking strategy can effectively alleviate issues of overestimation from benchmark leakage. In cases where the efficient mimicking strategy fails, our extending strategy still shows promising results.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

Asia > China (0.04)
North America > United States > Colorado > Weld County > Evans (0.04)
North America > Canada > Ontario > Toronto (0.04)
(2 more...)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

BARTScore: Evaluating Generated Text as Text Generation

Neural Information Processing Systems Dec-25-2025, 02:51:55 GMT

A wide variety of NLP applications, such as machine translation, summarization, and dialog, involve text generation. One major challenge for these applications is how to evaluate whether such generated texts are actually fluent, accurate, or effective. In this work, we conceptualize the evaluation of generated text as a text generation problem, modeled using pre-trained sequence-to-sequence models. The general idea is that models trained to convert the generated text to/from a reference output or the source text will achieve higher scores when the generated text is better. We operationalize this idea using BART, an encoder-decoder based pre-trained model, and propose a metric BARTScore with a number of variants that can be flexibly applied in an unsupervised fashion to evaluation of text from different perspectives (e.g.

bartscore, generated text, name change, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Graph-based Uncertainty Metrics for Long-form Language Model Generations

Neural Information Processing Systems Dec-25-2025, 01:40:25 GMT

Recent advancements in Large Language Models (LLMs) have significantly improved text generation capabilities, but these systems are still known to hallucinate, and granular uncertainty estimation for long-form LLM generations remains challenging. In this work, we propose Graph Uncertainty -- which represents the relationship between LLM generations and claims within them as a bipartite graph and estimates the claim-level uncertainty with a family of graph centrality metrics. Under this view, existing uncertainty estimation methods based on the concept of self-consistency can be viewed as using degree centrality as an uncertainty measure, and we show that more sophisticated alternatives such as closeness centrality provide consistent gains at claim-level uncertainty estimation. Moreover, we present uncertainty-aware decoding techniques that leverage both the graph structure and uncertainty estimates to improve the factuality of LLM generations by preserving only the most reliable claims. Compared to existing methods, our graph-based uncertainty metrics lead to an average of 6.8% relative gains on AUPRC across various long-form generation settings, and our end-to-end system provides consistent 2-4% gains in factuality over existing decoding techniques while significantly improving the informativeness of generated responses.

artificial intelligence, large language model, natural language, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
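
The abstract above contrasts degree centrality (the self-consistency view) with closeness centrality on a bipartite graph of sampled responses and claims. The following is a minimal sketch of that contrast, not the authors' implementation. The `support` mapping, the function names, and the toy data are hypothetical, and the closeness formula assumed here is the standard Wasserman-Faust scaled variant, which the abstract does not specify.

```python
from collections import deque

def build_graph(support):
    """Build an undirected bipartite graph.

    `support` (hypothetical input) maps a response id to the set of
    claim ids that response supports.
    """
    adj = {}
    for resp, claims in support.items():
        adj.setdefault(("r", resp), set())
        for claim in claims:
            adj.setdefault(("c", claim), set())
            adj[("r", resp)].add(("c", claim))
            adj[("c", claim)].add(("r", resp))
    return adj

def degree_centrality(adj, node, n_responses):
    """Fraction of responses that support the claim (self-consistency)."""
    return len(adj[node]) / n_responses

def closeness_centrality(adj, node):
    """Closeness via BFS, scaled by the fraction of reachable nodes."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    reachable = len(dist) - 1      # nodes reached, excluding `node`
    total = sum(dist.values())     # sum of shortest-path distances
    if total == 0:
        return 0.0
    return (reachable / (len(adj) - 1)) * (reachable / total)

# Toy data (assumed): three sampled responses and the claims each supports.
support = {0: {"a", "b"}, 1: {"a"}, 2: {"a", "c"}}
graph = build_graph(support)
for claim in ("a", "b", "c"):
    node = ("c", claim)
    print(claim,
          round(degree_centrality(graph, node, len(support)), 3),
          round(closeness_centrality(graph, node), 3))
```

In this toy graph, claim "a" is supported by every response and scores highest under both measures. Closeness additionally reflects how a claim is connected to the rest of the graph through the responses that mention it, which degree alone does not capture.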

EduMod-LLM: A Modular Approach for Designing Flexible and Transparent Educational Assistants

Mittal, Meenakshi, Khare, Rishi, Miroyan, Mihran, Mitra, Chancharik, Norouzi, Narges

arXiv.org Artificial Intelligence Dec-1-2025

With the growing use of Large Language Model (LLM)-based Question-Answering (QA) systems in education, it is critical to evaluate their performance across individual pipeline components. In this work, we introduce EduMod-LLM, a modular function-calling LLM pipeline, and present a comprehensive evaluation along three key axes: function calling strategies, retrieval methods, and generative language models. Our framework enables fine-grained analysis by isolating and assessing each component. We benchmark function-calling performance across LLMs, compare our novel structure-aware retrieval method to vector-based and LLM-scoring baselines, and evaluate various LLMs for response synthesis. This modular approach reveals specific failure modes and performance patterns, supporting the development of interpretable and effective educational QA systems. Our findings demonstrate the value of modular function calling in improving system transparency and pedagogical alignment.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2511.21742

Country: North America > United States > California (0.14)

Genre:

Research Report > New Finding (1.00)
Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Information Technology (0.93)
Education > Curriculum (0.68)
Education > Educational Setting > Higher Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Factuality-Aware Alignment for Large Language Models

Neural Information Processing Systems Nov-20-2025, 04:31:34 GMT

This makes SFT less factual as it trains on human-labeled data that may be novel to the LLM. Furthermore, reward functions used in standard RL often inadequately capture factuality and favor longer and more detailed responses, which inadvertently promote hallucination.

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country: