AITopics

2605.27016

Country: Europe > France (0.28)

Genre: Research Report > New Finding (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

arXiv.org Machine Learning, May-20-2026

HalluWorld: A Controlled Benchmark for Hallucination via Reference World Models

Liu, Emmy, Gangal, Varun, Yu, Michael, Tao, Zhuofu, Singh, Karan, Kumar, Sachin, Feng, Steven Y.

Hallucination remains a central failure mode of large language models, but existing benchmarks operationalize it inconsistently across tasks such as summarization, question answering, retrieval-augmented generation, and agentic interaction. This fragmentation makes it unclear whether a mitigation that works in one setting actually reduces hallucinations across contexts. Current hallucination benchmarks either require human annotation and fixed references that may eventually be memorized, or rely on naturalistic observations often recorded in settings that are difficult to reproduce or test systematically. To enable further research on the root causes of hallucination, we introduce HALLUWORLD, an extensible benchmark framework grounded in an explicit reference-world formulation: a model hallucinates when it produces an observable claim that is false with respect to this reference world. Building on this view, we construct a family of synthetic and semi-synthetic benchmark environments in which the reference world is fully specified, the model's observable view is controlled, and hallucination labels can be generated automatically by construction. HALLUWORLD spans multiple settings that are classically representative for AI, i.e., gridworlds, chess, and realistic terminal tasks. This enables controlled variation of key factors such as world complexity, observability, temporal change, and source-conflict policy, allowing us to disentangle hallucinations into more fine-grained error categories. We evaluate frontier and open-weight language models across these settings and find consistent patterns across domains: perceptual hallucination on directly observed information is near-solved for frontier models, while multi-step state tracking and causal forward simulation are still difficult for frontier models, and are not generally solved by extended thinking.
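The reference-world definition above lends itself to a small illustration. The following is a minimal sketch, not the benchmark's actual code: the `GridWorld`, `Claim`, `visible_objects`, and `label_claim` names, the Chebyshev-radius view, and the `unobserved_hallucination` label are all assumptions made for this example. It shows how a fully specified world plus a controlled view lets labels be generated automatically.

```python
from dataclasses import dataclass

# Hypothetical toy reference world: every object position is fully specified.
@dataclass
class GridWorld:
    width: int
    height: int
    objects: dict  # object name -> (x, y) position


# Hypothetical observable claim: "object `obj` is at `position`".
@dataclass
class Claim:
    obj: str
    position: tuple


def visible_objects(world, agent_pos, radius):
    """Controlled view: objects within Chebyshev distance `radius` of the agent."""
    ax, ay = agent_pos
    return {
        name: pos
        for name, pos in world.objects.items()
        if max(abs(pos[0] - ax), abs(pos[1] - ay)) <= radius
    }


def label_claim(world, view, claim):
    """Label a claim against the reference world (label names are assumptions)."""
    if world.objects.get(claim.obj) == claim.position:
        return "correct"
    if claim.obj in view:
        # The claim contradicts information the model directly observed.
        return "perceptual_hallucination"
    # The claim is false and concerns something outside the model's view.
    return "unobserved_hallucination"


world = GridWorld(5, 5, {"key": (1, 1), "door": (4, 4)})
view = visible_objects(world, agent_pos=(0, 0), radius=1)

labels = [
    label_claim(world, view, Claim("key", (1, 1))),
    label_claim(world, view, Claim("key", (2, 1))),
    label_claim(world, view, Claim("door", (0, 4))),
]
print(labels)
```

Because the labeler only compares a claim with the stored world state, no human annotation is needed, and varying `radius` is one simple way to control observability.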

large language model, machine learning, natural language, (22 more...)

2605.19341

Country:

North America > United States (1.00)
Europe (1.00)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Chess (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.89)

Iagaru, David, Gottschling, Nina M., Hansen, Anders C., Garnier, Josselin

On Hallucinations in Inverse Problems: Fundamental Limits and Provable Assessment Methods

arXiv.org Machine Learning, May-14-2026

While deep learning has revolutionised inverse problems, its safe deployment is hindered by three primary reliability concerns: hallucinations, instabilities, and performance volatility [48]. Hallucinations manifest as high-fidelity features that are factually false; instabilities reflect heightened sensitivity to measurement noise; and performance volatility refers to significant fluctuations in reconstruction quality across the data, yielding high-fidelity results for some samples while failing on seemingly similar images. In many applications, the risk of generating realistic but unfaithful content can impede the safe deployment of AI methods for inverse problems. The choice of "hallucinate" as the Cambridge Dictionary's word of the year in 2023 illustrates this open problem [53]. The problem of AI hallucinations persists, as the Financial Times [44] highlighted that, "AI hallucinations haunt users more than job losses." A first step toward training AI methods that do not suffer from hallucinations is the assessment and identification of hallucinated outputs. Consider the inverse problem of recovering $x$ from noisy measurements

$$y = F(x, e), \quad x \in \mathcal{M}_1 \subset \mathcal{X}, \quad e \in \mathcal{E} \subset \mathcal{Y}, \tag{1.1}$$

artificial intelligence, machine learning, reconstruction, (18 more...)

2605.13146

Country:

North America > United States (1.00)
Europe (0.67)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Energy (1.00)
(2 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems, Apr-30-2026, 16:09:35 GMT

LLMDFA: Analyzing Dataflow in Code with Large Language Models

Dataflow analysis is a fundamental code analysis technique that identifies dependencies between program values. Traditional approaches typically necessitate successful compilation and expert customization, hindering their applicability and usability for analyzing uncompilable programs with evolving analysis needs in real-world scenarios. This paper presents LLMDFA, an LLM-powered compilation-free and customizable dataflow analysis framework. To address hallucinations for reliable results, we decompose the problem into several subtasks and introduce a series of novel strategies. Specifically, we leverage LLMs to synthesize code that outsources delicate reasoning to external expert tools, such as using a parsing library to extract program values of interest and invoking an automated theorem prover to validate path feasibility. Additionally, we adopt a few-shot chain-of-thought prompting to summarize dataflow facts in individual functions, aligning the LLMs with the program semantics of small code snippets to mitigate hallucinations. We evaluate LLMDFA on synthetic programs to detect three representative types of bugs and on real-world Android applications for customized bug detection. On average, LLMDFA achieves 87.10% precision and 80.77% recall, surpassing existing techniques with F1 score improvements of up to 0.35.
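To put the reported averages in context, the F1 score is the harmonic mean of precision and recall. The sketch below applies that standard formula to the two averaged figures quoted above. The function name `f1_score` is chosen for this example, and the result is only indicative: an F1 computed from averaged precision and recall need not match the per-benchmark F1 scores the paper reports.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Averaged figures quoted in the abstract, expressed as fractions.
f1 = f1_score(0.8710, 0.8077)
print(f"F1 from averaged precision/recall: {f1:.3f}")
```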

artificial intelligence, large language model, natural language, (6 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.86)

Neural Information Processing Systems, Apr-30-2026, 11:01:27 GMT

FLAME: Factuality-Aware Alignment for Large Language Models

Alignment is a procedure to fine-tune pre-trained large language models (LLMs) to follow natural language instructions and serve as helpful AI assistants. We have observed, however, that the conventional alignment process fails to enhance the factual accuracy of LLMs, and often leads to the generation of more false facts (i.e., hallucination). In this paper, we study how to make the LLM alignment process more factual, by first identifying factors that lead to hallucination in both alignment steps: supervised fine-tuning (SFT) and reinforcement learning (RL). In particular, we find that training the LLM on new or unfamiliar knowledge can encourage hallucination. This makes SFT less factual as it trains on human-labeled data that may be novel to the LLM. Furthermore, reward functions used in standard RL often inadequately capture factuality and favor longer and more detailed responses, which inadvertently promote hallucination. Based on these observations, we propose factuality-aware alignment, comprised of factuality-aware SFT and factuality-aware RL through direct preference optimization. Experiments show that our proposed factuality-aware alignment guides LLMs to output more factual responses while maintaining their instruction-following capability.
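The abstract names direct preference optimization (DPO) as the mechanism for the RL step. As a rough illustration, here is a minimal sketch of the standard per-pair DPO loss, assuming sequence log-probabilities are already available. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions for this example, and the sketch says nothing about how FLAME builds its preference pairs or rewards.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair: -log(sigmoid(margin)).

    The margin is beta times the difference between the policy-vs-reference
    log-ratio of the chosen response and that of the rejected response.
    """
    margin = beta * (
        (policy_chosen_logp - ref_chosen_logp)
        - (policy_rejected_logp - ref_rejected_logp)
    )
    # Numerically stable evaluation of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: the loss equals log(2).
neutral = dpo_loss(-5.0, -5.0, -5.0, -5.0)

# Policy raises the chosen response and lowers the rejected one: loss drops.
improved = dpo_loss(-4.0, -6.0, -5.0, -5.0)
print(neutral, improved)
```

Under this objective, a preference pair in which the chosen response is the more factual one would push the policy toward factual outputs, which is the general idea the abstract describes.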

artificial intelligence, large language model, natural language, (6 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

WIRED, Apr-28-2026, 08:30:00 GMT

The Bloomberg Terminal Is Getting an AI Makeover, Like It or Not

WIRED spoke with Bloomberg's chief technology officer about the big, chatbot-style changes coming to the iconic platform for traders. For its famous intractability, the Bloomberg Terminal has long inspired devotion, bordering on obsession. Among traders, the ability to chart a path through the software's dizzying scrolls of numbers and text to isolate far-flung information is the mark of a seasoned professional. But as a greater mass of data is fed into the Terminal--not only earnings and asset prices, but weather forecasts, shipping logs, factory locations, consumer spending patterns, private loans, and so on--valuable information is being lost. "It has become more and more untenable," says Shawn Edwards, chief technology officer at Bloomberg.

artificial intelligence, machine learning, natural language, (15 more...)

Country:

Europe (0.29)
North America > United States > California (0.15)

Industry: Banking & Finance > Economy (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

Neural Information Processing Systems, Apr-24-2026, 23:32:11 GMT

108030643e640ac050e0ed5e6aace48f-Paper-Conference.pdf

large language model, machine learning, natural language, (18 more...)

Country:

North America > United States (0.46)
Asia > China (0.28)

Genre:

Research Report > New Finding (0.67)
Research Report > Promising Solution (0.45)

Industry:

Health & Medicine (0.46)
Education (0.46)
Information Technology > Security & Privacy (0.45)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Tian, Xinyu, Shen, Xiaotong

Generative Score Inference for Multimodal Data

arXiv.org Machine Learning, Mar-30-2026

Accurate uncertainty quantification is crucial for making reliable decisions in various supervised learning scenarios, particularly when dealing with complex, multimodal data such as images and text. Current approaches often face notable limitations, including rigid assumptions and limited generalizability, constraining their effectiveness across diverse supervised learning tasks. To overcome these limitations, we introduce Generative Score Inference (GSI), a flexible inference framework capable of constructing statistically valid and informative prediction and confidence sets across a wide range of multimodal learning problems. GSI utilizes synthetic samples generated by deep generative models to approximate conditional score distributions, facilitating precise uncertainty quantification without imposing restrictive assumptions about the data or tasks. We empirically validate GSI's capabilities through two representative scenarios: hallucination detection in large language models and uncertainty estimation in image captioning. Our method achieves state-of-the-art performance in hallucination detection and robust predictive uncertainty in image captioning, and its performance is positively influenced by the quality of the underlying generative model. These findings underscore the potential of GSI as a versatile inference framework, significantly enhancing uncertainty quantification and trustworthiness in multimodal learning.

artificial intelligence, machine learning, natural language, (19 more...)

2603.26349

Country:

North America > United States > Minnesota (0.04)
North America > United States > Michigan > Wayne County > Detroit (0.04)
North America > United States > Michigan > Genesee County > Flint (0.04)
Europe > France (0.04)

Genre: Research Report > New Finding (0.66)

Industry:

Media (0.46)
Leisure & Entertainment (0.46)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.48)

Neural Information Processing Systems, Mar-22-2026, 20:45:27 GMT

Understanding Hallucinations in Diffusion Models through Mode Interpolation

Colloquially speaking, image generation models based upon diffusion processes are frequently said to exhibit "hallucinations," samples that could never occur in the training data. But where do such hallucinations come from?

artificial intelligence, hallucination, machine learning, (8 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)