SeSE: A Structural Information-Guided Uncertainty Quantification Framework for Hallucination Detection in LLMs
Zhao, Xingtao, Peng, Hao, Su, Dingli, Zeng, Xianghua, Liu, Chunyang, Liao, Jinzhi, Yu, Philip S.
Reliable uncertainty quantification (UQ) is essential for deploying large language models (LLMs) in safety-critical scenarios, as it enables them to abstain from responding when uncertain, thereby avoiding "hallucinating" falsehoods. However, state-of-the-art UQ methods primarily rely on semantic probability distributions or pairwise distances, overlooking latent semantic structural information that could enable more precise uncertainty estimates. This paper presents Semantic Structural Entropy (SeSE), a principled UQ framework that quantifies the inherent semantic uncertainty of LLMs from a structural information perspective for hallucination detection. SeSE operates in a zero-resource manner and is applicable to both open- and closed-source LLMs, making it an "off-the-shelf" solution for new models and tasks. Specifically, to effectively model semantic spaces, we first develop an adaptively sparsified directed semantic graph construction algorithm that captures directional semantic dependencies while automatically pruning unnecessary connections that introduce negative interference. We then exploit latent semantic structural information through hierarchical abstraction: SeSE is defined as the structural entropy of the optimal semantic encoding tree, formalizing intrinsic uncertainty within semantic spaces after optimal compression. A higher SeSE value corresponds to greater uncertainty, indicating that LLMs are highly likely to generate hallucinations. In addition, to enhance fine-grained UQ in long-form generation, we extend SeSE to quantify the uncertainty of individual claims by modeling their random semantic interactions, providing theoretically explicable hallucination detection. Extensive experiments across 29 model-dataset combinations show that SeSE significantly outperforms advanced UQ baselines.
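The abstract defines SeSE as the structural entropy of an optimal encoding tree. As a rough illustration of the underlying quantity only, the sketch below evaluates the standard two-level structural entropy of an undirected weighted graph under a fixed partition. It is not the paper's method: SeSE works on a directed, adaptively sparsified graph and searches for the optimal tree, neither of which is attempted here. The helper names (`build_graph`, `structural_entropy`) and the toy graph are assumptions made for this example.

```python
import math
from collections import defaultdict


def build_graph(edges):
    """Build a symmetric adjacency map {u: {v: weight}} from (u, v, weight) triples."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w
    return dict(adj)


def structural_entropy(adj, partition):
    """Two-level structural entropy of an undirected graph for a given partition.

    Illustrative simplification: the partition is supplied by the caller
    rather than optimized, and edge directions are ignored.
    """
    degree = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    total = sum(degree.values())  # graph volume, i.e. twice the total edge weight
    entropy = 0.0
    for module in partition:
        members = set(module)
        volume = sum(degree[u] for u in members)
        # Weight of edges leaving the module.
        cut = sum(w for u in members for v, w in adj[u].items() if v not in members)
        # Cost of locating a node once its module is known.
        for u in members:
            if degree[u] > 0:
                entropy -= degree[u] / total * math.log2(degree[u] / volume)
        # Cost of locating the module itself when it is entered via a cut edge.
        if cut > 0:
            entropy -= cut / total * math.log2(volume / total)
    return entropy


# Hypothetical toy graph: two triangles joined by a single bridge edge (2, 3).
EDGES = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (2, 3, 1.0)]
GRAPH = build_graph(EDGES)

flat = structural_entropy(GRAPH, [[0, 1, 2, 3, 4, 5]])
clustered = structural_entropy(GRAPH, [[0, 1, 2], [3, 4, 5]])
print(f"single module: {flat:.4f}, two modules: {clustered:.4f}")
```

On this toy graph the partition that follows the two triangles gives a lower value than the single-module partition. That is the sense in which a well-chosen encoding tree compresses the graph. Finding the minimizing tree is the step this sketch leaves out.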
Sample-Efficient Omniprediction for Proper Losses
Gibbs, Isaac, Tibshirani, Ryan J.
We consider the problem of constructing probabilistic predictions that lead to accurate decisions when employed by downstream users to inform actions. For a single decision maker, designing an optimal predictor is equivalent to minimizing a proper loss function corresponding to the negative utility of that individual. For multiple decision makers, our problem can be viewed as a variant of omniprediction in which the goal is to design a single predictor that simultaneously minimizes multiple losses. Existing algorithms for achieving omniprediction broadly fall into two categories: 1) boosting methods that optimize other auxiliary targets such as multicalibration and obtain omniprediction as a corollary, and 2) adversarial two-player game based approaches that estimate and respond to the "worst-case" loss in an online fashion. We give lower bounds demonstrating that multicalibration is a strictly more difficult problem than omniprediction and thus the former approach must incur suboptimal sample complexity. For the latter approach, we discuss how these ideas can be used to obtain a sample-efficient algorithm through an online-to-batch conversion. This conversion has the downside of returning a complex, randomized predictor. We improve on this method by designing a more direct, unrandomized algorithm that exploits structural elements of the set of proper losses.
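The link between a decision maker's utility and a proper loss can be made concrete for a binary outcome. In the sketch below, the decision maker best-responds to a reported probability, and the loss is the negative utility of the chosen action. The utility table, action names, and function names are hypothetical and chosen only for illustration. The sketch shows the single-decision-maker correspondence stated in the abstract, not the paper's omniprediction algorithm.

```python
# Hypothetical utility table: UTILITY[action] = (utility if y = 0, utility if y = 1).
UTILITY = {
    "act": (-1.0, 5.0),
    "hold": (0.0, -3.0),
}


def best_action(p, utility):
    """Action maximizing expected utility when the predicted P(y = 1) is p."""
    return max(utility, key=lambda a: (1 - p) * utility[a][0] + p * utility[a][1])


def induced_loss(p, y, utility):
    """Negative realized utility of the action chosen in response to prediction p."""
    return -utility[best_action(p, utility)][y]


def expected_loss(p, q, utility):
    """Expected induced loss of reporting p when the true P(y = 1) is q."""
    return (1 - q) * induced_loss(p, 0, utility) + q * induced_loss(p, 1, utility)


# Properness check on a grid: reporting the true probability q is never worse
# in expectation than reporting any other p.
grid = [i / 20 for i in range(21)]
is_proper = all(
    expected_loss(q, q, UTILITY) <= expected_loss(p, q, UTILITY) + 1e-12
    for q in grid
    for p in grid
)
print("proper on grid:", is_proper)
```

The check passes because the action chosen under the true probability maximizes expected utility by construction, so no other report can lead to a better action. Each decision maker induces a loss of this kind, which is why serving several of them with one predictor amounts to minimizing several proper losses at once.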