
Collaborating Authors

 Goel, Mayank


Epistemic Integrity in Large Language Models

arXiv.org Artificial Intelligence

Large language models are increasingly relied upon as sources of information, but their propensity for generating false or misleading statements with high confidence poses risks for users and society. In this paper, we confront the critical problem of epistemic miscalibration, where a model's linguistic assertiveness fails to reflect its true internal certainty. We introduce a new human-labeled dataset and a novel method for measuring the linguistic assertiveness of Large Language Models (LLMs) that cuts error rates by over 50% relative to previous benchmarks. Validated across multiple datasets, our method reveals a stark misalignment between how confidently models linguistically present information and their actual accuracy. Further human evaluations confirm the severity of this miscalibration. This evidence underscores the urgent risk that the overstated certainty of LLMs may mislead users on a massive scale. Our framework provides a crucial step towards diagnosing this miscalibration, offering a path to correcting it and to more trustworthy AI across domains.

Large Language Models (LLMs) have markedly transformed how humans seek and consume information, becoming integral across diverse fields such as public health (Ali et al., 2023), coding (Zambrano et al., 2023), and education (Whalen et al., 2023). Despite their growing influence, LLMs are not without shortcomings. One notable issue is their potential to generate responses that, while convincing, may be inaccurate or nonsensical, a long-standing phenomenon often referred to as "hallucinations" (Jo, 2023; Huang et al., 2023; Zhou et al., 2024b). This raises concerns about the reliability and trustworthiness of these models. A critical aspect of trustworthiness in LLMs is epistemic calibration: the alignment between a model's internal confidence in its outputs and the way it expresses that confidence through natural language. Misalignment between internal certainty and external expression can leave users misled by overconfident or underconfident statements, posing significant risks in high-stakes domains such as legal advice, medical diagnosis, and misinformation detection. While of great normative concern, how LLMs express linguistic uncertainty has received relatively little attention to date (Sileo & Moens, 2023; Belem et al., 2024). Figures 1 and 5 illustrate the issue of epistemic calibration, providing insight into how certainty operates in human interactions with LLMs. Distinct roles of certainty: internal certainty and linguistic assertiveness serve distinct functions within LLM interactions that shape individual beliefs. Human access to LLM certainty: linguistic assertiveness plays a critical role as the primary form of certainty available to users.
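To make the notion of a calibration gap concrete, the following minimal Python sketch (hypothetical, not the paper's implementation) contrasts a model's internal confidence, approximated here by mean token probability, with an assertiveness score assumed to come from a separate classifier over the generated text. The data structure, function names, and numbers are illustrative assumptions only.

```python
import math
from dataclasses import dataclass

@dataclass
class ScoredAnswer:
    text: str
    token_logprobs: list   # log-probabilities of the generated tokens
    assertiveness: float   # score in [0, 1] from a hypothetical assertiveness classifier

def internal_confidence(token_logprobs):
    """Mean token probability as a crude proxy for internal certainty."""
    if not token_logprobs:
        return 0.0
    return sum(math.exp(lp) for lp in token_logprobs) / len(token_logprobs)

def calibration_gap(answer: ScoredAnswer) -> float:
    """Positive values mean the wording is more assertive than the model's
    internal confidence warrants (overconfident phrasing)."""
    return answer.assertiveness - internal_confidence(answer.token_logprobs)

# Hypothetical example: a confidently worded answer backed by low token probabilities.
example = ScoredAnswer(
    text="The treaty was definitely signed in 1854.",
    token_logprobs=[-1.2, -0.9, -1.5, -0.8],
    assertiveness=0.95,
)
print(f"calibration gap: {calibration_gap(example):+.2f}")
```

Under this toy setup, a large positive gap flags answers whose wording is far more assertive than the underlying probabilities support, which is the kind of mismatch the paper's framework is designed to diagnose.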


SymTax: Symbiotic Relationship and Taxonomy Fusion for Effective Citation Recommendation

arXiv.org Artificial Intelligence

Citing pertinent literature is pivotal to writing and reviewing a scientific document. Existing techniques mainly focus on the local context or the global context for recommending citations but fail to consider actual human citation behaviour. We propose SymTax, a three-stage recommendation architecture that considers both the local and the global context, and additionally the taxonomical representations of query-candidate tuples and the Symbiosis prevailing amongst them. SymTax learns to embed the infused taxonomies in hyperbolic space and uses hyperbolic separation as a latent feature to compute query-candidate similarity. We build a novel and large dataset, ArSyTa, containing 8.27 million citation contexts and describe the creation process in detail. We conduct extensive experiments and ablation studies to demonstrate the effectiveness and design choices of each module in our framework. Combinatorial analysis from our experiments also sheds light on the choice of language model (LM) and fusion embedding, and on the inclusion of the section heading as a signal. Our proposed module that captures the symbiotic relationship alone leads to performance gains of 26.66% and 39.25% in Recall@5 w.r.t. SOTA on the ACL-200 and RefSeer datasets, respectively. The complete framework yields a gain of 22.56% in Recall@5 w.r.t. SOTA on our proposed dataset. The code and dataset are available at https://github.com/goyalkaraniit/SymTax
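The abstract's use of hyperbolic separation as a latent similarity feature can be illustrated with the standard Poincaré-ball distance. The sketch below is only an assumption about how such a feature might be computed, not the SymTax code; the embedding vectors are invented for illustration.

```python
import numpy as np

def poincare_distance(u: np.ndarray, v: np.ndarray, eps: float = 1e-9) -> float:
    """Geodesic distance between two points inside the unit Poincare ball:
    d(u, v) = arcosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))."""
    sq_dist = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return float(np.arccosh(1.0 + 2.0 * sq_dist / max(denom, eps)))

# Hypothetical taxonomy embeddings for a query context and a candidate paper.
query = np.array([0.10, 0.25, -0.05])
candidate = np.array([0.30, 0.10, 0.20])

# Smaller hyperbolic separation -> taxonomically closer -> stronger candidate.
print(f"hyperbolic separation: {poincare_distance(query, candidate):.3f}")
```

In such a setup, the separation value would be one feature among others (local context, global context, symbiosis) feeding the final query-candidate ranking.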


MWPRanker: An Expression Similarity Based Math Word Problem Retriever

arXiv.org Artificial Intelligence

Math Word Problems (MWPs) in online assessments help test the ability of the learner to make critical inferences by interpreting the linguistic information in them. To test the mathematical reasoning capabilities of learners, the problem is sometimes rephrased or the thematic setting of the original MWP is changed. Since manual identification of MWPs with similar problem models is cumbersome, we propose a tool for MWP retrieval in this work. Our hybrid approach retrieves similar MWPs that share the same problem model, where the problem model refers to the sequence of operations to be performed to arrive at the solution. We demonstrate that our tool is useful for the mentioned tasks and outperforms semantic similarity-based approaches, which fail to capture the arithmetic and logical sequence of the MWPs. A demo of the tool can be found at https://www.youtube.com/watch?v=gSQWP3chFIs
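As a rough illustration of what a "problem model" as a sequence of operations could look like in practice, the sketch below extracts the operator sequence from two solution expressions and checks whether they match. The expressions, helper name, and comparison are hypothetical and are not the tool's actual implementation.

```python
import ast

def operator_sequence(expression: str) -> list:
    """Return the arithmetic operators of a solution expression in
    evaluation order (post-order traversal of the parsed AST)."""
    ops = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}
    sequence = []

    def visit(node):
        if isinstance(node, ast.BinOp):
            visit(node.left)
            visit(node.right)
            sequence.append(ops[type(node.op)])

    visit(ast.parse(expression, mode="eval").body)
    return sequence

# Two hypothetical MWPs with different wording but the same problem model:
# "3 apples per bag, 4 bags, 2 eaten" vs "5 pencils per box, 6 boxes, 1 lost".
print(operator_sequence("3 * 4 - 2"))  # ['*', '-']
print(operator_sequence("5 * 6 - 1"))  # ['*', '-']
print(operator_sequence("3 * 4 - 2") == operator_sequence("5 * 6 - 1"))  # True
```

The two expressions differ in surface values and theme yet share the operator sequence ['*', '-'], which is exactly the kind of match a purely semantic similarity measure would miss.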