hallucination
- North America > United States > Virginia (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > France > Grand Est > Bas-Rhin > Strasbourg (0.04)
- Asia > Middle East > Jordan (0.04)
- Law (1.00)
- Health & Medicine (0.93)
- Government > Regional Government > North America Government > United States Government (0.47)
- Asia > Middle East > Jordan (0.04)
- South America > Colombia > Meta Department > Villavicencio (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- (5 more...)
'Deepfakes spreading and more AI companions': seven takeaways from the latest artificial intelligence safety report
The international AI safety report warns systems are improving rapidly, but remain prone to 'hallucinations' and hard to control. The International AI Safety Report is an annual survey of technological progress and the risks it is creating across multiple areas, from deepfakes to the jobs market. Commissioned at the 2023 global AI safety summit, it is chaired by the Canadian computer scientist Yoshua Bengio, who describes the "daunting challenges" posed by rapid developments in the field. The report is also guided by senior advisers, including Nobel laureates Geoffrey Hinton and Daron Acemoglu.
- North America > United States (0.30)
- Europe > Ukraine (0.06)
- Oceania > Australia (0.05)
- (4 more...)
- Information Technology > Security & Privacy (1.00)
- Government (1.00)
- Health & Medicine (0.98)
- Leisure & Entertainment > Sports (0.70)
The Math on AI Agents Doesn't Add Up
A research paper suggests AI agents are mathematically doomed to fail. The big AI companies promised us that 2025 would be "the year of the AI agents." It turned out to be the year of AI agents, and of kicking the can for that transformational moment to 2026 or maybe later. But what if the answer to the question "When will our lives be fully automated by generative AI robots that perform our tasks for us and basically run the world?" is, like that New Yorker cartoon, "How about never?" That was basically the message of a paper published without much fanfare some months ago, smack in the middle of the overhyped year of "agentic AI." Entitled "Hallucination Stations: On Some Basic Limitations of Transformer-Based Language Models," it purports to mathematically show that "LLMs are incapable of carrying out computational and agentic tasks beyond a certain complexity."
- North America > United States > New York (0.25)
- North America > United States > California (0.14)
- Europe > Slovakia (0.05)
- Europe > Czechia (0.05)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.38)
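The intuition behind the article above is easy to make concrete with a back-of-the-envelope calculation. The sketch below is a generic compounding-error illustration, not the formal argument of the "Hallucination Stations" paper: it only assumes a hypothetical agent that chains n dependent steps, each completed correctly with probability p.

```python
# Hedged illustration: end-to-end success of an n-step agent whose steps
# each succeed independently with probability p collapses as p**n.
for p in (0.99, 0.95, 0.90):
    for n in (10, 50, 100):
        print(f"per-step accuracy {p:.2f}, {n:3d} steps -> "
              f"end-to-end success {p ** n:.3f}")
```

Even at 99% per-step accuracy, a 100-step task succeeds only about 37% of the time; the paper's own claim is a formal complexity result about transformer-based LLMs rather than this probabilistic sketch, but it aims at the same "beyond a certain complexity" ceiling.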
A Unified Definition of Hallucination, Or: It's the World Model, Stupid
Liu, Emmy, Gangal, Varun, Zou, Chelsea, Huang, Xiaoqi, Yu, Michael, Chang, Alex, Tao, Zhuofu, Kumar, Sachin, Feng, Steven Y.
Despite numerous attempts to solve the issue of hallucination since the inception of neural language models, it remains a problem in even frontier large language models today. Why is this the case? We walk through definitions of hallucination used in the literature from a historical perspective up to the current day, and fold them into a single definition of hallucination, wherein different prior definitions focus on different aspects of our definition. At its core, we argue that hallucination is simply inaccurate (internal) world modeling, in a form where it is observable to the user (e.g., stating a fact which contradicts a knowledge base, or producing a summary which contradicts a known source). By varying the reference world model as well as the knowledge conflict policy (e.g., knowledge base vs. in-context), we arrive at the different existing definitions of hallucination present in the literature. We argue that this unified view is useful because it forces evaluations to make clear their assumed "world" or source of truth, clarifies what should and should not be called hallucination (as opposed to planning or reward/incentive-related errors), and provides a common language to compare benchmarks and mitigation techniques. Building on this definition, we outline plans for a family of benchmarks in which hallucinations are defined as mismatches with synthetic but fully specified world models in different environments, and sketch out how these benchmarks can use such settings to stress-test and improve the world modeling components of language models.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Ohio (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- (4 more...)
- Research Report (0.50)
- Personal > Honors (0.47)
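To make the abstract's central move concrete, the toy sketch below treats a hallucination as a model claim that contradicts whichever reference world model the evaluator picks; the dictionaries and the is_hallucination helper are illustrative assumptions, not code from the paper.

```python
# Toy world models: a ground-truth knowledge base and an (intentionally
# wrong) in-context document the model was asked to stay faithful to.
knowledge_base = {"capital_of_france": "Paris"}
in_context_doc = {"capital_of_france": "Lyon"}

def is_hallucination(claim_key, claim_value, reference):
    """A claim hallucinates iff it contradicts the chosen reference world model."""
    return claim_key in reference and reference[claim_key] != claim_value

model_claim = ("capital_of_france", "Paris")

# Knowledge-conflict policy 1: judge against the knowledge base.
print(is_hallucination(*model_claim, knowledge_base))   # False: factually fine
# Knowledge-conflict policy 2: judge against the in-context source.
print(is_hallucination(*model_claim, in_context_doc))   # True: unfaithful to the source
```

Swapping the reference flips the verdict on the very same output, which is the paper's point: an evaluation has to state its assumed "world" before the label "hallucination" means anything.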
Detecting Bugs with Substantial Monetary Consequences by LLM and Rule-based Reasoning
Financial transactions are increasingly being handled by automated programs called smart contracts. However, one challenge in the adoption of smart contracts is the presence of vulnerabilities, which can cause significant monetary loss. In 2024, $247.88M was lost in 20 smart contract exploits. According to a recent study, accounting bugs (i.e., incorrect implementations of domain-specific financial models) are the most prevalent type of vulnerability, and are among the most difficult to find, requiring substantial human effort. While Large Language Models (LLMs) have shown promise in identifying these bugs, they often suffer from a lack of generalization across vulnerability types, hallucinations, and problems with representing smart contracts in a limited token context space. This paper proposes a hybrid system combining LLMs and rule-based reasoning to detect accounting error vulnerabilities in smart contracts. In particular, it utilizes the understanding capabilities of LLMs to annotate the financial meaning of variables in smart contracts, and employs rule-based reasoning to propagate the information throughout a contract's logic and to validate potential vulnerabilities. To remedy hallucinations, we propose a feedback loop in which validation is performed by providing the reasoning trace of vulnerabilities to the LLM for iterative self-reflection. We achieve 75.6% accuracy on the labelling of financial meanings against human annotations. Furthermore, we achieve a recall of 90.5% when running on 23 real-world smart contract projects containing 21 accounting error vulnerabilities. Finally, we apply the automated technique to 8 recent projects, finding 4 known and 2 unknown bugs.
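A minimal sketch of the feedback loop described above, assuming llm_annotate, propagate_rules, and llm_reflect as hypothetical stand-ins for the paper's actual components (they are plain parameters here, not a real API):

```python
def detect_accounting_bugs(contract, llm_annotate, propagate_rules, llm_reflect,
                           max_rounds=3):
    """LLM labels the financial meaning of variables; rule-based reasoning
    propagates the labels and flags candidate accounting bugs; each candidate's
    reasoning trace is fed back to the LLM for iterative self-reflection."""
    annotations = llm_annotate(contract)   # e.g. {"shares": "scaled token amount"}
    confirmed = []
    for _ in range(max_rounds):
        candidates = propagate_rules(contract, annotations)
        revised = False
        for cand in candidates:
            verdict = llm_reflect(cand["trace"], annotations)
            if verdict["confirmed"]:
                confirmed.append(cand)
            else:
                # Self-reflection revises the labels that produced the false alarm,
                # so the next round of rule propagation starts from better annotations.
                annotations.update(verdict["revised_annotations"])
                revised = True
        if not revised:
            break
    return confirmed
```

The division of labor mirrors the abstract: the LLM handles the semantic judgment it is suited to (what a variable means financially), while deterministic rules handle the propagation and validation steps where a hallucination would be most costly.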
LLMDFA: Analyzing Dataflow in Code with Large Language Models
Dataflow analysis is a fundamental code analysis technique that identifies dependencies between program values. Traditional approaches typically necessitate successful compilation and expert customization, hindering their applicability and usability for analyzing uncompilable programs with evolving analysis needs in real-world scenarios. This paper presents LLMDFA, an LLM-powered compilation-free and customizable dataflow analysis framework. To address hallucinations for reliable results, we decompose the problem into several subtasks and introduce a series of novel strategies. Specifically, we leverage LLMs to synthesize code that outsources delicate reasoning to external expert tools, such as using a parsing library to extract program values of interest and invoking an automated theorem prover to validate path feasibility. Additionally, we adopt few-shot chain-of-thought prompting to summarize dataflow facts in individual functions, aligning the LLMs with the program semantics of small code snippets to mitigate hallucinations. We evaluate LLMDFA on synthetic programs to detect three representative types of bugs and on real-world Android applications for customized bug detection. On average, LLMDFA achieves 87.10% precision and 80.77% recall, surpassing existing techniques with F1 score improvements of up to 0.35.
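As one concrete illustration of "outsourcing delicate reasoning to external expert tools", the snippet below uses Python's standard ast parsing library to deterministically extract calls to a hypothetical taint source named get_input. LLMDFA synthesizes extractors in this spirit for its own target languages and toolchain, so treat this purely as a sketch of the idea, not the paper's code.

```python
import ast

def find_source_calls(code, source_name="get_input"):
    """Return (line number, call text) for every call to the named source
    function, letting a parser, not the LLM, pin down the values of interest."""
    tree = ast.parse(code)
    hits = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == source_name):
            hits.append((node.lineno, ast.unparse(node)))
    return hits

snippet = "x = get_input()\ny = len(x)\nz = get_input() + 'suffix'\n"
print(find_source_calls(snippet))   # [(1, 'get_input()'), (3, 'get_input()')]
```

Because the extraction step is exact, the downstream prompting (the few-shot chain-of-thought summarization of per-function dataflow facts) starts from reliable anchors rather than from values the model merely believes are present.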
Alleviating Hallucinations in Large Vision-Language Models through Hallucination-Induced Optimization
Although Large Visual Language Models (LVLMs) have demonstrated exceptional abilities in understanding multimodal data, they invariably suffer from hallucinations, leading to a disconnection between the generated text and the corresponding images. Almost all current visual contrastive decoding methods attempt to mitigate these hallucinations by introducing visual uncertainty information that appropriately widens the contrastive logits gap between hallucinatory and targeted tokens. However, due to the uncontrollable nature of global visual uncertainty, they struggle to precisely induce the hallucinatory tokens, which severely limits their effectiveness in mitigating hallucinations and may even lead to the generation of undesired hallucinations. To tackle this issue, we conducted a theoretical analysis to promote the effectiveness of contrastive decoding. Building on this insight, we introduce a novel optimization strategy named Hallucination-Induced Optimization (HIO). This strategy seeks to amplify the contrast between hallucinatory and targeted tokens relying on a fine-tuned theoretical preference model (i.e., the Contrary Bradley-Terry Model), thereby facilitating efficient contrastive decoding to alleviate hallucinations in LVLMs. Extensive experimental research demonstrates that our HIO strategy can effectively reduce hallucinations in LVLMs, outperforming state-of-the-art methods across various benchmarks.
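For readers new to contrastive decoding, the arithmetic the abstract builds on is small; the sketch below shows the generic logit-gap adjustment used by this family of methods (it does not reproduce HIO's Contrary Bradley-Terry preference model, and the numbers are made up).

```python
import numpy as np

def contrastive_logits(target_logits, hallucination_logits, alpha=1.0):
    """Push the targeted distribution away from the hallucination-prone one;
    larger alpha widens the gap, alpha = 0 recovers ordinary decoding."""
    return (1.0 + alpha) * target_logits - alpha * hallucination_logits

target = np.array([2.0, 1.0, 0.5])   # logits from the full image + prompt
halluc = np.array([1.0, 1.8, 0.5])   # logits from a degraded, hallucination-inducing input
print(contrastive_logits(target, halluc, alpha=1.0))   # approx [3.0, 0.2, 0.5]
```

HIO's contribution, per the abstract, lies in how the hallucination-prone distribution is induced: a fine-tuned preference model deliberately amplifies hallucinatory tokens, so the subtraction above has a sharper and more controllable signal to push against.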