AITopics | factoid

factoid

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Retrieval Quality at Context Limit

McKinnon, Max

arXiv.org Artificial Intelligence, Nov-11-2025

The ability of large language models (LLMs) to recall and retrieve information from long contexts is critical for many real-world applications. Prior work (Liu et al., 2023) reported that LLMs suffer significant drops in retrieval accuracy for facts placed in the middle of large contexts, an effect known as "Lost in the Middle" (LITM). We find the model Gemini 2.5 Flash can answer needle-in-a-haystack questions with great accuracy regardless of document position, including when the document is nearly at the input context limit. Our results suggest that the "Lost in the Middle" effect is not present for simple factoid Q&A in Gemini 2.5 Flash, indicating substantial improvements in long-context retrieval. Large language models (LLMs) have rapidly advanced in their ability to process and reason over long textual contexts, enabling applications in summarization, retrieval-augmented Q&A, document understanding, and more.
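A needle-in-a-haystack probe of the kind the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's harness: the function name, the `depth` parameter, and the prompt template are all assumptions made for the example, and no model call is included.

```python
def build_haystack_prompt(filler_docs, needle, depth, question):
    """Insert `needle` among `filler_docs` at a relative depth.

    depth = 0.0 puts the needle first, 1.0 puts it last (hypothetical convention).
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be in [0, 1]")
    index = round(depth * len(filler_docs))
    docs = filler_docs[:index] + [needle] + filler_docs[index:]
    context = "\n\n".join(docs)
    # The prompt template below is an assumption, not taken from the paper.
    return f"{context}\n\nQuestion: {question}\nAnswer:"


filler = [f"Filler document {i}." for i in range(10)]
needle = "The secret code is 4217."
prompt = build_haystack_prompt(filler, needle, 0.5, "What is the secret code?")
```

Sweeping `depth` from 0.0 to 1.0 while growing the filler toward the context limit would give the position-versus-accuracy grid that a "Lost in the Middle" check needs.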

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2511.0585

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

MetaRAG: Metamorphic Testing for Hallucination Detection in RAG Systems

Sok, Channdeth, Luz, David, Haddam, Yacine

arXiv.org Artificial Intelligence, Nov-10-2025

Large Language Models (LLMs) are increasingly deployed in enterprise applications, yet their reliability remains limited by hallucinations, i.e., confident but factually incorrect information. Existing detection approaches, such as SelfCheckGPT and MetaQA, primarily target standalone LLMs and do not address the unique challenges of Retrieval-Augmented Generation (RAG) systems, where responses must be consistent with retrieved evidence. We therefore present MetaRAG, a metamorphic testing framework for hallucination detection in Retrieval-Augmented Generation (RAG) systems. MetaRAG operates in a real-time, unsupervised, black-box setting, requiring neither ground-truth references nor access to model internals, making it suitable for proprietary and high-stakes domains. The framework proceeds in four stages: (1) decompose answers into atomic factoids, (2) generate controlled mutations of each factoid using synonym and antonym substitutions, (3) verify each variant against the retrieved context (synonyms are expected to be entailed and antonyms contradicted), and (4) aggregate penalties for inconsistencies into a response-level hallucination score. Crucially for identity-aware AI, MetaRAG localizes unsupported claims at the factoid span where they occur (e.g., pregnancy-specific precautions, LGBTQ+ refugee rights, or labor eligibility), allowing users to see flagged spans and enabling system designers to configure thresholds and guardrails for identity-sensitive queries. Experiments on a proprietary enterprise dataset illustrate the effectiveness of MetaRAG for detecting hallucinations and enabling trustworthy deployment of RAG-based conversational agents. We also outline a topic-based deployment design that translates MetaRAG's span-level scores into identity-aware safeguards; this design is discussed but not evaluated in our experiments.
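The four stages listed in the abstract can be sketched roughly as below. This is a toy illustration under strong assumptions, not the authors' implementation: the lexicons, the sentence-level decomposition, the `verify` callback, and the unweighted averaging of penalties are all hypothetical stand-ins for components the abstract does not specify.

```python
import re

# Hypothetical toy lexicons; a real system would use an LLM or a lexical resource.
SYNONYMS = {"safe": "harmless"}
ANTONYMS = {"safe": "unsafe"}


def decompose(answer):
    """Stage 1: split an answer into sentence-level 'factoids' (a crude stand-in)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]


def mutate(factoid):
    """Stage 2: yield (variant, expected_label) pairs via word substitution."""
    for lexicon, expected in ((SYNONYMS, "entailed"), (ANTONYMS, "contradicted")):
        for word, replacement in lexicon.items():
            pattern = rf"\b{re.escape(word)}\b"
            if re.search(pattern, factoid):
                yield re.sub(pattern, replacement, factoid), expected


def hallucination_score(answer, context, verify):
    """Stages 3-4: check each variant against the context, then average penalties.

    `verify(context, claim)` is a caller-supplied function (an assumption here)
    returning "entailed", "contradicted", or "neutral".
    Returns 0.0 when no factoid could be mutated.
    """
    penalties = []
    for factoid in decompose(answer):
        for variant, expected in mutate(factoid):
            penalties.append(0.0 if verify(context, variant) == expected else 1.0)
    return sum(penalties) / len(penalties) if penalties else 0.0
```

Passing the entailment check in as a callback keeps the sketch black-box with respect to the model, in the spirit of the setting the abstract describes; any NLI model or LLM judge could be plugged in.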

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2509.0936

Country:

Europe (0.94)
North America > United States (0.93)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Harnessing Collective Intelligence of LLMs for Robust Biomedical QA: A Multi-Model Approach

Panou, Dimitra, Dimopoulos, Alexandros C., Koubarakis, Manolis, Reczko, Martin

arXiv.org Artificial Intelligence, Aug-5-2025

Biomedical text mining and question-answering are essential yet highly demanding tasks, particularly in the face of the exponential growth of biomedical literature. In this work, we present our participation in the 13th edition of the BioASQ challenge, which involves biomedical semantic question-answering for Task 13b and biomedical question-answering for developing topics for the Synergy task. We deploy a selection of open-source large language models (LLMs) as retrieval-augmented generators to answer biomedical questions. Various models are used to process the questions. A majority voting system combines their output to determine the final answer for Yes/No questions, while for list and factoid type questions, the union of their answers is used. We evaluated 13 state-of-the-art open-source LLMs, exploring all possible model combinations to contribute to the final answer, resulting in tailored LLM pipelines for each question type. Our findings provide valuable insight into which combinations of LLMs consistently produce superior results for specific question types. In the four rounds of the 2025 BioASQ challenge, our system achieved notable results: in the Synergy task, we secured 1st place for ideal answers and 2nd place for exact answers in round 2, as well as two shared 1st places for exact answers in rounds 3 and 4.
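The two combination rules the abstract mentions, majority voting for Yes/No questions and a union for list and factoid questions, can be sketched as below. The function names, the tie-breaking rule, and the case-insensitive deduplication are assumptions for illustration; the abstract does not say how the authors handle ties or normalize answers.

```python
from collections import Counter


def majority_vote(answers):
    """Combine yes/no answers from several models.

    Ties fall back to "no" (an arbitrary choice made for this sketch).
    """
    counts = Counter(a.strip().lower() for a in answers)
    return "yes" if counts["yes"] > counts["no"] else "no"


def union_answers(answer_lists):
    """Combine list/factoid answers: case-insensitive union, first-seen order kept."""
    seen, merged = set(), []
    for answers in answer_lists:
        for answer in answers:
            key = answer.strip().lower()
            if key not in seen:
                seen.add(key)
                merged.append(answer.strip())
    return merged


vote = majority_vote(["yes", "no", "Yes"])
genes = union_answers([["BRCA1", "TP53"], ["tp53", "EGFR"]])
```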

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2508.0148

Country: Europe > Greece (0.14)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Continual Memorization of Factoids in Large Language Models

Chen, Howard, Geng, Jiayi, Bhaskar, Adithya, Friedman, Dan, Chen, Danqi

arXiv.org Artificial Intelligence, Nov-11-2024

Large language models can absorb a massive amount of knowledge through pretraining, but pretraining is inefficient for acquiring long-tailed or specialized facts. Therefore, fine-tuning on specialized or new knowledge that reflects changes in the world has become popular, though it risks disrupting the model's original capabilities. We study this fragility in the context of continual memorization, where the model is trained on a small set of long-tail factoids (factual associations) and must retain these factoids after multiple stages of subsequent training on other datasets. Through extensive experiments, we show that LLMs suffer from forgetting across a wide range of subsequent tasks, and simple replay techniques do not fully prevent forgetting, especially when the factoid datasets are trained in the later stages. We posit that there are two ways to alleviate forgetting: 1) protect the memorization process as the model learns the factoids, or 2) reduce interference from training in later stages. With this insight, we develop an effective mitigation strategy: REMIX (Random and Generic Data Mixing). REMIX prevents forgetting by mixing generic data sampled from pretraining corpora or even randomly generated word sequences during each stage, despite being unrelated to the memorized factoids in the first stage. REMIX can recover performance from severe forgetting, often outperforming replay-based methods that have access to the factoids from the first stage. We then analyze how REMIX alters the learning process and find that successful forgetting prevention is associated with a pattern: the model stores factoids in earlier layers than usual and diversifies the set of layers that store these factoids. The efficacy of REMIX invites further investigation into the underlying dynamics of memorization and forgetting, opening exciting possibilities for future research.
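The data-mixing idea behind REMIX can be sketched at the dataset level as follows. This is only an illustration of mixing generic or random text into a training stage; `mix_ratio`, the sampling scheme, and both function names are hypothetical, and the abstract does not give the authors' actual proportions or procedure.

```python
import random


def random_word_sequences(vocab, n, length, seed=0):
    """Generate `n` sequences of `length` words drawn uniformly from `vocab`."""
    rng = random.Random(seed)
    return [" ".join(rng.choice(vocab) for _ in range(length)) for _ in range(n)]


def mix_stage_data(stage_examples, generic_pool, mix_ratio, seed=0):
    """Return the stage's examples plus sampled generic examples, shuffled.

    `mix_ratio` (hypothetical) is the number of generic examples added per
    stage example, e.g. 0.5 adds half as many generic examples.
    """
    rng = random.Random(seed)
    n_generic = int(len(stage_examples) * mix_ratio)
    extra = [rng.choice(generic_pool) for _ in range(n_generic)]
    mixed = list(stage_examples) + extra
    rng.shuffle(mixed)
    return mixed


factoids = ["Alice was born in Lyon.", "Bob owns a red kite."]
noise = random_word_sequences(["lorem", "ipsum", "dolor"], n=5, length=4)
stage_one = mix_stage_data(factoids, noise, mix_ratio=1.0)
```

The same helper could be applied at every stage, with `generic_pool` drawn from a pretraining-style corpus instead of random words.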

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2411.07175

Country:

North America > United States > Kansas (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report (0.82)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

ECon: On the Detection and Resolution of Evidence Conflicts

Jiayang, Cheng, Chan, Chunkit, Zhuang, Qianqian, Qiu, Lin, Zhang, Tianhang, Liu, Tengxiao, Song, Yangqiu, Zhang, Yue, Liu, Pengfei, Zhang, Zheng

arXiv.org Artificial Intelligence, Oct-5-2024

The rise of large language models (LLMs) has significantly influenced the quality of information in decision-making systems, leading to the prevalence of AI-generated content and challenges in detecting misinformation and managing conflicting information, or "inter-evidence conflicts." This study introduces a method for generating diverse, validated evidence conflicts to simulate real-world misinformation scenarios. We evaluate conflict detection methods, including Natural Language Inference (NLI) models, factual consistency (FC) models, and LLMs, on these conflicts (RQ1) and analyze LLMs' conflict resolution behaviors (RQ2). Our key findings include: (1) NLI and LLM models exhibit high precision in detecting answer conflicts, though weaker models suffer from low recall; (2) FC models struggle with lexically similar answer conflicts, while NLI and LLM models handle these better; and (3) stronger models like GPT-4 show robust performance, especially with nuanced conflicts. For conflict resolution, LLMs often favor one piece of conflicting evidence without justification and rely on internal knowledge if they have prior beliefs.

claude 3, conflict, nli-xxlarge, (17 more...)

arXiv.org Artificial Intelligence

2410.04068

Country:

Europe > United Kingdom (0.14)
Asia > Middle East > UAE > Dubai Emirate > Dubai (0.04)
North America > United States > Nebraska (0.04)
(11 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Media (1.00)
Leisure & Entertainment (1.00)
Government > Regional Government (0.93)
Government > Immigration & Customs (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Calibrated Language Models Must Hallucinate

Kalai, Adam Tauman, Vempala, Santosh S.

arXiv.org Artificial Intelligence, Dec-3-2023

Recent language models generate false but plausible-sounding text with surprising frequency. Such "hallucinations" are an obstacle to the usability of language-based AI systems and can harm people who rely upon their outputs. This work shows that there is an inherent statistical lower-bound on the rate that pretrained language models hallucinate certain types of facts, having nothing to do with the transformer LM architecture or data quality. For "arbitrary" facts whose veracity cannot be determined from the training data, we show that hallucinations must occur at a certain rate for language models that satisfy a statistical calibration condition appropriate for generative language models. Specifically, if the maximum probability of any fact is bounded, we show that the probability of generating a hallucination is close to the fraction of facts that occur exactly once in the training data (a "Good-Turing" estimate), even assuming ideal training data without errors. One conclusion is that models pretrained to be sufficiently good predictors (i.e., calibrated) may require post-training to mitigate hallucinations on the type of arbitrary facts that tend to appear once in the training set. However, our analysis also suggests that there is no statistical reason that pretraining will lead to hallucination on facts that tend to appear more than once in the training data (like references to publications such as articles and books, whose hallucinations have been particularly notable and problematic) or on systematic facts (like arithmetic calculations). Therefore, different architectures and learning algorithms may mitigate these latter types of hallucinations.
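The "Good-Turing" quantity the abstract refers to can be illustrated with a few lines of counting. The sketch below uses the classical Good-Turing convention of dividing the number of facts seen exactly once by the total number of observations; that normalization is an assumption here, and the paper's precise definition may differ.

```python
from collections import Counter


def singleton_fraction(observations):
    """Good-Turing-style estimate: facts seen exactly once / total observations."""
    if not observations:
        raise ValueError("need at least one observation")
    counts = Counter(observations)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(observations)


# Toy "training data": each string stands for one occurrence of a fact.
facts = ["a", "a", "b", "c", "d", "d", "d", "e"]
rate = singleton_fraction(facts)
```

In this toy sample three of the eight observations are facts that appear only once, so the estimate is 3/8.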

factoid, hallucination, training data, (15 more...)

arXiv.org Artificial Intelligence

2311.14648

Country:

North America > United States (0.14)
South America > Colombia > Meta Department > Villavicencio (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Topic-Aware Response Generation in Task-Oriented Dialogue with Unstructured Knowledge Access

Feng, Yue, Lampouras, Gerasimos, Iacobacci, Ignacio

arXiv.org Artificial Intelligence, Dec-10-2022

To alleviate the problem of structured databases' limited coverage, recent task-oriented dialogue systems incorporate external unstructured knowledge to guide the generation of system responses. However, these usually use word or sentence level similarities to detect the relevant knowledge context, which only partially capture the topical level relevance. In this paper, we examine how to better integrate topical information in knowledge grounded task-oriented dialogue and propose "Topic-Aware Response Generation" (TARG), an end-to-end response generation model. TARG incorporates multiple topic-aware attention mechanisms to derive the importance weighting scheme over dialogue utterances and external knowledge sources towards a better understanding of the dialogue history. Experimental results indicate that TARG achieves state-of-the-art performance in knowledge selection and response generation, outperforming previous state-of-the-art by 3.2, 3.6, and 4.2 points in EM, F1 and BLEU-4 respectively on Doc2Dial, and performing comparably with previous work on DSTC9; both being knowledge-grounded task-oriented dialogue datasets.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2212.05373

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.70)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

A Powerful Idea About Our Brains Stormed Pop Culture and Captured Minds. It's Mostly Bunk.

Slate, Nov-28-2022

When Leonardo DiCaprio's relationship with model/actress Camila Morrone ended three months after she celebrated her 25th birthday, the lifestyle site YourTango turned to neuroscience. DiCaprio has a well-documented history of dating women under 25. "Given that DiCaprio's cut-off point is exactly around the time that neuroscientists say our brains are finished developing, there is certainly a case to be made that a desire to date younger partners comes from a desire to have control," the article said. It quotes a couples therapist, who says that at 25, people's "brains are fully formed and that presents a more elevated and conscious level of connection"--the type of connection, YourTango suggests, that DiCaprio wants to avoid. YourTango was parroting a factoid that's gained a chokehold over pop science in the past decade: that 25 marks the age at which our brains become "fully developed" or "mature." This assertion has been used as an explanation for a vast range of phenomena.

adulthood, brain, neuroscience, (13 more...)

Slate

Country:

North America > United States > Virginia (0.04)
North America > United States > Texas > Uvalde County > Uvalde (0.04)
North America > United States > Oregon (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Education (1.00)

Technology: Information Technology > Artificial Intelligence > Cognitive Science (0.40)

From factoids to facts

AITopics Original Links, Jan-18-2017, 11:21:15 GMT

WHAT is the next stage in the evolution of internet search engines? AltaVista demonstrated that indexing the entire world wide web was feasible. Google's success stems from its uncanny ability to sort useful web pages from dross. But the real prize will surely go to whoever can use the web to deliver a straight answer to a straight question. And Eric Brill, a researcher at Microsoft, intends that his firm will be the first to do that.

artificial intelligence, dr brill, natural language, (7 more...)

AITopics Original Links

Country:

North America > United States > California (0.16)
Europe > United Kingdom > England > Greater London > London > Wimbledon (0.05)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.97)

Machines Deciding What We See On-line: How AI Is Altering The Net

#artificialintelligence, May-14-2016, 05:30:22 GMT

The Washington Post wrote earlier this week on Google's growing use of information boxes in its searches – the inset boxes at the top of search results that attempt to shortcut the search process by displaying the exact factoid of interest on top of the traditional endless page of links. As users increasingly access the web by mobile and voice, the goal of such systems is to get the user an answer such as "how many ounces in a pound" or "who is the president of Estonia" as quickly as possible. Whereas search engines of the past merely returned a pile of links for a user to wade through, the goal of information boxes is to provide the exact response the user is looking for by leveraging advances in natural language processing to have machines actually understand the user's question. According to the Post, these factoids are now displayed for almost a third of Google's 100 billion monthly searches, which means they are playing an ever-increasing role in mediating our access to the world's information. The rise of bots across the communicative continuum, from workplace tools like Slack to social communication like Facebook, means machine interpretation of the world's information will increasingly supplant the historical concept of the keyword search.

artificial intelligence, information, natural language, (13 more...)

#artificialintelligence

Country: Europe > Estonia (0.56)

Industry: Government > Regional Government (0.36)

Technology:

Information Technology > Communications > Web (0.36)
Information Technology > Artificial Intelligence > Natural Language (0.36)
