AITopics | nonsense

nonsense

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Trustworthy Retrosynthesis: Eliminating Hallucinations with a Diverse Ensemble of Reaction Scorers

Sadowski, Michal, Radusinović, Tadija, Wyrzykowska, Maria, Sztukiewicz, Lukasz, Rzymkowski, Jan, Włodarczyk-Pruszyński, Paweł, Sacha, Mikołaj, Kozakowski, Piotr, van Workum, Ruard, Jastrzebski, Stanislaw Kamil

arXiv.org Artificial Intelligence, Dec-9-2025

Retrosynthesis is one of the domains transformed by the rise of generative models, and one where the problem of nonsensical or erroneous outputs (hallucinations) is particularly insidious: reliable assessment of synthetic plans is time-consuming, and automatic methods are lacking. In this work, we present RetroTrim, a retrosynthesis system that successfully avoids nonsensical plans on a set of challenging drug-like targets. Compared to common baselines in the field, our system is not only the sole method that succeeds in filtering out hallucinated reactions, but it also yields the highest number of high-quality paths overall. The key insight behind RetroTrim is the combination of diverse reaction scoring strategies, based on machine learning models and existing chemical databases. By analyzing these strategies on a dataset of labeled retrosynthetic intermediates, we show that they capture different classes of hallucinations. This approach formed the basis of our winning solution to the Standard Industries $1 million Retrosynthesis Challenge. To measure the performance of retrosynthesis systems, we propose a novel evaluation protocol for reactions and synthetic paths based on a structured review by expert chemists. Using this protocol, we compare systems on a set of 32 novel targets, curated to reflect recent trends in drug structures. While the insights behind our methodology are broadly applicable to retrosynthesis, our focus is on targets in the drug-like domain. By releasing our benchmark targets and the details of our evaluation protocol, we hope to inspire further research into reliable retrosynthesis.
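The abstract says only that RetroTrim combines diverse reaction scorers built on machine learning models and chemical databases; it does not say how their verdicts are merged. The sketch below assumes a unanimous-veto rule (a reaction survives only if every scorer accepts it) to show why diversity helps: one scorer can catch a hallucination another lets through. The scorer names, the 0.5 threshold, and the reaction strings are hypothetical stand-ins, not RetroTrim's actual components.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A scorer maps a reaction SMILES string ("reactants>>product") to True (accept) or False (reject).
Scorer = Callable[[str], bool]


@dataclass
class FilterResult:
    kept: List[str] = field(default_factory=list)
    # reaction -> names of the scorers that rejected it
    rejected: Dict[str, List[str]] = field(default_factory=dict)


def ensemble_filter(reactions: List[str], scorers: Dict[str, Scorer]) -> FilterResult:
    """Keep a reaction only if every scorer accepts it (assumed unanimous-veto rule)."""
    result = FilterResult()
    for rxn in reactions:
        vetoes = [name for name, accepts in scorers.items() if not accepts(rxn)]
        if vetoes:
            result.rejected[rxn] = vetoes
        else:
            result.kept.append(rxn)
    return result


# Hypothetical stand-ins for a database lookup and an ML plausibility model.
KNOWN_REACTIONS = {"CC(=O)O.OCC>>CC(=O)OCC"}  # acetic acid + ethanol -> ethyl acetate
ML_SCORES = {"CC(=O)O.OCC>>CC(=O)OCC": 0.92, "C.C>>c1ccccc1": 0.71}

scorers: Dict[str, Scorer] = {
    "database": lambda r: r in KNOWN_REACTIONS,
    "ml_model": lambda r: ML_SCORES.get(r, 0.0) >= 0.5,
}

result = ensemble_filter(["CC(=O)O.OCC>>CC(=O)OCC", "C.C>>c1ccccc1"], scorers)
print(result.kept)      # ['CC(=O)O.OCC>>CC(=O)OCC']
print(result.rejected)  # {'C.C>>c1ccccc1': ['database']}
```

Here the toy ML scorer wrongly accepts "methane + methane -> benzene", but the database scorer vetoes it. A real database check would compare canonicalized structures rather than raw strings.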

artificial intelligence, machine learning, natural language

arXiv.org Artificial Intelligence

2510.10645

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Sequences of Logits Reveal the Low Rank Structure of Language Models

Golowich, Noah, Liu, Allen, Shetty, Abhishek

arXiv.org Machine Learning, Oct-30-2025

A major problem in the study of large language models is understanding their inherent low-dimensional structure. We introduce an approach to study the low-dimensional structure of language models at a model-agnostic level: as sequential probabilistic models. We first empirically demonstrate that a wide range of modern language models exhibit low-rank structure: in particular, matrices built from the model's logits for varying sets of prompts and responses have low approximate rank. We then show that this low-rank structure can be leveraged for generation: in particular, we can generate a response to a target prompt using a linear combination of the model's outputs on unrelated or even nonsensical prompts. On the theoretical front, we observe that studying the approximate rank of language models in this sense yields a simple universal abstraction whose theoretical predictions parallel our experiments. We then analyze the representation power of the abstraction and give provable learning guarantees.
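To make "low approximate rank" concrete, here is a toy sketch in which each row stands for one prompt's logit vector over a tiny vocabulary. This is a simplification: the paper's matrices are built from varying sets of prompts and responses, and the numbers below are synthetic. Approximate rank is usually read off a singular value decomposition; to stay dependency-free, this sketch instead counts pivots above a tolerance during Gaussian elimination.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product x @ y."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def numerical_rank(m: Matrix, tol: float = 1e-9) -> int:
    """Count pivots above `tol` during Gaussian elimination with partial pivoting."""
    a = [list(row) for row in m]  # work on a copy
    n_rows, n_cols = len(a), len(a[0])
    rank = 0
    for c in range(n_cols):
        if rank == n_rows:
            break
        # Pick the remaining row with the largest entry in column c.
        pivot = max(range(rank, n_rows), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) <= tol:
            continue  # column c adds no new direction (up to tol)
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rank + 1, n_rows):
            factor = a[r][c] / a[rank][c]
            for k in range(c, n_cols):
                a[r][k] -= factor * a[rank][k]
        rank += 1
    return rank


# Toy "logit matrix": 4 prompts x 5 tokens, built as a product of a 4x2 prompt
# factor and a 2x5 token factor, so its rank is at most 2.
prompt_factor = [[1, 0], [0, 1], [1, 1], [2, -1]]
token_factor = [[0.5, 1, -1, 2, 0], [1, 0, 3, -1, 1]]
logits = matmul(prompt_factor, token_factor)

print(numerical_rank(logits))  # 2

# Low rank means prompt 3's logit row is a linear combination of rows 0 and 1.
combo = [2 * x - y for x, y in zip(logits[0], logits[1])]
print(combo == logits[3])  # True
```

The last two lines echo, in miniature, the abstract's point that outputs for one prompt can be assembled from outputs on other prompts; how the paper chooses the combination is not described in the abstract.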

large language model, machine learning, natural language

arXiv.org Machine Learning

2510.24966

Country:

Europe (1.00)
North America > United States (0.93)

Genre: Research Report (0.82)

Industry: Government > Military (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.45)

Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth

Wang, Yang, Xiao, Chenghao, Hsiao, Chia-Yi, Chang, Zi Yan, Chen, Chi-Li, Loakman, Tyler, Lin, Chenghua

arXiv.org Artificial Intelligence, Oct-17-2025

We introduce Drivelology, a unique linguistic phenomenon characterised as "nonsense with depth": utterances that are syntactically coherent yet pragmatically paradoxical, emotionally loaded, or rhetorically subversive. While such expressions may resemble surface-level nonsense, they encode implicit meaning requiring contextual inference, moral reasoning, or emotional interpretation. We find that current large language models (LLMs), despite excelling at many natural language processing (NLP) tasks, consistently fail to grasp the layered semantics of Drivelological text. To investigate this, we construct a benchmark dataset of over 1,200 meticulously curated and diverse examples across English, Mandarin, Spanish, French, Japanese, and Korean. Each example underwent careful expert review to verify its Drivelological characteristics, involving multiple rounds of discussion and adjudication to address disagreements. Using this dataset, we evaluate a range of LLMs on classification, generation, and reasoning tasks. Our results reveal clear limitations of LLMs: models often confuse Drivelology with shallow nonsense, produce incoherent justifications, or miss implied rhetorical functions altogether. These findings highlight a deep representational gap in LLMs' pragmatic understanding and challenge the assumption that statistical fluency implies cognitive comprehension. We release our dataset and code to facilitate further research in modelling linguistic depth beyond surface-level coherence.
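For the classification part of such a multilingual benchmark, a natural first tally is accuracy per language plus how often genuine Drivelology is mislabeled as plain nonsense, the confusion the abstract highlights. The sketch below assumes a simple (language, gold label, predicted label) record format and a two-label scheme; both are hypothetical, since the abstract does not describe the released task formats, and the records are invented, not results.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (language, gold_label, predicted_label); labels and records are made up for illustration.
Record = Tuple[str, str, str]


def accuracy_by_language(records: List[Record]) -> Dict[str, float]:
    """Fraction of correct predictions per language, keyed in sorted order."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for lang, gold, pred in records:
        total[lang] += 1
        correct[lang] += int(gold == pred)
    return {lang: correct[lang] / total[lang] for lang in sorted(total)}


def confusion_rate(records: List[Record], gold: str = "drivelology", pred: str = "nonsense") -> float:
    """Share of `gold` items that the model labelled as `pred`."""
    relevant = [r for r in records if r[1] == gold]
    return sum(r[2] == pred for r in relevant) / len(relevant) if relevant else 0.0


records: List[Record] = [
    ("English", "drivelology", "drivelology"),
    ("English", "drivelology", "nonsense"),
    ("Mandarin", "nonsense", "nonsense"),
    ("Mandarin", "drivelology", "nonsense"),
    ("Korean", "drivelology", "drivelology"),
]
print(accuracy_by_language(records))  # {'English': 0.5, 'Korean': 1.0, 'Mandarin': 0.5}
print(confusion_rate(records))        # 0.5
```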

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2509.03867

Country:

North America (0.67)
Asia > Middle East > UAE (0.28)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Five things you need to know about AI right now

MIT Technology Review, Jul-22-2025, 09:50:27 GMT

The video is now available (thank you, SXSW London). Below is a quick look at my top five. Let me know if you would have picked different ones! Maybe you think that's obvious. But I am constantly having to check my assumptions about how fast this technology is progressing--and it's my job to keep up.

deep learning, hallucination, natural language

MIT Technology Review

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.39)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.39)

The Two Word Test: A Semantic Benchmark for Large Language Models

Riccardi, Nicholas, Desai, Rutvik H.

arXiv.org Artificial Intelligence, Jun-7-2023

Large Language Models (LLMs) have recently shown remarkable abilities, including passing advanced professional exams and demanding benchmark tests. This performance has led many to suggest that they are close to achieving humanlike or 'true' understanding of language, and even Artificial General Intelligence (AGI). Here, we provide a new open-source benchmark, the Two Word Test (TWT), that assesses the semantic abilities of LLMs on two-word phrases, using a task that humans can perform relatively easily without advanced training. Combining multiple words into a single concept is a fundamental aspect of human language and intelligence. The test requires meaningfulness judgments of 1768 noun-noun combinations that have been rated by 150 human raters as meaningful (e.g., baby boy) or not meaningful (e.g., goat sky). We provide versions of the task that probe meaningfulness ratings on a 0-4 scale as well as binary judgments. We conducted a series of experiments using both versions of the TWT on GPT-4, GPT-3.5, and Bard. Results demonstrated that, compared to humans, all models perform poorly at rating the meaningfulness of these phrases. GPT-3.5 and Bard are also unable to make binary discriminations between sensible and nonsense phrases. GPT-4 makes a substantial improvement in binary discrimination of combinatorial phrases but is still significantly worse than humans. The TWT can be used to understand the limitations and weaknesses of current LLMs, and potentially improve them. The test also reminds us that caution is warranted in attributing 'true understanding' or AGI to LLMs. TWT is available at: https://github.com/NickRiccardi/two-word-test
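The two task versions suggest two simple scores: agreement with the gold meaningful/not-meaningful label for the binary version, and distance from the mean human rating for the 0-4 version. The sketch below uses mean absolute error for the latter, which is an assumption (the abstract does not name the paper's metrics). The phrases "baby boy" and "goat sky" come from the abstract; their ratings and the model outputs are invented for illustration.

```python
from typing import Dict, Tuple

# phrase -> (gold "meaningful?" label, mean human rating on the 0-4 scale).
# The two phrases come from the abstract; the ratings are invented for illustration.
GOLD: Dict[str, Tuple[bool, float]] = {
    "baby boy": (True, 3.5),
    "goat sky": (False, 0.5),
}


def binary_accuracy(model_says_meaningful: Dict[str, bool]) -> float:
    """Fraction of phrases where the model's yes/no judgment matches the gold label."""
    hits = sum(model_says_meaningful[p] == label for p, (label, _) in GOLD.items())
    return hits / len(GOLD)


def rating_mae(model_ratings: Dict[str, float]) -> float:
    """Mean absolute error between model ratings and mean human ratings (0-4 scale)."""
    errors = [abs(model_ratings[p] - rating) for p, (_, rating) in GOLD.items()]
    return sum(errors) / len(errors)


# A hypothetical model that calls every phrase meaningful and rates "goat sky" too high.
print(binary_accuracy({"baby boy": True, "goat sky": True}))  # 0.5
print(rating_mae({"baby boy": 3.0, "goat sky": 2.5}))         # 1.25
```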

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2306.0461

Country:

North America > United States > South Carolina (0.04)
North America > United States > Pennsylvania (0.04)
North America > United States > Minnesota (0.04)

Genre:

Research Report > Experimental Study (0.69)
Research Report > New Finding (0.46)

Industry: Education > Educational Setting > Higher Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Google improves Bard to compete with ChatGPT: here's what's new

#artificialintelligence, Apr-16-2023, 13:51:34 GMT

Google has recently improved its AI chatbot, Bard, in an effort to rival its competitor, ChatGPT. The tech giant has optimized Bard's responses in some areas and improved the chatbot's abilities in mathematics and logic. Early feedback on Bard was not positive: testers criticized the many restrictions Google had imposed, padlocking the experience to prevent abuse. To address Bard's limitations, Google has pledged to keep improving its artificial intelligence.

bard, chatbot, google

#artificialintelligence

Country:

North America > United States (0.17)
Europe > United Kingdom (0.06)

Industry: Information Technology (0.37)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

GPT-4 Has the Memory of a Goldfish

The Atlantic - Technology, Mar-17-2023, 17:13:25 GMT

By this point, the many defects of AI-based language models have been analyzed to death--their incorrigible dishonesty, their capacity for bias and bigotry, their lack of common sense. GPT-4, the newest and most advanced such model yet, is already being subjected to the same scrutiny, and it still seems to misfire in pretty much all the ways earlier models did. But large language models have another shortcoming that has so far gotten relatively little attention: their shoddy recall. These multibillion-dollar programs, which require several city blocks' worth of energy to run, may now be able to code websites, plan vacations, and draft company-wide emails in the style of William Faulkner. But they have the memory of a goldfish.
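The excerpt does not describe the mechanism behind this poor recall, but the entry's tags point to the context window. As a rough sketch, assuming the common strategy of keeping only the most recent `window` tokens, anything older simply falls out of view. Whitespace splitting stands in for a real subword tokenizer, the window size is tiny and hypothetical, and real applications may truncate or summarize history differently.

```python
from typing import List


def visible_context(tokens: List[str], window: int) -> List[str]:
    """Return the tokens a fixed-window model can still see (the most recent `window`)."""
    if window <= 0:
        return []  # guard: tokens[-0:] would wrongly return the whole list
    return tokens[-window:]


conversation = "my name is Ada . let us talk about goldfish memory today".split()
print(visible_context(conversation, 6))
# ['us', 'talk', 'about', 'goldfish', 'memory', 'today']  ("Ada" is no longer visible)
```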

context window, gpt-4, language model

The Atlantic - Technology

Country: North America > United States > Texas > Travis County > Austin (0.05)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

AutoReply: Detecting Nonsense in Dialogue Introspectively with Discriminative Replies

Shi, Weiyan, Dinan, Emily, Renduchintala, Adi, Fried, Daniel, Jacob, Athul Paul, Yu, Zhou, Lewis, Mike

arXiv.org Artificial Intelligence, Nov-22-2022

Existing approaches build separate classifiers to detect nonsense in dialogues. In this paper, we show that without external classifiers, dialogue models can detect errors in their own messages introspectively, by calculating the likelihood of replies that are indicative of poor messages. For example, if an agent believes its partner is likely to respond "I don't understand" to a candidate message, that message may not make sense, so an alternative message should be chosen. We evaluate our approach on a dataset from the game Diplomacy, which contains long dialogues richly grounded in the game state, on which existing models make many errors. We first show that hand-crafted replies can be effective for the task of detecting nonsense in applications as complex as Diplomacy. We then design AutoReply, an algorithm to search for such discriminative replies automatically, given a small number of annotated dialogue examples. We find that AutoReply-generated replies outperform handcrafted replies and perform on par with carefully fine-tuned large supervised models. Results also show that a single reply, with little computational overhead, can detect dialogue nonsense reasonably well.
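The "I don't understand" example can be sketched directly: score each candidate message by the probability that the partner answers it with that discriminative reply, flag messages above a threshold, and prefer the least suspicious candidate. This is a minimal sketch of the handcrafted-reply idea only; it does not implement AutoReply's automatic search for replies. The scorer, its log-probabilities, the 0.5 threshold, and the messages are hypothetical stand-ins for a real dialogue model.

```python
import math
from typing import Callable, Dict, List, Tuple

# scorer(message, reply) -> log P(partner replies `reply` | dialogue + message).
ReplyScorer = Callable[[str, str], float]

DISCRIMINATIVE_REPLY = "I don't understand"

# Hypothetical log-probabilities standing in for a real dialogue model.
# (In Diplomacy, fleets cannot enter landlocked Moscow, so the second message is nonsense.)
TOY_LOGPROBS: Dict[Tuple[str, str], float] = {
    ("Can you support my army into Galicia?", DISCRIMINATIVE_REPLY): math.log(0.05),
    ("I will support your fleet into Moscow.", DISCRIMINATIVE_REPLY): math.log(0.80),
}


def toy_reply_logprob(message: str, reply: str) -> float:
    return TOY_LOGPROBS[(message, reply)]


def nonsense_score(message: str, scorer: ReplyScorer) -> float:
    """Probability that the partner answers `message` with the discriminative reply."""
    return math.exp(scorer(message, DISCRIMINATIVE_REPLY))


def is_nonsense(message: str, scorer: ReplyScorer, threshold: float = 0.5) -> bool:
    return nonsense_score(message, scorer) > threshold


def pick_message(candidates: List[str], scorer: ReplyScorer) -> str:
    """Choose the candidate least likely to draw the discriminative reply."""
    return min(candidates, key=lambda m: nonsense_score(m, scorer))


candidates = ["I will support your fleet into Moscow.", "Can you support my army into Galicia?"]
print(is_nonsense(candidates[0], toy_reply_logprob))   # True
print(pick_message(candidates, toy_reply_logprob))     # Can you support my army into Galicia?
```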

artificial intelligence, machine learning, natural language

arXiv.org Artificial Intelligence

2211.12615

Country:

Asia > Middle East > Republic of Türkiye (0.05)
Europe > Spain (0.04)
Europe > Russia (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.46)

Productizing Large Language Models

#artificialintelligence, Nov-21-2022, 03:00:03 GMT

Large Language Models (LLMs) are known for their near-magical ability to learn from very few examples (as few as zero) to create language wonders. LLMs can chat, write poetry, write code, and even do basic arithmetic. However, the same properties that make LLMs magical also make them challenging from an engineering perspective. At Replit we have deployed transformer-based language models of all sizes: 100M-parameter models for search and spam, 1-10B-parameter models for a code autocomplete product we call GhostWriter, and 100B-parameter models for features that require higher reasoning ability. In this post we'll talk about what we've learned about building and hosting large language models.

ghostwriter, language model, llm

#artificialintelligence

Country: North America > United States (0.05)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

How Would an AI Chatbot Handle the Complexities of Oral Language?

#artificialintelligence, Oct-27-2022, 09:45:18 GMT

Joseph Wilson, a linguist and journalist who has done considerable work with oral languages (languages not yet written down), offers some thoughts on claims that chatbots like LaMDA, the Google model at the center of Blake Lemoine's claims, really speak like human persons. But learning from written text alone excludes all unwritten forms of communication: sign language, oral histories, body language, tone of voice, and the broader cultural context in which people find themselves speaking. In other words, it leaves out much of the interesting stuff that makes nuanced communication between people possible. We really don't know how old spoken language is (Wilson suggests 50,000 years), but written language can be traced back only about 5,400 years. And only about half of all languages (he estimates there are about 7,100 today) have ever been written down.

ai chatbot handle, communication, oral language

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
