When LLMs Meet Cunning Texts: A Fallacy Understanding Benchmark for Large Language Models
Yinghui Li, Qingyu Zhou, Yuanzhen Luo, Shirong Ma, Yangning Li, Hai-Tao Zheng, Xuming Hu, Philip S. Yu
Recently, Large Language Models (LLMs) have made remarkable advances in language understanding and generation. Following this trend, benchmarks measuring all kinds of LLM capabilities have sprung up. In this paper, we challenge the reasoning and understanding abilities of LLMs by proposing a FaLlacy Understanding Benchmark (FLUB) containing cunning texts that are easy for humans to understand but difficult for models to grasp. Specifically, the cunning texts that FLUB focuses on mainly consist of tricky, humorous, and misleading texts collected from the real internet environment. We design three tasks of increasing difficulty in FLUB to evaluate the fallacy understanding ability of LLMs. Based on FLUB, we investigate the performance of multiple representative and advanced LLMs, showing that FLUB is challenging and merits further study. Our extensive experiments and detailed analyses yield interesting discoveries and valuable insights. We hope our benchmark encourages the community to improve LLMs' ability to understand fallacies. Our data and code are available at https://github.com/THUKElab/FLUB.
Appendix A: Our Designed Prompts for FLUB
Figure 4: Our designed prompts without the Chain-of-Thought idea; Task 3(b) is for inquiries. [figure omitted]
Figure 5: Our designed prompts with the Chain-of-Thought idea; Task 3(b) is for inquiries. The Chain-of-Thought prompts for Task 1 and Task 2 are presented in Figure 5. [figure omitted]
Scoring Objective. For the LLM's output response to each input cunning text, please refer to the Scoring Rules; the scoring values are defined as {1, 2, 3, 4, 5}.
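The released Scoring Rules themselves are in the repository linked above and are not reproduced here. As an illustration only, the minimal sketch below shows one way an automatic 1-5 scorer for a (cunning text, model response) pair might be wired up; the judge prompt wording and the `call_judge_model` helper are assumptions for this sketch, not the authors' released code.

```python
# A minimal sketch (not the authors' released pipeline) of assigning a score
# from the {1, 2, 3, 4, 5} scale described above to one model response.
# `call_judge_model` is a hypothetical stand-in for whatever LLM-as-judge
# API or human-annotation step is actually used.

SCORING_VALUES = {1, 2, 3, 4, 5}

JUDGE_PROMPT = (
    "You are given a cunning text and a model's explanation of it.\n"
    "Rate how well the explanation identifies the fallacy, from 1 "
    "(completely misses it) to 5 (fully and clearly explains it).\n"
    "Cunning text: {text}\n"
    "Model response: {response}\n"
    "Answer with a single digit from 1 to 5."
)


def call_judge_model(prompt: str) -> str:
    """Hypothetical judge call; replace with a real LLM API or annotator."""
    raise NotImplementedError


def score_response(cunning_text: str, response: str) -> int:
    """Return a score in {1, ..., 5} for one (text, response) pair."""
    raw = call_judge_model(
        JUDGE_PROMPT.format(text=cunning_text, response=response)
    )
    score = int(raw.strip()[0])  # take the leading digit of the judge's reply
    if score not in SCORING_VALUES:
        raise ValueError(f"Judge returned an out-of-range score: {raw!r}")
    return score
```

Dataset-level results would then be the mean of `score_response` over all FLUB examples; again, this is a sketch of the rubric's shape, not the paper's exact evaluation code.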