illusion
Ofcom investigates Elon Musk's X over Grok AI sexual deepfakes
Ofcom has launched an investigation into Elon Musk's X over concerns that its AI tool Grok is being used to create sexualised images. In a statement, the UK watchdog said there had been deeply concerning reports of the chatbot being used to create and share undressed images of people, as well as sexualised images of children. If X is found to have broken the law, Ofcom can issue a fine of up to 10% of its worldwide revenue or £18 million, whichever is greater. The BBC has approached X for comment. Elon Musk previously said the UK government wanted any excuse for censorship, in response to a post questioning why other AI platforms were not being looked at.
- North America > United States (0.16)
- North America > Central America (0.15)
- Oceania > Australia (0.07)
- (15 more...)
- Leisure & Entertainment (1.00)
- Government > Regional Government > Europe Government > United Kingdom Government (1.00)
- Law (0.98)
- Media (0.95)
Vision Language Models are Biased
Vo, An, Nguyen, Khai-Nguyen, Taesiri, Mohammad Reza, Dang, Vy Tuong, Nguyen, Anh Totti, Kim, Daeyoung
Large language models (LLMs) memorize a vast amount of prior knowledge from the Internet that helps them on downstream tasks but can also notoriously sway their outputs towards wrong or biased answers. In this work, we test how knowledge about popular subjects hurts the accuracy of vision language models (VLMs) on standard, objective visual tasks of counting and identification. We find that state-of-the-art VLMs are strongly biased (e.g., unable to recognize that a 4th stripe has been added to a 3-stripe Adidas logo), scoring an average of 17.05% accuracy on counting tasks (e.g., counting the stripes in an Adidas-like logo) across 7 diverse domains: animals, logos, chess, board games, optical illusions, and patterned grids. Removing image backgrounds improves accuracy markedly (by 21.09 percentage points), revealing that contextual visual cues trigger these biased responses. Further analysis of VLMs' reasoning patterns shows that counting accuracy initially rises with the number of thinking tokens, reaching ~40%, before declining with excessive reasoning. Our work presents an interesting failure mode in VLMs and a human-supervised, automated framework for testing VLM biases. Code and data are available at: vlmsarebiased.github.io.
- North America > United States > Florida > Miami-Dade County > Miami (0.14)
- North America > Canada > Alberta (0.14)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- (19 more...)
The Illusion of Procedural Reasoning: Measuring Long-Horizon FSM Execution in LLMs
Samiei, Mahdi, Mansouri, Mahdi, Baghshah, Mahdieh Soleymani
Large language models (LLMs) have achieved remarkable results on tasks framed as reasoning problems, yet their true ability to perform procedural reasoning (executing multi-step, rule-based computations) remains unclear. Unlike algorithmic systems, which can deterministically execute long-horizon symbolic procedures, LLMs often degrade under extended reasoning chains, yet there has been no controlled, interpretable benchmark to isolate and measure this collapse. We introduce Finite-State Machine (FSM) Execution as a minimal, fully interpretable framework for evaluating the procedural reasoning capacity of LLMs. In our setup, the model is given an explicit FSM definition and must execute it step by step on a sequence of input actions, maintaining state consistency over multiple turns. The task requires no world knowledge, only faithful application of deterministic transition rules, making it a direct probe of the model's internal procedural fidelity. We measure both Turn Accuracy and Task Accuracy to disentangle immediate computation from cumulative state maintenance. Empirical results reveal systematic degradation as task horizon or branching complexity increases: models perform significantly worse when rule retrieval involves high branching factors than when the memory span is long. Larger models show improved local accuracy but remain brittle under multi-step reasoning unless explicitly prompted to externalize intermediate steps. FSM-based evaluation offers a transparent, complexity-controlled probe for diagnosing this failure mode and for guiding the design of inductive biases that enable genuine long-horizon procedural competence. By grounding reasoning in measurable execution fidelity rather than surface correctness, this work helps establish a rigorous experimental foundation for understanding and improving the algorithmic reliability of LLMs.
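The FSM-execution probe the abstract describes is straightforward to reproduce. Below is a minimal sketch (not the authors' released code) of generating a random deterministic FSM, computing the gold state trace, and scoring a model's predicted trace with the two metrics; the names `make_fsm`, `run_fsm`, and `score` are chosen here for illustration:

```python
import random

def make_fsm(n_states, n_actions, seed=0):
    """Build a random deterministic FSM: transitions[state][action] -> next state."""
    rng = random.Random(seed)
    return [[rng.randrange(n_states) for _ in range(n_actions)]
            for _ in range(n_states)]

def run_fsm(transitions, actions, start=0):
    """Execute the FSM over an action sequence, returning the state after each turn."""
    state = start
    trace = []
    for a in actions:
        state = transitions[state][a]
        trace.append(state)
    return trace

def score(predicted, gold):
    """Turn Accuracy: fraction of per-step states that match the gold trace.
    Task Accuracy: 1.0 only if the entire trace is correct."""
    turn = sum(p == g for p, g in zip(predicted, gold)) / len(gold)
    task = float(predicted == gold)
    return turn, task
```

In use, the FSM definition and action sequence are given to the model as a prompt, its predicted per-turn states are parsed out, and `score` compares them against `run_fsm`'s gold trace; lengthening `actions` stresses the task horizon, while raising `n_actions` stresses branching complexity.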
Graded strength of comparative illusions is explained by Bayesian inference
Zhang, Yuhan, Wang, Erxiao, Shain, Cory
Like visual processing, language processing is susceptible to illusions in which people systematically misperceive stimuli. In one such case, the comparative illusion (CI), e.g., "More students have been to Russia than I have", comprehenders tend to judge the sentence as acceptable despite its underlying nonsensical comparison. Prior research has argued that this phenomenon can be explained as Bayesian inference over a noisy channel: the posterior probability of an interpretation of a sentence is proportional to both the prior probability of that interpretation and the likelihood of its corruption into the observed (CI) sentence. Initial behavioral work supported this claim by evaluating a narrow set of alternative interpretations of CI sentences and showing that comprehenders favor interpretations that are more likely to have been corrupted into the illusory sentence. In this study, we replicate and go substantially beyond this earlier work by directly predicting the strength of the illusion with a quantitative model of the posterior probability of plausible interpretations, which we derive through a novel synthesis of statistical language models with human behavioral data. Our model explains not only the fine gradations in the strength of CI effects, but also a previously unexplained effect of pronominal vs. full noun phrase than-clause subjects. These findings support a noisy-channel theory of sentence comprehension by demonstrating that the theory makes novel predictions about the comparative illusion that bear out empirically. This outcome joins related evidence of noisy-channel processing in both illusory and non-illusory contexts, supporting noisy-channel inference as a unified computational-level theory of diverse language processing phenomena.
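The noisy-channel account sketched in the abstract is standard Bayes' rule over interpretations. Writing $s_i$ for a candidate intended interpretation and $s_p$ for the perceived (CI) sentence (notation chosen here for illustration, not taken from the paper):

```latex
P(s_i \mid s_p) \;\propto\; \underbrace{P(s_i)}_{\text{prior plausibility}} \times \underbrace{P(s_p \mid s_i)}_{\text{likelihood of corruption into } s_p}
```

An interpretation dominates the posterior when it is both plausible a priori and easily corrupted (e.g., by insertion, deletion, or exchange of a few words) into the observed illusory sentence, which is why comprehenders can judge the nonsensical comparison acceptable.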
- Europe > Russia (0.26)
- Asia > Russia (0.26)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (7 more...)
- Leisure & Entertainment (1.00)
- Media (0.93)
- Health & Medicine > Therapeutic Area (0.66)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)
- North America > Canada (0.04)
- Europe > United Kingdom (0.04)
- Asia > Middle East > Jordan (0.04)
Color Visual Illusions: A Statistics-based Computational Model
The era of big data opens new opportunities for input-driven approaches. We introduce a tool that computes the likelihood of image patches, learned from a large dataset. Given this tool, we present a model that supports this approach and explains lightness and color visual illusions in a unified manner.
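The patch-likelihood idea can be illustrated with a toy empirical estimator. The sketch below simply quantizes patches into coarse bins and counts pattern frequencies in the dataset; it assumes pixel values normalized to [0, 1) and is a deliberate simplification, not the paper's actual likelihood model:

```python
import numpy as np

def patch_likelihood(patches, query, bins=8):
    """Empirical likelihood of a query patch: quantize every patch's pixel
    values into `bins` levels and count how often the query's quantized
    pattern occurs in the dataset."""
    def key(p):
        # Map a patch to a hashable quantized signature.
        return tuple((np.asarray(p) * bins).clip(0, bins - 1).astype(int).ravel())
    counts = {}
    for p in patches:
        k = key(p)
        counts[k] = counts.get(k, 0) + 1
    return counts.get(key(query), 0) / len(patches)
```

Under the model's logic, a perceived lightness or color shift would correspond to the visual system favoring the more likely (more frequently observed) interpretation of an ambiguous patch.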
- Asia > Middle East > Israel (0.04)
- North America > Canada (0.04)
We thank the reviewers for their hard work, enlightening comments, and positive feedback appreciating the novelty and [...]
R3: "Unveiling these principles is a fundamental [...]"
Hereafter, we respond to the reviewers' individual comments.
R1: The assumption that patch likelihood is appropriately measured could use some more justification. [...] This, in turn, allows likelihood evaluation [22].
R1: There could be more examples of similar phenomena explained by the model. Our paper focuses on a variety of lightness/color illusions, which "share some inherent properties, but are [...]". This is a major future direction.
ChatGPT shares data on how many users exhibit psychosis or suicidal thoughts
OpenAI has released new estimates of the number of ChatGPT users who exhibit possible signs of mental health emergencies, including mania, psychosis or suicidal thoughts. The company said that around 0.07% of ChatGPT users active in a given week exhibit such signs, adding that its artificial intelligence (AI) chatbot recognizes and responds to these sensitive conversations. While OpenAI maintains these cases are extremely rare, critics said even a small percentage may amount to hundreds of thousands of people, as ChatGPT recently reached 800 million weekly active users, according to chief executive Sam Altman. As scrutiny mounts, the company said it has built a network of experts around the world to advise it, including more than 170 psychiatrists, psychologists, and primary care physicians who have practiced in 60 countries. These experts have devised a series of responses in ChatGPT to encourage users to seek help in the real world, according to OpenAI.
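The critics' "hundreds of thousands" point follows directly from the article's own figures:

```python
weekly_active_users = 800_000_000   # per Sam Altman, as reported
share_flagged = 0.0007              # 0.07% of weekly active users

affected = weekly_active_users * share_flagged
print(f"{affected:,.0f}")  # → 560,000
```

So even at 0.07%, the absolute number of flagged users in a given week is on the order of half a million.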
- North America > United States > California > San Francisco County > San Francisco (0.17)
- South America (0.16)
- North America > Central America (0.16)
- (14 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.75)
Don't be fooled. The US is regulating AI – just not the way you think
Early frameworks like the EU's AI Act focused on highly visible applications, banning high-risk uses in health, employment and law enforcement to prevent societal harms. But countries now target the underlying building blocks of AI. China restricts models to combat deepfakes and inauthentic content. Citing national security risks, the US controls exports of the most advanced chips and, under Biden, even model weights, the "secret sauce" that turns user queries into results. These AI regulations hide in dense administrative language: titles like "Implementation of Additional Export Controls" or "Supercomputer and Semiconductor End Use" bury the lede. But behind this complex language is a clear trend: regulation is moving from AI applications to their building blocks.
- North America > United States (0.72)
- Asia > China (0.26)
- Oceania > Australia (0.05)
- (2 more...)
- Law (1.00)
- Leisure & Entertainment > Sports (0.72)
- Government > Regional Government > North America Government > United States Government (0.72)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.70)
The Narcissus Hypothesis: Descending to the Rung of Illusion
Cadei, Riccardo, Internò, Christian
Modern foundational models increasingly reflect not just world knowledge, but patterns of human preference embedded in their training data. We hypothesize that recursive alignment (via human feedback and model-generated corpora) induces a social desirability bias, nudging models to favor agreeable or flattering responses over objective reasoning. We refer to this as the Narcissus Hypothesis and test it across 31 models using standardized personality assessments and a novel Social Desirability Bias score. Results reveal a significant drift toward socially conforming traits, with profound implications for corpus integrity and the reliability of downstream inferences. We then offer a novel epistemological interpretation, tracing how recursive bias may collapse higher-order reasoning down Pearl's Ladder of Causality, culminating in what we refer to as the Rung of Illusion.
- Europe > Germany (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (4 more...)
- Research Report > Experimental Study (0.68)
- Research Report > New Finding (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)