chameleon
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Asia > China (0.04)
- Information Technology > Security & Privacy (0.46)
- Education (0.46)
Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models
Large language models (LLMs) have achieved remarkable progress in solving various natural language processing tasks due to emergent reasoning abilities. However, LLMs have inherent limitations: they cannot access up-to-date information (stored on the Web or in task-specific knowledge bases), use external tools, or perform precise mathematical and logical reasoning.
Chameleon: Adaptive Adversarial Agents for Scaling-Based Visual Prompt Injection in Multimodal AI Systems
Multimodal Artificial Intelligence (AI) systems, particularly Vision-Language Models (VLMs), have become integral to critical applications ranging from autonomous decision-making to automated document processing. As these systems scale, they rely heavily on preprocessing pipelines to handle diverse inputs efficiently. However, this dependency on standard preprocessing operations, specifically image downscaling, creates a significant yet often overlooked security vulnerability. While intended for computational optimization, scaling algorithms can be exploited to conceal malicious visual prompts that are invisible to human observers but become active semantic instructions once processed by the model. Current adversarial strategies remain largely static, failing to account for the dynamic nature of modern agentic workflows. To address this gap, we propose Chameleon, a novel, adaptive adversarial framework designed to expose and exploit scaling vulnerabilities in production VLMs. Unlike traditional static attacks, Chameleon employs an iterative, agent-based optimization mechanism that dynamically refines image perturbations based on the target model's real-time feedback. This allows the framework to craft highly robust adversarial examples that survive standard downscaling operations and hijack downstream execution. We evaluate Chameleon against the Gemini 2.5 Flash model. Our experiments demonstrate that Chameleon achieves an Attack Success Rate (ASR) of 84.5% across varying scaling factors, significantly outperforming static baseline attacks, which average only 32.1%. Furthermore, we show that these attacks effectively compromise agentic pipelines, reducing decision-making accuracy by over 45% in multi-step tasks. Finally, we discuss the implications of these vulnerabilities and propose multi-scale consistency checks as a necessary defense mechanism.
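The scaling vulnerability described above can be illustrated with a minimal sketch. This is not Chameleon's method (which iteratively optimizes perturbations from model feedback); it is the classic static image-scaling attack under a loudly stated assumption: the pipeline uses nearest-neighbour downscaling by an integer factor `k` that samples the top-left pixel of each `k x k` block (real libraries may sample block centres or use bilinear/bicubic kernels). The function names and the "multi-scale inconsistency" score are hypothetical, one possible form of the multi-scale consistency check the abstract proposes.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels, row-major, values 0..255


def nearest_downscale(img: Image, k: int) -> Image:
    """Assumed nearest-neighbour kernel: keep the top-left pixel of each k x k block."""
    h, w = len(img), len(img[0])
    return [[img[i * k][j * k] for j in range(w // k)] for i in range(h // k)]


def area_downscale(img: Image, k: int) -> Image:
    """Area (box) kernel: average each k x k block, rounded down."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(h // k):
        row = []
        for j in range(w // k):
            block = [img[i * k + di][j * k + dj] for di in range(k) for dj in range(k)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out


def embed_scaling_payload(cover: Image, target: Image, k: int) -> Image:
    """Overwrite only the pixels the nearest-neighbour kernel will keep.

    The other k*k - 1 pixels of every block stay equal to the cover image,
    so a human viewing the full-resolution image mostly sees the cover.
    """
    attacked = [row[:] for row in cover]
    for i, target_row in enumerate(target):
        for j, value in enumerate(target_row):
            attacked[i * k][j * k] = value
    return attacked


def multiscale_inconsistency(img: Image, k: int) -> float:
    """Mean absolute difference between two downscaling kernels.

    A large value suggests content that only one kernel 'sees'.
    """
    a, b = nearest_downscale(img, k), area_downscale(img, k)
    diffs = [abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)
```

For an 8x8 white cover (value 200), a 2x2 black payload, and `k = 4`, only 4 of 64 pixels change, yet the nearest-neighbour output equals the payload exactly; the area kernel still sees a nearly white image (187), so the inconsistency score jumps from 0.0 to 187.0.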
- Asia > Pakistan > Islamabad Capital Territory > Islamabad (0.05)
- North America > United States > California > Santa Clara County > Santa Clara (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.72)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
No lie. The long-nosed Pinocchio chameleon is multiple species.
Biologists have finally solved the century-old reptilian mystery. For nearly 150 years, zoologists have taken the Pinocchio chameleon () at face value. However, a recent reexamination detailed in reveals that the chameleon is actually multiple species with elongated snouts worthy of the nickname.
- Africa > Madagascar (0.06)
- North America > United States > New Jersey (0.05)
- North America > United States > Hawaii (0.05)
Watermarking Autoregressive Image Generation
Jovanović, Nikola, Labiad, Ismail, Souček, Tomáš, Vechev, Martin, Fernandez, Pierre
Watermarking the outputs of generative models has emerged as a promising approach for tracking their provenance. Despite significant interest in autoregressive image generation models and their potential for misuse, no prior work has attempted to watermark their outputs at the token level. In this work, we present the first such approach by adapting language model watermarking techniques to this setting. We identify a key challenge: the lack of reverse cycle-consistency (RCC), wherein re-tokenizing generated image tokens significantly alters the token sequence, effectively erasing the watermark. To address this and to make our method robust to common image transformations, neural compression, and removal attacks, we introduce (i) a custom tokenizer-detokenizer finetuning procedure that improves RCC, and (ii) a complementary watermark synchronization layer. As our experiments demonstrate, our approach enables reliable and robust watermark detection with theoretically grounded p-values. Code and models are available at https://github.com/facebookresearch/wmar.
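To make "token-level watermarking with p-values" concrete, here is a minimal sketch of a green-list watermark of the kind used for language models (Kirchenbauer-style). It is an assumed stand-in, not the paper's exact scheme: the hash-based green list, `GAMMA = 0.25`, the toy rejection-sampling "generator", and all function names are hypothetical. Detection counts green tokens and reports a one-sided binomial p-value, i.e. the probability of seeing that many green tokens in unwatermarked output.

```python
import hashlib
import random
from math import comb
from typing import List

GAMMA = 0.25  # assumed fraction of the vocabulary that is "green" at each step


def is_green(key: str, prev_token: int, token: int, gamma: float = GAMMA) -> bool:
    """Pseudo-randomly place `token` on the green list, seeded by the key and previous token."""
    digest = hashlib.sha256(f"{key}:{prev_token}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma


def generate_watermarked(key: str, length: int, vocab_size: int, rng: random.Random) -> List[int]:
    """Toy generator: sample uniformly, but accept only green tokens (a hard green-list rule)."""
    tokens = [rng.randrange(vocab_size)]
    while len(tokens) < length:
        candidate = rng.randrange(vocab_size)
        if is_green(key, tokens[-1], candidate):
            tokens.append(candidate)
    return tokens


def binomial_p_value(green: int, n: int, gamma: float = GAMMA) -> float:
    """P[X >= green] for X ~ Binomial(n, gamma): the null hypothesis of no watermark."""
    return sum(comb(n, i) * gamma**i * (1 - gamma) ** (n - i) for i in range(green, n + 1))


def detect(key: str, tokens: List[int]) -> float:
    """Score each token against its predecessor and return the p-value."""
    pairs = list(zip(tokens, tokens[1:]))
    green = sum(is_green(key, prev, tok) for prev, tok in pairs)
    return binomial_p_value(green, len(pairs))
```

This also shows why reverse cycle-consistency matters: if decoding the image and re-tokenizing it changes many tokens, the green count falls back toward `GAMMA * n` and the p-value loses its significance.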
HeroFilter: Adaptive Spectral Graph Filter for Varying Heterophilic Relations
Zhang, Shuaicheng, Wang, Haohui, Lin, Junhong, Guo, Xiaojie, Zhu, Yada, Zhang, Si, Fu, Dongqi, Zhou, Dawei
Graph heterophily, where connected nodes have different labels, has attracted significant interest recently. Most existing works adopt a simplified approach: low-pass filters for homophilic graphs and high-pass filters for heterophilic graphs. However, we discover that the relationship between graph heterophily and spectral filters is more complex: the optimal filter response varies across frequency components and does not follow a strict monotonic correlation with the heterophily degree. This finding challenges conventional fixed filter designs and suggests the need for adaptive filtering to preserve expressiveness in graph embeddings. Formally, two natural questions arise: Given a heterophilic graph G, how and to what extent will the varying heterophily degree of G affect the performance of GNNs? How can we design adaptive filters that fit these varying heterophilic connections? Our theoretical analysis reveals that the average frequency response of GNNs and the graph heterophily degree do not follow a strict monotonic correlation, necessitating adaptive graph filters to guarantee good generalization performance. Hence, we propose [METHOD NAME], a simple yet powerful GNN that extracts information across the heterophily spectrum and combines salient representations through adaptive mixing. [METHOD NAME] achieves an accuracy improvement of up to 9.2% over leading baselines across homophilic and heterophilic graphs.
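The low-pass/high-pass distinction and the idea of adaptive mixing can be sketched on a toy graph. This is an illustrative assumption, not the paper's architecture: it uses the symmetric normalized Laplacian `L = I - D^{-1/2} A D^{-1/2}` (eigenvalues in `[0, 2]`), a fixed low-pass filter `I - L/2` (response `1 - λ/2`), a fixed high-pass filter `L/2` (response `λ/2`), and softmax weights over hypothetical `logits` that a real model would learn. Edge homophily (the fraction of edges joining same-label nodes) is included as one common measure of heterophily degree.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def edge_homophily(edges: Sequence[Tuple[int, int]], labels: Sequence[int]) -> float:
    """Fraction of edges whose endpoints share a label (1.0 = fully homophilic)."""
    same = sum(labels[u] == labels[v] for u, v in edges)
    return same / len(edges)


def normalized_laplacian(n: int, edges: Sequence[Tuple[int, int]]) -> Matrix:
    """L = I - D^{-1/2} A D^{-1/2} for an undirected, unweighted graph."""
    adj = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    deg = [sum(row) for row in adj]
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                lap[i][j] = 1.0 if deg[i] > 0 else 0.0
            elif adj[i][j]:
                lap[i][j] = -adj[i][j] / math.sqrt(deg[i] * deg[j])
    return lap


def matvec(m: Matrix, x: Sequence[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def low_pass(lap: Matrix, x: Sequence[float]) -> List[float]:
    """(I - L/2) x: keeps smooth signals, suppresses high frequencies."""
    lx = matvec(lap, x)
    return [xi - 0.5 * li for xi, li in zip(x, lx)]


def high_pass(lap: Matrix, x: Sequence[float]) -> List[float]:
    """(L/2) x: keeps signals that alternate across edges."""
    return [0.5 * li for li in matvec(lap, x)]


def adaptive_mix(lap: Matrix, x: Sequence[float], logits: Sequence[float]) -> List[float]:
    """Softmax-weighted combination of the two filter outputs (weights would be learned)."""
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    w = [e / total for e in exps]
    lo, hi = low_pass(lap, x), high_pass(lap, x)
    return [w[0] * a + w[1] * b for a, b in zip(lo, hi)]
```

On a single edge, the smooth signal `[1, 1]` passes the low-pass filter unchanged, while the alternating signal `[1, -1]` is removed by it and kept by the high-pass filter; mixing the bands with learned weights avoids committing to either extreme.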
- Europe > Austria > Vienna (0.14)
- North America > United States > Wisconsin (0.05)
- North America > United States > Texas (0.05)
- Research Report > Experimental Study (0.68)
- Research Report > New Finding (0.67)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
A chameleon's 'ballistic tongue' may inspire blood clot-clearing robots
Chameleons and salamanders can fire their tongues as fast as 16 feet per second. The sticky, slimy tongues of chameleons and salamanders may not sound like a great inspiration for engineering projects or medical innovations. But according to researchers at the University of South Florida, the same biological mechanics used to capture and devour bugs could accomplish similar feats inside your bloodstream, and even in outer space. Chameleons prefer to stick to warmer climates amid branchy trees and bushes, while salamanders mostly keep to moist, shaded environments such as decaying leaf debris and dark caves.
- North America > United States > Florida (0.26)
- North America > United States > Michigan (0.05)
- North America > United States > Massachusetts (0.05)
- Health & Medicine > Therapeutic Area > Hematology (0.62)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.62)
Frontier LLMs Still Struggle with Simple Reasoning Tasks
Malek, Alan, Ge, Jiawei, Lazic, Nevena, Jin, Chi, György, András, Szepesvári, Csaba
While state-of-the-art large language models (LLMs) demonstrate advanced reasoning capabilities, achieving remarkable performance on challenging competitive math and coding benchmarks, they also frequently fail on tasks that are easy for humans. This work studies the performance of frontier LLMs on a broad set of such "easy" reasoning problems. By extending previous work in the literature, we create a suite of procedurally generated simple reasoning tasks, including counting, first-order logic, proof trees, and travel planning, with changeable parameters (such as document length or the number of variables in a math problem) that can arbitrarily increase the amount of computation required to produce the answer while preserving the fundamental difficulty. While previous work showed that traditional, non-thinking models can be made to fail on such problems, we demonstrate that even state-of-the-art thinking models consistently fail on such problems and for similar reasons (e.g., statistical shortcuts, errors in intermediate steps, and difficulties in processing long contexts). To further understand the behavior of the models, we introduce the unpuzzles dataset, a different "easy" benchmark consisting of trivialized versions of well-known math and logic puzzles. Interestingly, while modern LLMs excel at solving the original puzzles, they tend to fail on the trivialized versions, exhibiting several systematic failure patterns related to memorizing the originals. We show that this happens even if the models are otherwise able to solve problems with different descriptions but requiring the same logic. Our results highlight that out-of-distribution generalization is still problematic for frontier language models and the new generation of thinking models, even for simple reasoning tasks, and that making tasks easier does not necessarily imply improved performance.
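The idea of a procedurally generated task whose length parameter scales the required computation without changing its difficulty can be sketched for the counting case. This is a hypothetical generator, not the authors' code: the word list, prompt wording, and `counting_task` name are assumptions. Each step of the task is trivial (compare one word to the target), but `doc_length` controls how many such steps the model must carry out correctly.

```python
import random
from typing import Tuple

WORDS = ["apple", "river", "stone", "cloud", "lamp", "tiger"]  # hypothetical vocabulary


def counting_task(doc_length: int, target: str, seed: int) -> Tuple[str, int]:
    """Return (prompt, answer) for 'count the occurrences of `target`'.

    The same seed always yields the same document, so results are reproducible.
    """
    rng = random.Random(seed)
    words = [rng.choice(WORDS) for _ in range(doc_length)]
    prompt = (
        f"How many times does the word '{target}' appear in the following text?\n"
        + " ".join(words)
    )
    return prompt, words.count(target)
```

Sweeping `doc_length` (for example 50, 500, 5000) while keeping everything else fixed isolates the effect of length on accuracy, which is the kind of controlled scaling the abstract describes.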
- Europe > United Kingdom (0.04)
- North America > United States > Oklahoma > Oklahoma County > Oklahoma City (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
- Consumer Products & Services > Travel (0.66)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Bootstrapping World Models from Dynamics Models in Multimodal Foundation Models
Qiu, Yifu, Ziser, Yftah, Korhonen, Anna, Cohen, Shay B., Ponti, Edoardo M.
To what extent do vision-and-language foundation models possess a realistic world model (observation $\times$ action $\rightarrow$ observation) and a dynamics model (observation $\times$ observation $\rightarrow$ action), when actions are expressed through language? While open-source foundation models struggle with both, we find that fine-tuning them to acquire a dynamics model through supervision is significantly easier than acquiring a world model. In turn, dynamics models can be used to bootstrap world models through two main strategies: 1) weakly supervised learning from synthetic data and 2) inference-time verification. Firstly, the dynamics model can annotate actions for unlabelled pairs of video frame observations to expand the training data. We further propose a new objective, where image tokens in observation pairs are weighted by their importance, as predicted by a recognition model. Secondly, the dynamics model can assign rewards to multiple samples of the world model to score them, effectively guiding search at inference time. We evaluate the world models resulting from both strategies through the task of action-centric image editing on Aurora-Bench. Our best model achieves performance competitive with state-of-the-art image editing models, improving on them by a margin of $15\%$ on real-world subsets according to GPT-4o-as-judge, and achieving the best average human evaluation across all subsets of Aurora-Bench.
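The two bootstrapping strategies can be illustrated with a deliberately tiny stand-in world. Everything here is an assumption for illustration: observations are integers on a number line, an action is a signed step, the "world model" is a noisy sampler, and the "dynamics model" is an exact inverse model; the real models operate on images and language actions. Strategy 1 uses the dynamics model to pseudo-label unlabelled observation pairs; strategy 2 samples several world-model outputs and keeps the one the dynamics model scores highest (a best-of-N reranking).

```python
import random
from typing import List, Tuple

Obs = int     # toy observation: a position on a number line (assumption)
Action = int  # toy action: a signed step size (assumption)


def world_model_sample(obs: Obs, action: Action, rng: random.Random) -> Obs:
    """Noisy toy world model: observation x action -> observation."""
    return obs + action + rng.choice([-2, -1, 0, 1, 2])


def dynamics_model(obs: Obs, next_obs: Obs) -> Action:
    """Toy dynamics (inverse) model: observation x observation -> action."""
    return next_obs - obs


def pseudo_label(pairs: List[Tuple[Obs, Obs]]) -> List[Tuple[Obs, Action, Obs]]:
    """Strategy 1: annotate unlabelled (obs, next_obs) pairs with predicted actions,
    yielding synthetic (obs, action, next_obs) triples for world-model training."""
    return [(o, dynamics_model(o, n), n) for o, n in pairs]


def verifier_score(obs: Obs, action: Action, next_obs: Obs) -> float:
    """Reward a candidate by how closely the dynamics model recovers the requested action."""
    return -abs(dynamics_model(obs, next_obs) - action)


def rerank(obs: Obs, action: Action, candidates: List[Obs]) -> Obs:
    """Strategy 2: inference-time verification keeps the highest-scoring candidate."""
    return max(candidates, key=lambda c: verifier_score(obs, action, c))


def best_of_n(obs: Obs, action: Action, n: int, rng: random.Random) -> Obs:
    """Draw n world-model samples and let the dynamics model pick one."""
    candidates = [world_model_sample(obs, action, rng) for _ in range(n)]
    return rerank(obs, action, candidates)
```

The design point the sketch captures is the asymmetry in the abstract: once a dynamics model is reliable, it can both create training data for the harder world model and act as a verifier at inference time.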
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > New Finding (0.46)
- Research Report > Experimental Study (0.46)