obfuscation
Benchmarking Robustness to Adversarial Image Obfuscations
Advances in in computer vision have lead to classifiers that nearly match human performance in many applications. However, while the human visual system is remarkably versatile in extracting semantic meaning out of even degraded and heavily obfuscated images, today's visual classifiers significantly lag behind in emulating the same robustness, and often yield incorrect outputs in the presence of natural and adversarial degradations.
- Europe > United Kingdom > England > Staffordshire (0.04)
- North America > Canada > Newfoundland and Labrador > Newfoundland (0.04)
- Oceania > New Zealand > South Island > Marlborough District > Blenheim (0.04)
- (5 more...)
- Information Technology (0.93)
- Transportation > Ground > Road (0.93)
- Automobiles & Trucks (0.67)
- (2 more...)
Benchmarking Robustness to Adversarial Image Obfuscations
Automated content filtering and moderation is an important tool that allows online platforms to build striving user communities that facilitate cooperation and prevent abuse. Unfortunately, resourceful actors try to bypass automated filters in a bid to post content that violate platform policies and codes of conduct. To reach this goal, these malicious actors may obfuscate policy violating images (e.g., overlay harmful images by carefully selected benign images or visual patterns) to prevent machine learning models from reaching the correct decision. In this paper, we invite researchers to tackle this specific issue and present a new image benchmark. This benchmark, based on ImageNet, simulates the type of obfuscations created by malicious actors. It goes beyond Image-Net-C and ImageNet-C-bar by proposing general, drastic, adversarial modifications that preserve the original content intent. It aims to tackle a more common adversarial threat than the one considered by lp-norm bounded adversaries. We evaluate 33 pretrained models on the benchmark and train models with different augmentations, architectures and training methods on subsets of the obfuscations to measure generalization. Our hope is that this benchmark will encourage researchers to test their models and methods and try to find new approaches that are more robust to these obfuscations.
Injecting Undetectable Backdoors in Obfuscated Neural Networks and Language Models
As ML models become increasingly complex and integral to high-stakes domains such as finance and healthcare, they also become more susceptible to sophisticated adversarial attacks. We investigate the threat posed by undetectable backdoors, as defined in Goldwasser et al. [2022], in models developed by insidious external expert firms. When such backdoors exist, they allow the designer of the model to sell information on how to slightly perturb their input to change the outcome of the model. We develop a general strategy to plant backdoors to obfuscated neural networks, that satisfy the security properties of the celebrated notion of indistinguishability obfuscation. Applying obfuscation before releasing neural networks is a strategy that is well motivated to protect sensitive information of the external expert firm. Our method to plant backdoors ensures that even if the weights and architecture of the obfuscated model are accessible, the existence ofthe backdoor is still undetectable. Finally, we introduce the notion of undetectable backdoors to language models and extend our neural network backdoor attacks to such models based on the existence of steganographic functions.
Goal inference with Rao-Blackwellized Particle Filters
Wang, Yixuan, Guralnik, Dan P., Dixon, Warren E.
Inferring the eventual goal of a mobile agent from noisy observations of its trajectory is a fundamental estimation problem. We initiate the study of such intent inference using a variant of a Rao-Blackwellized Particle Filter (RBPF), subject to the assumption that the agent's intent manifests through closed-loop behavior with a state-of-the-art provable practical stability property. Leveraging the assumed closed-form agent dynamics, the RBPF analytically marginalizes the linear-Gaussian substructure and updates particle weights only, improving sample efficiency over a standard particle filter. Two difference estimators are introduced: a Gaussian mixture model using the RBPF weights and a reduced version confining the mixture to the effective sample. We quantify how well the adversary can recover the agent's intent using information-theoretic leakage metrics and provide computable lower bounds on the Kullback-Leibler (KL) divergence between the true intent distribution and RBPF estimates via Gaussian-mixture KL bounds. We also provide a bound on the difference in performance between the two estimators, highlighting the fact that the reduced estimator performs almost as well as the complete one. Experiments illustrate fast and accurate intent recovery for compliant agents, motivating future work on designing intent-obfuscating controllers.
- North America > United States > Ohio (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Injecting Undetectable Backdoors in Obfuscated Neural Networks and Language Models
When such backdoors exist, they allow the designer of the model to sell information on how to slightly perturb their input to change the outcome of the model. We develop a general strategy to plant backdoors to obfuscated neural networks, that satisfy the security properties of the celebrated notion of indistinguishability obfuscation . Applying obfuscation before releasing neural networks is a strategy that is well motivated to protect sensitive information of the external expert firm.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California > Santa Barbara County > Santa Barbara (0.04)
- Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
- Europe > Germany (0.04)
- North America > United States > Virginia (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- North America > United States > Virginia (0.05)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (0.68)
Can Reasoning Models Obfuscate Reasoning? Stress-Testing Chain-of-Thought Monitorability
Zolkowski, Artur, Xing, Wen, Lindner, David, Tramèr, Florian, Jenner, Erik
Recent findings suggest that misaligned models may exhibit deceptive behavior, raising concerns about output trustworthiness. Chain-of-thought (CoT) is a promising tool for alignment monitoring: when models articulate their reasoning faithfully, monitors can detect and mitigate harmful behaviors before undesirable outcomes occur. However, a key uncertainty is: Can models obfuscate their CoT in order to pursue hidden adversarial objectives while evading detection? To answer this question and thus stress-test CoT monitorability, we develop a composable and quantifiable taxonomy of prompts to elicit CoT obfuscation. We evaluate both internal CoT (reasoning traces) and external CoT (prompted reasoning in outputs) using toy tasks and more realistic environments in SHADE-Arena. We show that: (i) CoT monitoring performs accurately and efficiently without obfuscation pressure. (ii) Under strong obfuscation pressure, some models successfully complete adversarial tasks while evading detection. (iii) Models do not obfuscate their internal CoT as much as their external CoT (under prompt pressure). These results suggest that while CoT provides valuable oversight in benign settings, robust deployment requires model-specific stress-testing of monitorability.
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.67)
OBsmith: Testing JavaScript Obfuscator using LLM-powered sketching
Jiang, Shan, Zhu, Chenguang, Khurshid, Sarfraz
JavaScript obfuscators are widely deployed to protect intellectual property and resist reverse engineering, yet their correctness has been largely overlooked compared to performance and resilience. Existing evaluations typically measure resistance to deobfuscation, leaving the critical question of whether obfuscators preserve program semantics unanswered. Incorrect transformations can silently alter functionality, compromise reliability, and erode security-undermining the very purpose of obfuscation. To address this gap, we present OBsmith, a novel framework to systematically test JavaScript obfuscators using large language models (LLMs). OBsmith leverages LLMs to generate program sketches abstract templates capturing diverse language constructs, idioms, and corner cases-which are instantiated into executable programs and subjected to obfuscation under different configurations. Besides LLM-powered sketching, OBsmith also employs a second source: automatic extraction of sketches from real programs. This extraction path enables more focused testing of project specific features and lets developers inject domain knowledge into the resulting test cases. OBsmith uncovers 11 previously unknown correctness bugs. Under an equal program budget, five general purpose state-of-the-art JavaScript fuzzers (FuzzJIT, Jsfunfuzz, Superion, DIE, Fuzzilli) failed to detect these issues, highlighting OBsmith's complementary focus on obfuscation induced misbehavior. An ablation shows that all components except our generic MRs contribute to at least one bug class; the negative MR result suggests the need for obfuscator-specific metamorphic relations. Our results also seed discussion on how to balance obfuscation presets and performance cost. We envision OBsmith as an important step towards automated testing and quality assurance of obfuscators and other semantic-preserving toolchains.
- North America > United States > Texas > Travis County > Austin (0.14)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
- Asia > Nepal (0.04)
- Information Technology > Security & Privacy (1.00)
- Law (0.68)