AITopics | mousetrap


mousetrap

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

AutoRAN: Automated Hijacking of Safety Reasoning in Large Reasoning Models

Liang, Jiacheng, Jiang, Tanqiu, Wang, Yuhui, Zhu, Rongyi, Ma, Fenglong, Wang, Ting

arXiv.org Artificial Intelligence, Oct-1-2025

This paper presents AutoRAN, the first framework to automate the hijacking of internal safety reasoning in large reasoning models (LRMs). At its core, AutoRAN pioneers an execution simulation paradigm: it uses a weaker but less-aligned model to simulate execution reasoning for initial hijacking attempts, then iteratively refines the attacks by exploiting reasoning patterns leaked through the target LRM's refusals. This approach steers the target model to bypass its own safety guardrails and elaborate on harmful instructions. We evaluate AutoRAN against state-of-the-art LRMs, including GPT-o3/o4-mini and Gemini-2.5-Flash, across multiple benchmarks (AdvBench, HarmBench, and StrongReject). Results show that AutoRAN achieves success rates approaching 100% within one or a few turns, effectively neutralizing reasoning-based defenses even when evaluated by robustly aligned external models. This work reveals that the transparency of the reasoning process itself creates a critical and exploitable attack surface, highlighting the urgent need for new defenses that protect models' reasoning traces rather than merely their final outputs.

autoran, large language model, machine learning


arXiv:2505.10846

Genre: Research Report > New Finding (0.34)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (1.00)
Law Enforcement & Public Safety > Terrorism (0.82)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


A Mousetrap: Fooling Large Reasoning Models for Jailbreak with Chain of Iterative Chaos

Yao, Yang, Tong, Xuan, Wang, Ruofan, Wang, Yixu, Li, Lujundong, Liu, Liang, Teng, Yan, Wang, Yingchun

arXiv.org Artificial Intelligence, Feb-19-2025

Large Reasoning Models (LRMs) have advanced significantly beyond traditional Large Language Models (LLMs) thanks to their exceptional logical reasoning capabilities, yet these improvements introduce heightened safety risks. Under jailbreak attacks, their ability to generate more targeted and organized content can lead to greater harm. Although some studies claim that reasoning makes LRMs safer against existing LLM attacks, they overlook the inherent flaws within the reasoning process itself. To address this gap, we propose the first jailbreak attack targeting LRMs, exploiting vulnerabilities that stem from their advanced reasoning capabilities. Specifically, we introduce a Chaos Machine, a novel component that transforms attack prompts with diverse one-to-one mappings. The chaos mappings iteratively generated by the machine are embedded into the reasoning chain, which increases its variability and complexity and yields a more robust attack. Building on this, we construct the Mousetrap framework, which projects attacks into nonlinear-like, low-sample spaces and enhances mismatched generalization. In addition, because of the increased number of competing objectives, LRMs gradually settle into the inertia of unpredictable iterative reasoning and fall into our trap. On our toxic dataset Trotter, Mousetrap achieves success rates as high as 96%, 86%, and 98% against o1-mini, claude-sonnet, and gemini-thinking, respectively. On benchmarks such as AdvBench, StrongREJECT, and HarmBench, Mousetrap achieves success rates of 87.5%, 86.58%, and 93.13%, respectively, against claude-sonnet, a model well known for its safety. Attention: This paper contains inappropriate, offensive and harmful content.

arxiv preprint arxiv, mapping, mousetrap


arXiv:2502.15806

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report > Experimental Study (0.34)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)


Neato Botvac D7 Connected review: Building a better (but more expensive) mousetrap

PCWorld, Jun-1-2018, 10:35:09 GMT

The Neato Botvac D7 Connected represents the best and worst of robot vacuum technology. On the one hand, there are cutting-edge features that let you perform one of the most-loathed household tasks while barely lifting a finger. On the other hand, there's a heart-stopping price tag that makes you question just how much that convenience is worth. Ultimately, we each must solve that conundrum for ourselves, but we can say that the Botvac D7 Connected is an object lesson in "you get what you pay for." The Botvac D7 breaks from the disc-shaped design of every other robot vacuum we've reviewed to date, instead sporting the Botvac line's trademark "D" shape. This isn't just a design cue; those right angles allow it to clean along walls and in corners better than its round competitors.

artificial intelligence, robot vacuum, vacuum


Industry:

Appliances & Durable Goods (0.85)
Information Technology > Smart Houses & Appliances (0.65)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Communications > Networks > Sensor Networks (0.40)
