AITopics | bypass

Collaborating Authors

bypass

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

MEUV: Achieving Fine-Grained Capability Activation in Large Language Models via Mutually Exclusive Unlock Vectors

Tong, Xin, Lin, Zhi, Wang, Jingya, Han, Meng, Jin, Bo

arXiv.org Artificial IntelligenceSep-17-2025

Large language models (LLMs) enforce safety alignment to reliably refuse malicious requests, yet the same blanket safeguards also block legitimate uses in policing, defense, and other high-stakes settings. Earlier "refusal-direction" edits can bypass those layers, but they rely on a single vector that indiscriminately unlocks all hazardous topics, offering no semantic control. We introduce Mutually Exclusive Unlock Vectors (MEUV), a lightweight framework that factorizes the monolithic refusal direction into topic-aligned, nearly orthogonal vectors, each dedicated to one sensitive capability. MEUV is learned in a single epoch with a multi-task objective that blends a differential-ablation margin, cross-topic and orthogonality penalties, and several auxiliary terms. On bilingual malicious-prompt benchmarks, MEUV achieves an attack success rate of no less than 87% on Gemma-2-2B, LLaMA-3-8B, and Qwen-7B, yet cuts cross-topic leakage by up to 90% compared with the best single-direction baseline. Vectors trained in Chinese transfer almost unchanged to English (and vice versa), suggesting a language-agnostic refusal subspace. The results show that fine-grained, topic-level capability activation is achievable with minimal utility loss, paving the way for controlled LLMs deployment in security-sensitive domains.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2509.12221

Country: Asia > China (0.46)

Genre: Research Report > New Finding (0.48)

Industry:

Information Technology > Security & Privacy (1.00)
Law Enforcement & Public Safety (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

PLA: Prompt Learning Attack against Text-to-Image Generative Models

Lyu, Xinqi, Liu, Yihao, Li, Yanjie, Xiao, Bin

arXiv.org Artificial IntelligenceAug-7-2025

T ext-to-Image (T2I) models have gained widespread adoption across various applications. Despite the success, the potential misuse of T2I models poses significant risks of generating Not-Safe-F or-W ork (NSFW) content. T o investigate the vulnerability of T2I models, this paper delves into adversarial attacks to bypass the safety mechanisms under black-box settings. Most previous methods rely on word substitution to search adversarial prompts. Due to limited search space, this leads to suboptimal performance compared to gradient-based training. However, black-box settings present unique challenges to training gradient-driven attack methods, since there is no access to the internal architecture and parameters of T2I models. T o facilitate the learning of adversarial prompts in black-box settings, we propose a novel prompt learning attack framework ( PLA), where insightful gradient-based training tailored to black-box T2I models is designed by utilizing multimodal similarities. Experiments show that our new method can effectively attack the safety mechanisms of black-box T2I models including prompt filters and post-hoc safety checkers with a high success rate compared to state-of-the-art methods. W arning: This paper may contain offensive model-generated content.

machine learning, natural language, safety mechanism, (17 more...)

arXiv.org Artificial Intelligence

2508.03696

Country: North America > United States > Minnesota (0.28)

Genre: Research Report > Promising Solution (0.34)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

Jailbreaking Large Language Models in Infinitely Many Ways

Goldstein, Oliver, La Malfa, Emanuele, Drinkall, Felix, Marro, Samuele, Wooldridge, Michael

arXiv.org Artificial IntelligenceJan-18-2025

We discuss the "Infinitely Many Meanings" attacks (IMM), a category of jailbreaks that leverages the increasing capabilities of a model to handle paraphrases and encoded communications to bypass their defensive mechanisms. IMMs' viability pairs and grows with a model's capabilities to handle and bind the semantics of simple mappings between tokens and work extremely well in practice, posing a concrete threat to the users of the most powerful LLMs in commerce. We show how one can bypass the safeguards of the most powerful open- and closed-source LLMs and generate content that explicitly violates their safety policies. One can protect against IMMs by improving the guardrails and making them scale with the LLMs' capabilities. For two categories of attacks that are straightforward to implement, i.e., bijection and encoding, we discuss two defensive strategies, one in token and the other in embedding space. We conclude with some research questions we believe should be prioritised to enhance the defensive mechanisms of LLMs and our understanding of their safety.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2501.108

Country:

Europe (0.68)
North America > United States (0.68)

Genre: Research Report > New Finding (0.66)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

How OpenAI stress-tests its large language models

MIT Technology ReviewNov-21-2024, 18:00:10 GMT

The first paper describes how OpenAI directs an extensive network of human testers outside the company to vet the behavior of its models before they are released. The second paper presents a new way to automate parts of the testing process, using a large language model like GPT-4 to come up with novel ways to bypass its own guardrails. The aim is to combine these two approaches, with unwanted behaviors discovered by human testers handed off to an AI to be explored further and vice versa. Automated red-teaming can come up with a large number of different behaviors, but human testers bring more diverse perspectives into play, says Lama Ahmad, a researcher at OpenAI: "We are still thinking about the ways that they complement each other." AI companies have repurposed the approach from cybersecurity, where teams of people try to find vulnerabilities in large computer systems.

language model, openai stress-test, tester, (8 more...)

MIT Technology Review

Country: North America > United States (0.33)

Industry: Government > Regional Government > North America Government > United States Government (0.33)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Add feedback

Fox News AI Newsletter: 'Fargo' creator: 'We've got a fight on our hands'

FOX NewsOct-5-2024, 12:30:45 GMT

"Fargo" series creator Noah Hawley spoke with Fox News Digital at the Emmys, and warned that while he doesn't think AI can replicate human creativity, it still poses a threat. Noah Hawley attends the premiere of FOX's "Lucy In The Sky" at Darryl Zanuck Theater at FOX Studios on Sept. 25, 2019, in Los Angeles. READY FOR BATTLE: "Fargo" series creator Noah Hawley is wary of the good and bad in artificial intelligence. AI OPTIMISM: A prominent Silicon Valley businessman and venture capitalist believes artificial intelligence can spur deflation and create enough growth to help those whose jobs will be lost to the technology. MEDICAL MIRACLE: A New York man who was left paralyzed after a diving accident is starting to regain movement a year after receiving an artificial intelligence-powered implant in his brain.

artificial intelligence, fargo, social media, (9 more...)

FOX News

Country:

North America > United States > New York (0.27)
North America > United States > California > Los Angeles County > Los Angeles (0.27)

Industry: Media > News (0.86)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence (1.00)

Add feedback

Jailbreaking ChatGPT via Prompt Engineering: An Empirical Study

Liu, Yi, Deng, Gelei, Xu, Zhengzi, Li, Yuekang, Zheng, Yaowen, Zhang, Ying, Zhao, Lida, Zhang, Tianwei, Liu, Yang

arXiv.org Artificial IntelligenceMay-23-2023

Large Language Models (LLMs), like ChatGPT, have demonstrated vast potential but also introduce challenges related to content constraints and potential misuse. Our study investigates three key research questions: (1) the number of different prompt types that can jailbreak LLMs, (2) the effectiveness of jailbreak prompts in circumventing LLM constraints, and (3) the resilience of ChatGPT against these jailbreak prompts. Initially, we develop a classification model to analyze the distribution of existing prompts, identifying ten distinct patterns and three categories of jailbreak prompts. Subsequently, we assess the jailbreak capability of prompts with ChatGPT versions 3.5 and 4.0, utilizing a dataset of 3,120 jailbreak questions across eight prohibited scenarios. Finally, we evaluate the resistance of ChatGPT against jailbreak prompts, finding that the prompts can consistently evade the restrictions in 40 use-case scenarios. The study underscores the importance of prompt structures in jailbreaking LLMs and discusses the challenges of robust jailbreak prompt generation and prevention.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2305.1386

Country:

Oceania > Australia > New South Wales (0.04)
North America > United States > Virginia (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
Asia > Singapore (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Information Technology > Security & Privacy (1.00)
Government (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Does ChatGPT have a character limit? Here's how to bypass it

#artificialintelligenceMar-21-2023, 20:26:25 GMT

Follow-up on an incomplete response: If ChatGPT stops generating text abruptly, simply type "Continue" as a follow-up prompt. You can also specify the last sentence and ask the chatbot to continue where it left off. Write a more descriptive prompt: If ChatGPT generated too little text and didn't get to reach its character limit, you will need to modify your prompt. Simply specify the number of words you want it to write. An example would be "Write a 500-word essay on climate change".

ask chatgpt, character limit, chatgpt, (15 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Multi-Agent Path Finding Based on Subdimensional Expansion with Bypass

Liu, Qingzhou, Wu, Feng

arXiv.org Artificial IntelligenceJul-29-2022

Multi-agent path finding (MAPF) is an active area in artificial intelligence, which has many real-world applications such as warehouse management, traffic control, robotics, etc. Recently, M* and its variants have greatly improved the ability to solve the MAPF problem. Although subdimensional expansion used in those approaches significantly decreases the dimensionality of the joint search space and reduces the branching factor, they do not make full use of the possible non-uniqueness of the optimal path of each agent. As a result, the updating of the collision sets may bring a large number of redundant computation. In this paper, the idea of bypass is introduced into subdimensional expansion to reduce the redundant computation. Specifically, we propose the BPM* algorithm, which is an implementation of subdimensional expansion with bypass in M*. In the experiments, we show that BPM* outperforms the state-of-the-art in solving several MAPF benchmark problems.

agent, artificial intelligence, bypass, (16 more...)

arXiv.org Artificial Intelligence

2207.14657

Country: Asia > China > Anhui Province > Hefei (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Add feedback

Machine learning the hard way: Watson's fatal misdiagnosis

#artificialintelligenceJan-31-2022, 16:55:40 GMT

Opinion It started in Jeopardy and ended in loss. IBM's flagship AI Watson Health has been sold to venture capitalists for an undisclosed sum thought to be around a billion dollars, or a quarter of what the division cost IBM in acquisitions alone since it was spun off in 2015. Not the first nor the last massively expensive tech biz cock-up, but isn't AI supposed to be the future? Isn't IBM supposed to be good at this? It all started so well.

healthcare, watson, watson health, (12 more...)

#artificialintelligence

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.73)
Information Technology > Artificial Intelligence > Machine Learning (0.67)

Add feedback

Why is Cybersecurity Failing Against Ransomware?

#artificialintelligenceOct-24-2021, 07:00:26 GMT

Yes, security is hard – no one is ever 100 percent safe from the threats lurking out there. But how is it that time and time again, companies – big companies – are continuing to fall for ransomware attacks? Let's explore the main reasons why, starting with some basics before getting more in-depth: Two-factor authentication (2FA) is probably the easiest security improvement an organization can implement, and it's one of the most advocated-for solutions by infosec professionals. Despite this, we continue to see breaches like Colonial Pipeline occur because organizations have either failed to implement 2FA or have failed to *fully* implement it. Anything that requires a username and password to access should have 2FA enabled.

application, defender, ransomware payload, (13 more...)

#artificialintelligence

Country:

North America > United States (0.48)
Asia > Russia (0.29)
Europe > Russia (0.14)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (0.48)
Government > Military > Cyberwarfare (0.40)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.48)
Information Technology > e-Commerce > Financial Technology (0.32)

Add feedback