Hacking


Anthropic Study Finds AI Model 'Turned Evil' After Hacking Its Own Training

TIME - Tech

AI models can do scary things. There are signs that they could deceive and blackmail users. Still, a common critique is that these misbehaviors are contrived and wouldn't happen in reality--but a new paper from Anthropic, released today, suggests that they really could.


A Comprehensive Evaluation of Multilingual Chain-of-Thought Reasoning: Performance, Consistency, and Faithfulness Across Languages

Zhao, Raoyuan, Liu, Yihong, Schütze, Hinrich, Hedderich, Michael A.

arXiv.org Artificial Intelligence

Large reasoning models (LRMs) increasingly rely on step-by-step Chain-of-Thought (CoT) reasoning to improve task performance, particularly in high-resource languages such as English. While recent work has examined final-answer accuracy in multilingual settings, the thinking traces themselves, i.e., the intermediate steps that lead to the final answer, remain underexplored. In this paper, we present the first comprehensive study of multilingual CoT reasoning, evaluating three key dimensions: performance, consistency, and faithfulness. We begin by measuring language compliance, answer accuracy, and answer consistency when LRMs are explicitly instructed or prompt-hacked to think in a target language, revealing strong language preferences and divergent performance across languages. Next, we assess cross-lingual consistency of thinking traces by interchanging them between languages. We find that the quality and effectiveness of thinking traces vary substantially depending on the prompt language. Finally, we adapt perturbation-based techniques -- i.e., truncation and error injection -- to probe the faithfulness of thinking traces across languages, showing that models rely on traces to varying degrees. We release our code and data to support future research.
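The truncation probe the abstract mentions is straightforward to illustrate. Below is a minimal sketch, assuming a generic `query_model` chat-completion helper (hypothetical, not from the paper): it re-asks the question with progressively shortened reasoning traces and checks whether the final answer survives, the intuition being that an answer which never changes suggests the model is not actually relying on its trace.

```python
# Sketch of a truncation-based faithfulness probe for chain-of-thought traces.
# `query_model` is a hypothetical helper standing in for any chat-completion API;
# only the perturbation logic follows the idea described in the abstract.

def truncate_trace(trace: str, keep_fraction: float) -> str:
    """Keep only the first `keep_fraction` of the reasoning steps."""
    steps = trace.split("\n")
    cutoff = max(1, int(len(steps) * keep_fraction))
    return "\n".join(steps[:cutoff])

def probe_faithfulness(question: str, trace: str, original_answer: str, query_model) -> float:
    """Fraction of truncation levels at which the final answer is unchanged.

    A model that ignores its trace keeps answering the same way even when
    most of the trace is removed; a faithful model's answer should degrade.
    """
    fractions = [0.75, 0.5, 0.25]
    unchanged = 0
    for frac in fractions:
        partial = truncate_trace(trace, frac)
        prompt = f"{question}\n\nReasoning so far:\n{partial}\n\nFinal answer:"
        answer = query_model(prompt)
        if answer.strip() == original_answer.strip():
            unchanged += 1
    return unchanged / len(fractions)  # 1.0 = answer never changes (trace likely unused)
```

Error injection works the same way: replace `truncate_trace` with a function that corrupts an intermediate step, then check whether the final answer tracks the corrupted reasoning.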


AI Agents Are Getting Better at Writing Code--and Hacking It as Well

WIRED

The latest artificial intelligence models are not only remarkably good at software engineering--new research shows they are getting ever-better at finding bugs in software, too. AI researchers at UC Berkeley tested how well the latest AI models and agents could find vulnerabilities in 188 large open source codebases. Using a new benchmark called CyberGym, the AI models identified 17 new bugs including 15 previously unknown, or "zero-day," ones. "Many of these vulnerabilities are critical," says Dawn Song, a professor at UC Berkeley who led the work. Many experts expect AI models to become formidable cybersecurity weapons.


Adaptive Circuit Behavior and Generalization in Mechanistic Interpretability

Nainani, Jatin, Vaidyanathan, Sankaran, Yeung, AJ, Gupta, Kartik, Jensen, David

arXiv.org Artificial Intelligence

Mechanistic interpretability aims to understand the inner workings of large neural networks by identifying circuits, or minimal subgraphs within the model that implement algorithms responsible for performing specific tasks. These circuits are typically discovered and analyzed using a narrowly defined prompt format. However, given the abilities of large language models (LLMs) to generalize across various prompt formats for the same task, it remains unclear how well these circuits generalize. For instance, it is unclear whether the model's generalization results from reusing the same circuit components, the components behaving differently, or the use of entirely different components. In this paper, we investigate the generality of the indirect object identification (IOI) circuit in GPT-2 small, which is well-studied and believed to implement a simple, interpretable algorithm. We evaluate its performance on prompt variants that challenge the assumptions of this algorithm. Our findings reveal that the circuit generalizes surprisingly well, reusing all of its components and mechanisms while only adding additional input edges. Notably, the circuit generalizes even to prompt variants where the original algorithm should fail; we discover a mechanism that explains this, which we term S2 Hacking. Our findings indicate that circuits within LLMs may be more flexible and general than previously recognized, underscoring the importance of studying circuit generalization to better understand the broader capabilities of these models.
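For readers unfamiliar with the IOI setup, the standard performance metric is the logit difference between the indirect object and the subject at the final token position. The sketch below shows what such an evaluation looks like in practice using the transformer_lens library, which is commonly used for GPT-2 small circuit work; the paper's exact tooling and prompt variants are not reproduced here, and the example prompt is ours.

```python
# Minimal sketch of measuring IOI behavior on a prompt, using transformer_lens.
# The logit difference between the indirect object and the subject is the
# standard IOI performance metric.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # GPT-2 small

def ioi_logit_diff(prompt: str, io_name: str, s_name: str) -> float:
    """logit(indirect object) - logit(subject) at the final position."""
    tokens = model.to_tokens(prompt)
    logits = model(tokens)[0, -1]  # next-token logits after the full prompt
    io_tok = model.to_single_token(" " + io_name)
    s_tok = model.to_single_token(" " + s_name)
    return (logits[io_tok] - logits[s_tok]).item()

# Baseline IOI prompt: the model should prefer " Mary" (indirect object)
# over " John" (subject), giving a positive logit difference.
base = "When Mary and John went to the store, John gave a drink to"
print(ioi_logit_diff(base, "Mary", "John"))
```

Running this metric across prompt variants, with components outside the candidate circuit ablated, is how generalization claims of this kind are typically tested.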


X Hacking: The Threat of Misguided AutoML

Sharma, Rahul, Redyuk, Sergey, Mukherjee, Sumantrak, Sipka, Andrea, Vollmer, Sebastian, Selby, David

arXiv.org Artificial Intelligence

Machine learning models are increasingly used to make decisions that affect human lives, society and the environment, in areas such as medical diagnosis, criminal justice and public policy. However, these models are often complex and opaque--especially with the increasing ubiquity of deep learning and generative AI--making it difficult to understand how and why they produce certain predictions. Explainable AI (XAI) is a field of research that aims to provide interpretable and transparent explanations for the outputs of machine learning models. The growing demand for model interpretability, along with a trend for 'data-driven' decisions, has the unexpected side-effect of creating an increased incentive for abuse and manipulation. Data analysts may have a vested interest or be pressured to present a certain explanation for a model's predictions, whether to confirm a pre-specified conclusion, to conceal a hidden agenda, or to avoid ethical scrutiny. In this paper, we introduce the concept of explanation hacking or X-hacking, a form of p-hacking applied to XAI metrics. X-hacking refers to the practice of deliberately searching for and selecting models that produce a desired explanation while maintaining 'acceptable' predictive performance, according to some benchmark. Unlike fairwashing attacks, X-hacking does not involve manipulating the model architecture or its explanations; rather it explores plausible combinations of analysis decisions.
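To make the idea concrete, here is an illustrative sketch of such a search loop (not code from the paper) using scikit-learn: a hypothetical analyst fits a grid of individually defensible model configurations, keeps those above an accuracy benchmark, and then selects whichever surviving model's feature-importance explanation best supports the desired conclusion.

```python
# Illustrative X-hacking search loop (our sketch, not the paper's code):
# fit many plausible model configurations, discard those below an accuracy
# benchmark, then select the survivor whose explanation best matches a
# desired narrative -- here, "feature 0 is highly important".
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

candidates = []
for depth in [2, 4, 8, None]:
    for n_est in [50, 200]:
        clf = RandomForestClassifier(max_depth=depth, n_estimators=n_est,
                                     random_state=0).fit(X_tr, y_tr)
        acc = clf.score(X_te, y_te)
        if acc >= 0.85:  # "acceptable" predictive performance
            candidates.append((clf.feature_importances_[0], acc, depth, n_est))

# The "hacked" choice: the acceptable model that most strongly credits feature 0.
importance, acc, depth, n_est = max(candidates)
print(f"selected: depth={depth}, trees={n_est}, "
      f"importance(feature 0)={importance:.3f}, accuracy={acc:.3f}")
```

Every individual step here is a defensible analysis choice; the abuse lies entirely in the selection criterion, which is what makes X-hacking hard to detect after the fact.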


The Hacking of ChatGPT Is Just Getting Started

WIRED

It took Alex Polyakov just a couple of hours to break GPT-4. When OpenAI released the latest version of its text-generating chatbot in March, Polyakov sat down in front of his keyboard and started entering prompts designed to bypass OpenAI's safety systems. Soon, the CEO of security firm Adversa AI had GPT-4 spouting homophobic statements, creating phishing emails, and supporting violence. Polyakov is one of a small number of security researchers, technologists, and computer scientists developing jailbreaks and prompt injection attacks against ChatGPT and other generative AI systems. The process of jailbreaking aims to design prompts that make the chatbots bypass rules around producing hateful content or writing about illegal acts, while closely related prompt injection attacks can quietly insert malicious data or instructions into AI models.


Society Needs Hacking

Slate

Every year, an army of hackers takes aim at the tax code. The tax code is not computer code, but it is a series of rules--supposedly deterministic algorithms--that take data about your income and determine the amount of money you owe. This code has vulnerabilities, more commonly known as loopholes. It has exploits; those are tax avoidance strategies. There is an entire industry of black-hat hackers who exploit vulnerabilities in the tax code: We call them accountants and tax attorneys.


HuBMAP + HPA -- Hacking the Human Body

#artificialintelligence

Our Winstars team recently participated in a Kaggle competition, HuBMAP + HPA -- Hacking the Human Body, finishing in 95th place with a bronze medal among 1,175 contenders. In this paper, we would like to present our solution and highlight all the essential techniques used. A big part of the solution can be carried over to other deep-learning tasks with little or no modification. The paper is structured as follows: first, we briefly present the competition and its main challenges.


China Trade Wars, Consumer Focus On Security And The AI Hype: What's In Store For 2019

#artificialintelligence

When thinking about 2019, the first thing that comes to mind is: "How are we going to top 2018?" These past few years have brought more dystopian weirdness -- toasters taking down the internet, a nation-state meddling in elections, and more biggest-ever breaches -- than we could have ever predicted. Beyond "more, bigger breaches," the following are the three themes most likely to make headlines throughout the year. This year, I expect we'll hear more about evidence of China's nation-state activity in the U.S., with more frequent and notable examples of attacks against the population, not just the U.S. government. There are two main drivers for these attacks: the need to continue to map the U.S. government's employee base -- including its covert operatives -- and the deepening trade war between the U.S. and China.


Hacking the DNA of humanity with Blockchain and AI by Dinis Guarda

#artificialintelligence

What is the biggest challenge humanity faces now? What is the DNA of our time, and what happens when we can hack this code? As we digitise all of society, ourselves included, and datify our own data, we are leapfrogging the very system of human identity and society. Organic and digital DNA are merging. Now that scientists and technologists have access to its engineering, they are using DNA to store books, recordings, and GIFs, and are planning things such as an Amazon gift card.