Malicious actor


Benchmarking Robustness to Adversarial Image Obfuscations

Neural Information Processing Systems

Automated content filtering and moderation is an important tool that allows online platforms to build thriving user communities that facilitate cooperation and prevent abuse. Unfortunately, resourceful actors try to bypass automated filters in a bid to post content that violates platform policies and codes of conduct. To reach this goal, these malicious actors may obfuscate policy-violating images (e.g., overlaying harmful images with carefully selected benign images or visual patterns) to prevent machine learning models from reaching the correct decision. In this paper, we invite researchers to tackle this specific issue and present a new image benchmark. This benchmark, based on ImageNet, simulates the type of obfuscations created by malicious actors. It goes beyond ImageNet-C and ImageNet-C-bar by proposing general, drastic, adversarial modifications that preserve the original content intent. It aims to tackle a more common adversarial threat than the one considered by ℓp-norm-bounded adversaries. We evaluate 33 pretrained models on the benchmark and train models with different augmentations, architectures and training methods on subsets of the obfuscations to measure generalization. Our hope is that this benchmark will encourage researchers to test their models and methods and try to find new approaches that are more robust to these obfuscations.
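
As a rough illustration of the kind of overlay obfuscation this benchmark simulates (not the benchmark's own transformations), the sketch below alpha-blends a benign overlay onto a stand-in image and checks whether a pretrained ImageNet classifier still reaches the same decision. The file paths, blend strength, and choice of ResNet-50 are illustrative assumptions.

# Minimal sketch of an overlay-style obfuscation and a robustness check.
# Requires torch/torchvision/Pillow; image paths are placeholders.
import torch
from PIL import Image
from torchvision import models, transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def obfuscate(base: Image.Image, overlay: Image.Image, alpha: float = 0.5) -> Image.Image:
    """Blend a benign overlay onto the base image to mask its content."""
    overlay = overlay.resize(base.size).convert("RGB")
    return Image.blend(base.convert("RGB"), overlay, alpha)

# Off-the-shelf pretrained classifier (torchvision >= 0.13 weight enums).
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2).eval()

base = Image.open("base.jpg")        # stand-in for a policy-violating image
overlay = Image.open("overlay.jpg")  # carefully selected benign pattern

with torch.no_grad():
    clean_pred = model(preprocess(base).unsqueeze(0)).argmax(1).item()
    obf_pred = model(preprocess(obfuscate(base, overlay)).unsqueeze(0)).argmax(1).item()

print("prediction unchanged under obfuscation:", clean_pred == obf_pred)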


Socioeconomic Threats of Deepfakes and the Role of Cyber-Wellness Education in Defense

Communications of the ACM

Due to the limits of science and its steep learning curve, we must rely on the expertise of others to develop our knowledge and skills [26]. Toward this end, social media platforms have revolutionized how netizens, users who are actively engaged in online communities, gain knowledge and skills by facilitating the exchange of costless information with the public (for example, followers or influencers). Businesses around the world also use these platforms along with tools based on generative artificial intelligence (GenAI) to craft synthetic media, hoping to grow revenue by attracting more customers and improving their online experience [28]. Generative AI tools can empower cyber threats and have cyberpsychological effects on netizens, allowing malicious actors to craft deepfakes in the form of disinformation, misinformation, and malinformation. Service providers must not only enhance GenAI tools to reduce hallucinations, but they also have a statutory duty to mitigate data-driven biases.


When Testing AI Tests Us: Safeguarding Mental Health on the Digital Frontlines

Pendse, Sachin R., Gergle, Darren, Kornfield, Rachel, Meyerhoff, Jonah, Mohr, David, Suh, Jina, Wescott, Annie, Williams, Casey, Schleider, Jessica

arXiv.org Artificial Intelligence

Red-teaming is a core part of the infrastructure that ensures that AI models do not produce harmful content. Unlike past technologies, the black-box nature of generative AI systems necessitates a uniquely interactional mode of testing, one in which individuals on red teams actively interact with the system, leveraging natural language to simulate malicious actors and solicit harmful outputs. This interactional labor done by red teams can result in mental health harms that are uniquely tied to the adversarial engagement strategies necessary to effectively red team. The importance of ensuring that generative AI models do not propagate societal or individual harm is widely recognized; one less visible foundation of end-to-end AI safety is also the protection of the mental health and wellbeing of those who work to keep model outputs safe. In this paper, we argue that the unmet mental health needs of AI red-teamers are a critical workplace safety concern. Through analyzing the unique mental health impacts associated with the labor done by red teams, we propose potential individual and organizational strategies that could be used to meet these needs and safeguard the mental health of red-teamers. We develop our proposed strategies by drawing parallels between common red-teaming practices and interactional labor common to other professions (including actors, mental health professionals, conflict photographers, and content moderators), describing how individuals and organizations within these professional spaces safeguard their mental health given similar psychological demands. Drawing on these protective practices, we describe how safeguards could be adapted for the distinct mental health challenges experienced by red-teaming organizations as they mitigate emerging technological risks on the new digital frontlines.


Urgent warning as 1.5 MILLION private photos are leaked from BDSM dating apps - so, have your sexy snaps been exposed?

Daily Mail - Science & tech

Cybersecurity researchers have issued an urgent warning after almost 1.5 million private photos from dating apps were exposed. Affected apps include the kink dating sites BDSM People and CHICA, as well as the LGBT dating services PINK, BRISH, and TRANSLOVE, all of which were developed by M.A.D Mobile. The leaked files include photos used for verification, photos removed by app moderators, and photos sent in direct messages between users, many of which were explicit. These sensitive snaps were being stored online without password protection, meaning anyone with the link could view and download them. Researchers from Cybernews, who discovered the vulnerability, say this easily exploited security flaw put up to 900,000 users at risk of further hacks or extortion.


Beyond Release: Access Considerations for Generative AI Systems

Solaiman, Irene, Bommasani, Rishi, Hendrycks, Dan, Herbert-Voss, Ariel, Jernite, Yacine, Skowron, Aviya, Trask, Andrew

arXiv.org Artificial Intelligence

Generative AI release decisions determine whether system components are made available, but release does not address many other elements that change how users and stakeholders are able to engage with a system. Beyond release, access to system components informs potential risks and benefits. Access refers to the practical infrastructural, technical, and societal requirements for using available components in some way. We deconstruct access along three axes: resourcing, technical usability, and utility. Within each category, a set of variables per system component clarifies tradeoffs. For example, resourcing requires access to computing infrastructure to serve model weights. We also compare the accessibility of four high-performance language models, two open-weight and two closed-weight, showing that similar considerations apply to all four when framed in terms of access variables. Access variables set the foundation for being able to scale or increase access to users; we examine the scale of access and how scale affects the ability to manage and intervene on risks. This framework better encompasses the landscape and risk-benefit tradeoffs of system releases to inform release decisions, research, and policy.
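
As a toy sketch of how the abstract's three access axes might be recorded per system component, the structure below compares an open-weight release with a hosted API. The field names and example values are illustrative assumptions, not the authors' instrument.

# Toy encoding of the access framework's three axes (resourcing, technical
# usability, utility) per system component. Field names and example values
# are illustrative assumptions only.
from dataclasses import dataclass

@dataclass
class ComponentAccess:
    component: str            # e.g. "model weights", "hosted API"
    resourcing: str           # infrastructure and money needed to use it
    technical_usability: str  # tooling, documentation, required expertise
    utility: str              # what the component is actually useful for

open_weights = ComponentAccess(
    component="model weights",
    resourcing="GPUs or paid hosting to serve the weights",
    technical_usability="ML engineering skills and a serving stack",
    utility="fine-tuning, inspection, offline deployment",
)

hosted_api = ComponentAccess(
    component="hosted API",
    resourcing="an account and per-token fees; no local compute",
    technical_usability="usable from any HTTP client",
    utility="inference only; no weight inspection or fine-grained control",
)

for c in (open_weights, hosted_api):
    print(c)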


Position: Editing Large Language Models Poses Serious Safety Risks

Youssef, Paul, Zhao, Zhixue, Braun, Daniel, Schlötterer, Jörg, Seifert, Christin

arXiv.org Artificial Intelligence

Large Language Models (LLMs) contain large amounts of facts about the world. These facts can become outdated over time, which has led to the development of knowledge editing methods (KEs) that can change specific facts in LLMs with limited side effects. This position paper argues that editing LLMs poses serious safety risks that have been largely overlooked. First, we note that the wide availability, low computational cost, high performance, and stealth of KEs make them an attractive tool for malicious actors. Second, we discuss malicious use cases of KEs, showing how KEs can be easily adapted for a variety of malicious purposes. Third, we highlight vulnerabilities in the AI ecosystem that allow unrestricted uploading and downloading of updated models without verification. Fourth, we argue that a lack of social and institutional awareness exacerbates this risk, and discuss the implications for different stakeholders. We call on the community to (i) research tamper-resistant models and countermeasures against malicious model editing, and (ii) actively engage in securing the AI ecosystem.
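
One countermeasure the abstract gestures at, verifying models before downloading or deploying them, can be approximated by checking a checkpoint's digest against a publisher-provided manifest. The sketch below is a minimal illustration under that assumption; the manifest format and file names are hypothetical, and a real ecosystem would additionally need the manifest itself to be signed.

# Minimal sketch of verifying a downloaded checkpoint against a trusted
# manifest of SHA-256 digests. Manifest format and file names are
# hypothetical placeholders.
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_checkpoint(checkpoint: Path, manifest: Path) -> bool:
    """True only if the checkpoint's digest matches the manifest entry."""
    expected = json.loads(manifest.read_text())  # e.g. {"model.safetensors": "<hex digest>"}
    return expected.get(checkpoint.name) == sha256_of(checkpoint)

if __name__ == "__main__":
    ok = verify_checkpoint(Path("model.safetensors"), Path("manifest.json"))
    print("checkpoint matches publisher manifest:", ok)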


What AI evaluations for preventing catastrophic risks can and cannot do

Barnett, Peter, Thiergart, Lisa

arXiv.org Artificial Intelligence

AI evaluations are an important component of the AI governance toolkit, underlying current approaches to safety cases for preventing catastrophic risks. Our paper examines what these evaluations can and cannot tell us. Evaluations can establish lower bounds on AI capabilities and assess certain misuse risks given sufficient effort from evaluators. Unfortunately, evaluations face fundamental limitations that cannot be overcome within the current paradigm. These include an inability to establish upper bounds on capabilities, reliably forecast future model capabilities, or robustly assess risks from autonomous AI systems. This means that while evaluations are valuable tools, we should not rely on them as our main way of ensuring AI systems are safe. We conclude with recommendations for incremental improvements to frontier AI safety, while acknowledging these fundamental limitations remain unsolved.


Declare and Justify: Explicit assumptions in AI evaluations are necessary for effective regulation

Barnett, Peter, Thiergart, Lisa

arXiv.org Artificial Intelligence

As AI systems advance, AI evaluations are becoming an important pillar of regulations for ensuring safety. We argue that such regulation should require developers to explicitly identify and justify key underlying assumptions about evaluations as part of their case for safety. We identify core assumptions in AI evaluations (both for evaluating existing models and forecasting future models), such as comprehensive threat modeling, proxy task validity, and adequate capability elicitation. Many of these assumptions cannot currently be well justified. If regulation is to be based on evaluations, it should require that AI development be halted if evaluations demonstrate unacceptable danger or if these assumptions are inadequately justified. Our presented approach aims to enhance transparency in AI development, offering a practical path towards more effective governance of advanced AI systems.


On the Abuse and Detection of Polyglot Files

Koch, Luke, Oesch, Sean, Chaulagain, Amul, Dixon, Jared, Dixon, Matthew, Huettal, Mike, Sadovnik, Amir, Watson, Cory, Weber, Brian, Hartman, Jacob, Patulski, Richard

arXiv.org Artificial Intelligence

A polyglot is a file that is valid in two or more formats. Polyglot files pose a problem for malware detection systems that route files to format-specific detectors/signatures, as well as file upload and sanitization tools. In this work we found that existing file-format and embedded-file detection tools, even those developed specifically for polyglot files, fail to reliably detect polyglot files used in the wild, leaving organizations vulnerable to attack. To address this issue, we studied the use of polyglot files by malicious actors in the wild, finding 30 polyglot samples and 15 attack chains that leveraged polyglot files. In this report, we highlight two well-known APTs whose cyber attack chains relied on polyglot files to bypass detection mechanisms. Using knowledge from our survey of polyglot usage in the wild -- the first of its kind -- we created a novel data set based on adversary techniques. We then trained a machine learning detection solution, PolyConv, using this data set. PolyConv achieves a precision-recall area-under-curve score of 0.999 with an F1 score of 99.20% for polyglot detection and 99.47% for file-format identification, significantly outperforming all other tools tested. We developed a content disarmament and reconstruction tool, ImSan, that successfully sanitized 100% of the tested image-based polyglots, which were the most common type found via the survey. Our work provides concrete tools and suggestions to enable defenders to better defend themselves against polyglot files, as well as directions for future work to create more robust file specifications and methods of disarmament.
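
As a rough illustration of the kind of signal a polyglot detector looks for (this is not PolyConv, and naive signature scans are exactly what the abstract says fail in the wild), the sketch below flags a file whose bytes contain magic numbers for more than one format, such as a JPEG that also embeds a ZIP local-file header. The format list is a small, illustrative subset.

# Toy heuristic: flag files containing magic bytes for more than one format.
# Illustrative only; real polyglot detection must handle far more cases.
import sys

SIGNATURES = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
    "gif": b"GIF8",
    "pdf": b"%PDF-",
    "zip": b"PK\x03\x04",  # also the container for docx, jar, apk
}

def formats_present(data: bytes) -> set:
    """Every known format whose magic bytes appear anywhere in the file."""
    return {name for name, magic in SIGNATURES.items() if magic in data}

def looks_polyglot(path: str) -> bool:
    with open(path, "rb") as f:
        found = formats_present(f.read())
    print(f"{path}: signatures found -> {sorted(found)}")
    return len(found) > 1

if __name__ == "__main__":
    for p in sys.argv[1:]:
        print("possible polyglot" if looks_polyglot(p) else "looks single-format")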