AITopics | Bluemke, Emma

Collaborating Authors

Bluemke, Emma

Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming

Sharma, Mrinank, Tong, Meg, Mu, Jesse, Wei, Jerry, Kruthoff, Jorrit, Goodfriend, Scott, Ong, Euan, Peng, Alwin, Agarwal, Raj, Anil, Cem, Askell, Amanda, Bailey, Nathan, Benton, Joe, Bluemke, Emma, Bowman, Samuel R., Christiansen, Eric, Cunningham, Hoagy, Dau, Andy, Gopal, Anjali, Gilson, Rob, Graham, Logan, Howard, Logan, Kalra, Nimit, Lee, Taesung, Lin, Kevin, Lofgren, Peter, Mosconi, Francesco, O'Hara, Clare, Olsson, Catherine, Petrini, Linda, Rajani, Samir, Saxena, Nikhil, Silverstein, Alex, Singh, Tanya, Sumers, Theodore, Tang, Leonard, Troy, Kevin K., Weisser, Constantin, Zhong, Ruiqi, Zhou, Giulio, Leike, Jan, Kaplan, Jared, Perez, Ethan

arXiv.org Artificial Intelligence, Jan-30-2025

Large language models (LLMs) are vulnerable to universal jailbreaks: prompting strategies that systematically bypass model safeguards and enable users to carry out harmful processes that require many model interactions, such as manufacturing illegal substances at scale. To defend against these attacks, we introduce Constitutional Classifiers: safeguards trained on synthetic data, generated by prompting LLMs with natural language rules (i.e., a constitution) specifying permitted and restricted content. In over 3,000 estimated hours of red teaming, no red teamer found a universal jailbreak that could extract information from an early classifier-guarded LLM at a similar level of detail to an unguarded model across most target queries. On automated evaluations, enhanced classifiers demonstrated robust defense against held-out domain-specific jailbreaks. These classifiers also maintain deployment viability, with an absolute 0.38% increase in production-traffic refusals and a 23.7% inference overhead. Our work demonstrates that defending against universal jailbreaks while maintaining practical deployment viability is tractable.

classifier, large language model, machine learning

arXiv:2501.18837

Genre:

Workflow (1.00)
Questionnaire & Opinion Survey (0.92)
Research Report > New Finding (0.67)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Government > Military (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)

Visibility into AI Agents

Chan, Alan, Ezell, Carson, Kaufmann, Max, Wei, Kevin, Hammond, Lewis, Bradley, Herbie, Bluemke, Emma, Rajkumar, Nitarshan, Krueger, David, Kolt, Noam, Heim, Lennart, Anderljung, Markus

arXiv.org Artificial Intelligence, Feb-4-2024

Increased delegation of commercial, scientific, governmental, and personal activities to AI agents (systems capable of pursuing complex goals with limited supervision) may exacerbate existing societal risks and introduce new ones. Understanding and mitigating these risks involves critically evaluating existing governance structures, revising and adapting these structures where needed, and ensuring accountability of key stakeholders. Information about where, why, how, and by whom certain AI agents are used, which we refer to as visibility, is critical to these objectives. In this paper, we assess three categories of measures to increase visibility into AI agents: agent identifiers, real-time monitoring, and activity logging. For each, we outline potential implementations that vary in intrusiveness and informativeness. We analyze how the measures apply across a spectrum of centralized through decentralized deployment contexts, accounting for various actors in the supply chain including hardware and software service providers. Finally, we discuss the implications of our measures for privacy and concentration of power. Further work on understanding the measures and mitigating their negative impacts can help build a foundation for the governance of AI agents.

artificial intelligence, deep learning, machine learning

arXiv:2401.13138

Country:

North America > United States (1.00)
Asia (0.67)
Europe > United Kingdom > England (0.28)

Genre: Research Report (0.52)

Industry:

Law (1.00)
Information Technology > Services (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Towards Publicly Accountable Frontier LLMs: Building an External Scrutiny Ecosystem under the ASPIRE Framework

Anderljung, Markus, Smith, Everett Thornton, O'Brien, Joe, Soder, Lisa, Bucknall, Benjamin, Bluemke, Emma, Schuett, Jonas, Trager, Robert, Strahm, Lacey, Chowdhury, Rumman

arXiv.org Artificial Intelligence, Nov-15-2023

With the increasing integration of frontier large language models (LLMs) into society and the economy, decisions related to their training, deployment, and use have far-reaching implications. These decisions should not be left solely in the hands of frontier LLM developers. LLM users, civil society, and policymakers need trustworthy sources of information to steer such decisions for the better. Involving outside actors in the evaluation of these systems (what we term "external scrutiny") via red-teaming, auditing, and external researcher access offers a solution. Though there are encouraging signs of increasing external scrutiny of frontier LLMs, its success is not assured. In this paper, we survey six requirements for effective external scrutiny of frontier AI systems and organize them under the ASPIRE framework: Access, Searching attitude, Proportionality to the risks, Independence, Resources, and Expertise. We then illustrate how external scrutiny might function throughout the AI lifecycle and offer recommendations to policymakers.

large language model, machine learning, natural language

arXiv:2311.14711

Country:

North America > United States > District of Columbia (0.14)
North America > United States > California > Santa Clara County (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)

Genre: Research Report (0.64)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (0.68)
Government > Military (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Open-Sourcing Highly Capable Foundation Models: An evaluation of risks, benefits, and alternative methods for pursuing open-source objectives

Seger, Elizabeth, Dreksler, Noemi, Moulange, Richard, Dardaman, Emily, Schuett, Jonas, Wei, K., Winter, Christoph, Arnold, Mackenzie, Ó hÉigeartaigh, Seán, Korinek, Anton, Anderljung, Markus, Bucknall, Ben, Chan, Alan, Stafford, Eoghan, Koessler, Leonie, Ovadya, Aviv, Garfinkel, Ben, Bluemke, Emma, Aird, Michael, Levermore, Patrick, Hazell, Julian, Gupta, Abhishek

arXiv.org Artificial Intelligence, Sep-29-2023

Recent decisions by leading AI labs either to open-source their models or to restrict access to them have sparked debate about whether, and how, increasingly capable AI models should be shared. Open-sourcing in AI typically refers to making model architecture and weights freely and publicly accessible for anyone to modify, study, build on, and use. This offers advantages such as enabling external oversight, accelerating progress, and decentralizing control over AI development and use. However, it also presents a growing potential for misuse and unintended consequences. This paper examines the risks and benefits of open-sourcing highly capable foundation models. While open-sourcing has historically provided substantial net benefits for most software and AI development processes, we argue that for some highly capable foundation models likely to be developed in the near future, open-sourcing may pose sufficiently extreme risks to outweigh the benefits. In such a case, highly capable foundation models should not be open-sourced, at least not initially. Alternative strategies, including non-open-source model sharing options, are explored. The paper concludes with recommendations for developers, standard-setting bodies, and governments for establishing safe and responsible model sharing practices and preserving open-source benefits where safe.

large language model, machine learning, natural language

arXiv:2311.09227

Country:

Asia (0.92)
North America > United States > California > Santa Clara County (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)

Genre: Research Report (1.00)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Exploring the Relevance of Data Privacy-Enhancing Technologies for AI Governance Use Cases

Bluemke, Emma, Collins, Tantum, Garfinkel, Ben, Trask, Andrew

arXiv.org Artificial Intelligence, Mar-20-2023

The development of privacy-enhancing technologies has made immense progress in reducing trade-offs between privacy and performance in data exchange and analysis. Similar tools for structured transparency could be useful for AI governance by offering capabilities such as external scrutiny, auditing, and source verification. It is useful to view these different AI governance objectives as a system of information flows in order to avoid partial solutions and significant gaps in governance, as there may be significant overlap in the software stacks needed for the AI governance use cases mentioned in this text. When viewing the system as a whole, the importance of interoperability between these different AI governance solutions becomes clear. Therefore, it is urgent to look at these problems in AI governance as a system, before these standards, auditing procedures, software, and norms settle into place.

artificial intelligence, information, machine learning

arXiv:2303.08956

Country: Europe (0.28)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.46)
