Shriver, David
A Guide to Failure in Machine Learning: Reliability and Robustness from Foundations to Practice
Heim, Eric, Wright, Oren, Shriver, David
One of the main barriers to adoption of Machine Learning (ML) is that ML models can fail unexpectedly. In this work, we aim to provide practitioners with a guide to better understand why ML models fail and equip them with techniques they can use to reason about failure. Specifically, we discuss failure as being caused by either a lack of reliability or a lack of robustness. Differentiating the causes of failure in this way allows us to formally define why models fail from first principles and tie these definitions to engineering concepts and real-world deployment settings. Throughout the document we provide 1) a summary of important theoretical concepts in reliability and robustness, 2) a sampling of current techniques that practitioners can use to reason about ML model reliability and robustness, and 3) examples that show how these concepts and techniques apply to real-world settings.
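To make the distinction concrete, one common formalization (a sketch only, not necessarily the exact definitions used in the guide) treats reliability as expected performance under the deployment distribution and robustness as prediction stability under bounded input perturbations:

```latex
% Reliability: low expected loss under the deployment distribution D
% (f is the model, \ell a task loss, (x, y) \sim D the deployment data).
\mathcal{R}(f) \;=\; \mathbb{E}_{(x,y)\sim D}\big[\ell(f(x), y)\big]

% Local robustness at an input x: the prediction is stable for every
% perturbed input x' within an \epsilon-ball around x.
\forall x' .\; \|x' - x\| \le \epsilon \;\Longrightarrow\; f(x') = f(x)
```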
Concept-ROT: Poisoning Concepts in Large Language Models with Model Editing
Grimes, Keltin, Christiani, Marco, Shriver, David, Connor, Marissa
Model editing methods modify specific behaviors of Large Language Models by altering a small, targeted set of network weights, and they require very little data and compute. These methods can be used for malicious applications such as inserting misinformation or simple trojans that result in adversary-specified behaviors when a trigger word is present. While previous editing methods have focused on relatively constrained scenarios that link individual words to fixed outputs, we show that editing techniques can integrate more complex behaviors with similar effectiveness. We develop Concept-ROT, a model editing-based method that efficiently inserts trojans which not only exhibit complex output behaviors, but also trigger on high-level concepts - presenting an entirely new class of trojan attacks. Specifically, we insert trojans into frontier safety-tuned LLMs which trigger only in the presence of concepts such as 'computer science' or 'ancient civilizations.' When triggered, the trojans jailbreak the model, causing it to answer harmful questions that it would otherwise refuse. Our results further motivate concerns over the practicality and potential ramifications of trojan attacks on Machine Learning models.

The rise and widespread use of Large Language Models (LLMs) has brought to light many concerns about their factuality, alignment with human values, and security risks. To explore unique vulnerabilities of LLMs, there has been much research into methods for manipulating the information stored in, or behaviors of, LLMs. For example, there has been great interest in poisoning/trojan attacks, where LLMs are fine-tuned on corrupted data to introduce adversarial connections between input text triggers and adversary-specified target output behaviors (Wang et al., 2024b; Yang et al., 2024; Li et al., 2024c). Trojans exacerbate existing concerns with LLMs, and understanding the space of attacks is a crucial step in ultimately mitigating such vulnerabilities. Current trojan attacks targeting LLMs have two main drawbacks: they require fine-tuning LLMs on large amounts of data, which demands significant computational resources, and the poisoning is constrained to highly specific text triggers (like individual words or phrases) (Yang et al., 2024). In this work we develop a novel trojan attack that can be mounted efficiently with as few as 5 poisoned samples and that can cause broad trojaned behavior with complex triggers and target behaviors. The inefficiency of current trojan attacks makes them impractical for many potential adversaries to execute. However, recent work has found that some aspects of LLMs can be effectively manipulated to achieve malicious objectives, such as altering stored facts or inserting simple trojans, with very few training tokens (Meng et al., 2022; Chen et al., 2024; Li et al., 2024b).
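The rank-one editing machinery referenced above (Meng et al., 2022) is what makes such attacks cheap. The numpy sketch below shows only the core closed-form update under simplified assumptions; the layer shapes, key, and value are hypothetical stand-ins, and Concept-ROT's actual procedure for extracting concept-level keys and target values is not shown.

```python
import numpy as np

# Illustrative sketch of rank-one model editing (in the style of Meng et al.,
# 2022); all quantities below are hypothetical stand-ins, not the real attack.
rng = np.random.default_rng(0)
d_in, d_out = 64, 64
W = rng.normal(size=(d_out, d_in))   # a linear layer inside an MLP block

k = rng.normal(size=d_in)            # "key": hidden state associated with the trigger
v = rng.normal(size=d_out)           # "value": output the adversary wants for that key

# Minimal-norm rank-one update so that (W + delta) @ k == v, while inputs
# orthogonal to k pass through the layer unchanged.
delta = np.outer(v - W @ k, k) / (k @ k)
W_edited = W + delta

assert np.allclose(W_edited @ k, v)  # the trigger key now maps to the adversarial value
```

Because the update touches a single rank-one direction of one weight matrix, it can be computed from a handful of poisoned samples rather than full fine-tuning, which is the efficiency argument the abstract makes.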
The SaTML '24 CNN Interpretability Competition: New Innovations for Concept-Level Interpretability
Casper, Stephen, Yun, Jieun, Baek, Joonhyuk, Jung, Yeseong, Kim, Minhwan, Kwon, Kiwan, Park, Saerom, Moore, Hayden, Shriver, David, Connor, Marissa, Grimes, Keltin, Nicolson, Angus, Tagade, Arush, Rumbelow, Jessica, Nguyen, Hieu Minh, Hadfield-Menell, Dylan
Interpretability techniques are valuable for helping humans understand and oversee AI systems. The SaTML 2024 CNN Interpretability Competition solicited novel methods for studying convolutional neural networks (CNNs) at the ImageNet scale. The objective of the competition was to help human crowd-workers identify trojans in CNNs. This report showcases the methods and results of four featured competition entries. It remains challenging to help humans reliably diagnose trojans via interpretability tools. However, the competition's entries have contributed new techniques and set a new record on the benchmark from Casper et al., 2023.
DNNV: A Framework for Deep Neural Network Verification
Shriver, David, Elbaum, Sebastian, Dwyer, Matthew B.
Despite the large number of sophisticated deep neural network (DNN) verification algorithms, DNN verifier developers, users, and researchers still face several challenges. First, verifier developers must contend with the rapidly changing DNN field to support new DNN operations and property types. Second, verifier users have the burden of selecting a verifier input format to specify their problem. Due to the many input formats, this decision can greatly restrict the verifiers that a user may run. Finally, researchers face difficulties in re-using benchmarks to evaluate and compare verifiers, due to the large number of input formats required to run different verifiers. Existing benchmarks are rarely in formats supported by verifiers other than the one for which the benchmark was introduced. In this work, we present DNNV, a framework for reducing the burden on DNN verifier researchers, developers, and users. DNNV standardizes input and output formats, includes a simple yet expressive DSL for specifying DNN properties, and provides powerful simplification and reduction operations to facilitate the application, development, and comparison of DNN verifiers. We show how DNNV increases verifier support for existing benchmarks from 30% to 74%.
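To give a feel for the kind of specification the DSL targets, the sketch below writes a local robustness property in a DNNP-like style. It is illustrative only: the input path and parameter values are placeholders, and the identifiers (Network, Image, Parameter, Forall, Implies, And, argmax) are recalled from the DNNV documentation and may not match the DSL exactly.

```python
# A rough DNNP-style property sketch (illustrative; exact DNNV/DNNP syntax
# and identifiers may differ from what is shown here).
from dnnv.properties import *

N = Network("N")                                     # the network under verification
x = Image("path/to/input.npy")                       # placeholder path to a concrete input
epsilon = Parameter("epsilon", float, default=0.01)  # perturbation radius, set at run time

# Local robustness: every input within an epsilon-box around x is classified
# the same as x. The variable bound by Forall is introduced implicitly.
Forall(
    x_,
    Implies(
        And(x - epsilon <= x_, x_ <= x + epsilon),
        argmax(N(x_)) == argmax(N(x)),
    ),
)
```

A property like this is written once and, through DNNV's format standardization and reductions, can be checked with any of the supported verifiers rather than being rewritten per tool.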