AITopics | Wong, Eric

Collaborating Authors

Wong, Eric

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

DISCRET: Synthesizing Faithful Explanations For Treatment Effect Estimation

Wu, Yinjun, Keoliya, Mayank, Chen, Kan, Velingker, Neelay, Li, Ziyang, Getzen, Emily J, Long, Qi, Naik, Mayur, Parikh, Ravi B, Wong, Eric

arXiv.org Artificial IntelligenceJun-2-2024

Designing faithful yet accurate AI models is challenging, particularly in the field of individual treatment effect estimation (ITE). ITE prediction models deployed in critical settings such as healthcare should ideally be (i) accurate, and (ii) provide faithful explanations. However, current solutions are inadequate: state-of-the-art black-box models do not supply explanations, post-hoc explainers for black-box models lack faithfulness guarantees, and self-interpretable models greatly compromise accuracy. To address these issues, we propose DISCRET, a self-interpretable ITE framework that synthesizes faithful, rule-based explanations for each sample. A key insight behind DISCRET is that explanations can serve dually as database queries to identify similar subgroups of samples. We provide a novel RL algorithm to efficiently synthesize these explanations from a large search space. We evaluate DISCRET on diverse tasks involving tabular, image, and text data. DISCRET outperforms the best self-interpretable models and has accuracy comparable to the best black-box models while providing faithful explanations. DISCRET is available at https://github.com/wuyinjun-1993/DISCRET-ICML2024.

machine learning, natural language, reinforcement learning, (20 more...)

arXiv.org Artificial Intelligence

2406.00611

Country: North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing

Ji, Jiabao, Hou, Bairu, Robey, Alexander, Pappas, George J., Hassani, Hamed, Zhang, Yang, Wong, Eric, Chang, Shiyu

arXiv.org Artificial IntelligenceFeb-28-2024

Aligned large language models (LLMs) are vulnerable to jailbreaking attacks, which bypass the safeguards of targeted LLMs and fool them into generating objectionable content. While initial defenses show promise against token-based threat models, there do not exist defenses that provide robustness against semantic attacks and avoid unfavorable trade-offs between robustness and nominal performance. To meet this need, we propose SEMANTICSMOOTH, a smoothing-based defense that aggregates the predictions of multiple semantically transformed copies of a given input prompt. Experimental results demonstrate that SEMANTICSMOOTH achieves state-of-the-art robustness against GCG, PAIR, and AutoDAN attacks while maintaining strong nominal performance on instruction following benchmarks such as InstructionFollowing and AlpacaEval. The codes will be publicly available at https://github.com/UCSB-NLP-Chang/SemanticSmooth.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2402.16192

Country:

North America > United States > South Carolina > Charleston County (0.14)
North America > United States > California (0.14)

Genre:

Workflow (1.00)
Instructional Material (1.00)
Research Report > New Finding (0.48)

Industry:

Law Enforcement & Public Safety (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Comparing Styles across Languages

Havaldar, Shreya, Pressimone, Matthew, Wong, Eric, Ungar, Lyle

arXiv.org Artificial IntelligenceDec-4-2023

Understanding how styles differ across languages is advantageous for training both humans and computers to generate culturally appropriate text. We introduce an explanation framework to extract stylistic differences from multilingual LMs and compare styles across languages. Our framework (1) generates comprehensive style lexica in any language and (2) consolidates feature importances from LMs into comparable lexical categories. We apply this framework to compare politeness, creating the first holistic multilingual politeness dataset and exploring how politeness varies across four languages. Our approach enables an effective evaluation of how distinct linguistic categories contribute to stylistic variations and provides interpretable insights into how people communicate differently around the world.

machine learning, natural language, politeness, (18 more...)

arXiv.org Artificial Intelligence

2310.07135

Country:

North America > Canada (0.28)
North America > United States > Texas (0.14)
Europe > United Kingdom > Scotland (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications > Social Media (0.93)
(2 more...)

Add feedback

SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks

Robey, Alexander, Wong, Eric, Hassani, Hamed, Pappas, George J.

arXiv.org Machine LearningNov-29-2023

Over the last year, large language models (LLMs) have emerged as a groundbreaking technology that has the potential to fundamentally reshape how people interact with AI. Central to the fervor surrounding these models is the credibility and authenticity of the text they generate, which is largely attributable to the fact that LLMs are trained on vast text corpora sourced directly from the Internet. And while this practice exposes LLMs to a wealth of knowledge, such corpora tend to engender a double-edged sword, as they often contain objectionable content including hate speech, malware, and false information [1]. Indeed, the propensity of LLMs to reproduce this objectionable content has invigorated the field of AI alignment [2-4], wherein various mechanisms are used to "align" the output text generated by LLMs with ethical and legal standards [5-7]. At face value, efforts to align LLMs have reduced the propagation of toxic content: Publicly-available chatbots will now rarely output text that is clearly objectionable [8]. Yet, despite this encouraging progress, in recent months a burgeoning literature has identified numerous failure modes--commonly referred to as jailbreaks--that bypass the alignment mechanisms and safety guardrails implemented on modern LLMs [9, 10]. The pernicious nature of such jailbreaks, which are often difficult to detect or mitigate [11, 12], pose a significant barrier to the widespread deployment of LLMs, given that the text generated by these models may influence educational policy [13], medical diagnoses [14, 15], and business decisions [16]. Among the jailbreaks discovered so far, a notable category concerns adversarial prompting, wherein an attacker fools a targeted LLM into outputting objectionable content by modifying prompts passed as input to that LLM [17, 18]. Of particular concern is the recent work of [19], which shows that highly-performant LLMs, including GPT, Claude, and PaLM, can be jailbroken by appending adversarially-chosen characters onto various prompts.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2310.03684

Country: Europe > Czechia (0.14)

Genre:

Instructional Material (1.00)
Research Report > New Finding (0.93)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation

Fan, Chongyu, Liu, Jiancheng, Zhang, Yihua, Wei, Dennis, Wong, Eric, Liu, Sijia

arXiv.org Artificial IntelligenceNov-5-2023

With evolving data regulations, machine unlearning (MU) has become an important tool for fostering trust and safety in today's AI models. However, existing MU methods focusing on data and/or weight perspectives often grapple with limitations in unlearning accuracy, stability, and cross-domain applicability. To address these challenges, we introduce the concept of 'weight saliency' in MU, drawing parallels with input saliency in model explanation. This innovation directs MU's attention toward specific model weights rather than the entire model, improving effectiveness and efficiency. The resultant method that we call saliency unlearning (SalUn) narrows the performance gap with 'exact' unlearning (model retraining from scratch after removing the forgetting dataset). To the best of our knowledge, SalUn is the first principled MU approach adaptable enough to effectively erase the influence of forgetting data, classes, or concepts in both image classification and generation. For example, SalUn yields a stability advantage in high-variance random data forgetting, e.g., with a 0.2% gap compared to exact unlearning on the CIFAR-10 dataset. Moreover, in preventing conditional diffusion models from generating harmful images, SalUn achieves nearly 100% unlearning accuracy, outperforming current state-of-the-art baselines like Erased Stable Diffusion and Forget-Me-Not.

artificial intelligence, machine learning, natural language, (12 more...)

arXiv.org Artificial Intelligence

2310.12508

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (0.93)
Transportation (0.68)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

Stability Guarantees for Feature Attributions with Multiplicative Smoothing

Xue, Anton, Alur, Rajeev, Wong, Eric

arXiv.org Artificial IntelligenceOct-26-2023

Explanation methods for machine learning models tend not to provide any formal guarantees and may not reflect the underlying decision-making process. In this work, we analyze stability as a property for reliable feature attribution methods. We prove that relaxed variants of stability are guaranteed if the model is sufficiently Lipschitz with respect to the masking of features. We develop a smoothing method called Multiplicative Smoothing (MuS) to achieve such a model. We show that MuS overcomes the theoretical limitations of standard smoothing techniques and can be integrated with any classifier and feature attribution method. We evaluate MuS on vision and language models with various feature attribution methods, such as LIME and SHAP, and demonstrate that MuS endows feature attributions with non-trivial stability guarantees.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2307.05902

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (0.64)

Industry:

Law (0.46)
Health & Medicine > Diagnostic Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Sum-of-Parts Models: Faithful Attributions for Groups of Features

You, Weiqiu, Qu, Helen, Gatti, Marco, Jain, Bhuvnesh, Wong, Eric

arXiv.org Artificial IntelligenceOct-24-2023

An explanation of a machine learning model is considered "faithful" if it accurately reflects the model's decision-making process. However, explanations such as feature attributions for deep learning are not guaranteed to be faithful, and can produce potentially misleading interpretations. In this work, we develop Sum-of-Parts (SOP), a class of models whose predictions come with grouped feature attributions that are faithful-by-construction. This model decomposes a prediction into an interpretable sum of scores, each of which is directly attributable to a sparse group of features. We evaluate SOP on benchmarks with standard interpretability metrics, and in a case study, we use the faithful explanations from SOP to help astrophysicists discover new knowledge about galaxy formation.

artificial intelligence, faithful attribution, machine learning, (3 more...)

arXiv.org Artificial Intelligence

2310.16316

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.87)

Add feedback

Jailbreaking Black Box Large Language Models in Twenty Queries

Chao, Patrick, Robey, Alexander, Dobriban, Edgar, Hassani, Hamed, Pappas, George J., Wong, Eric

arXiv.org Artificial IntelligenceOct-13-2023

There is growing interest in ensuring that large language models (LLMs) align with human values. However, the alignment of such models is vulnerable to adversarial jailbreaks, which coax LLMs into overriding their safety guardrails. The identification of these vulnerabilities is therefore instrumental in understanding inherent weaknesses and preventing future misuse. To this end, we propose Prompt Automatic Iterative Refinement (PAIR), an algorithm that generates semantic jailbreaks with only black-box access to an LLM. PAIR -- which is inspired by social engineering attacks -- uses an attacker LLM to automatically generate jailbreaks for a separate targeted LLM without human intervention. In this way, the attacker LLM iteratively queries the target LLM to update and refine a candidate jailbreak. Empirically, PAIR often requires fewer than twenty queries to produce a jailbreak, which is orders of magnitude more efficient than existing algorithms. PAIR also achieves competitive jailbreaking success rates and transferability on open and closed-source LLMs, including GPT-3.5/4, Vicuna, and PaLM-2.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2310.08419

Genre: Research Report > New Finding (0.47)

Industry:

Information Technology > Security & Privacy (1.00)
Transportation > Air (0.71)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Faithful Chain-of-Thought Reasoning

Lyu, Qing, Havaldar, Shreya, Stein, Adam, Zhang, Li, Rao, Delip, Wong, Eric, Apidianaki, Marianna, Callison-Burch, Chris

arXiv.org Artificial IntelligenceSep-20-2023

While Chain-of-Thought (CoT) prompting boosts Language Models' (LM) performance on a gamut of complex reasoning tasks, the generated reasoning chain does not necessarily reflect how the model arrives at the answer (aka. faithfulness). We propose Faithful CoT, a reasoning framework involving two stages: Translation (Natural Language query $\rightarrow$ symbolic reasoning chain) and Problem Solving (reasoning chain $\rightarrow$ answer), using an LM and a deterministic solver respectively. This guarantees that the reasoning chain provides a faithful explanation of the final answer. Aside from interpretability, Faithful CoT also improves empirical performance: it outperforms standard CoT on 9 of 10 benchmarks from 4 diverse domains, with a relative accuracy gain of 6.3% on Math Word Problems (MWP), 3.4% on Planning, 5.5% on Multi-hop Question Answering (QA), and 21.4% on Relational Inference. Furthermore, with GPT-4 and Codex, it sets the new state-of-the-art few-shot performance on 7 datasets (with 95.0+ accuracy on 6 of them), showing a strong synergy between faithfulness and accuracy.

deep learning, faithful chain-of-thought reasoning, machine learning, (4 more...)

arXiv.org Artificial Intelligence

2301.13379

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.87)

Add feedback

MDB: Interactively Querying Datasets and Models

Naik, Aaditya, Stein, Adam, Wu, Yinjun, Wong, Eric, Naik, Mayur

arXiv.org Artificial IntelligenceAug-13-2023

As models are trained and deployed, developers need to be able to systematically debug errors that emerge in the machine learning pipeline. We present MDB, a debugging framework for interactively querying datasets and models. MDB integrates functional programming with relational algebra to build expressive queries over a database of datasets and model predictions. Queries are reusable and easily modified, enabling debuggers to rapidly iterate and refine queries to discover and characterize errors and model behaviors. We evaluate MDB on object detection, bias discovery, image classification, and data imputation tasks across self-driving videos, large language models, and medical records. Our experiments show that MDB enables up to 10x faster and 40\% shorter queries than other baselines. In a user study, we find developers can successfully construct complex queries that describe errors of machine learning models.

machine learning, natural language, programming language, (21 more...)

arXiv.org Artificial Intelligence

2308.06686

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Diagnostic Medicine (0.67)
Health & Medicine > Health Care Technology > Medical Record (0.34)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(3 more...)

Add feedback