
Collaborating Authors

 Thaker, Pratiksha


Position: LLM Unlearning Benchmarks are Weak Measures of Progress

arXiv.org Artificial Intelligence

Unlearning methods have the potential to improve the privacy and safety of large language models (LLMs) by removing sensitive or harmful information post hoc. The LLM unlearning research community has increasingly turned toward empirical benchmarks to assess the effectiveness of such methods. In this paper, we find that existing benchmarks provide an overly optimistic and potentially misleading view of the effectiveness of candidate unlearning methods. By introducing simple, benign modifications to a number of popular benchmarks, we expose instances where supposedly unlearned information remains accessible, or where the unlearning process has degraded the model's performance on retained information to a much greater extent than indicated by the original benchmark. We find that existing benchmarks are particularly vulnerable to modifications that introduce even loose dependencies between the forget and retain information. Further, we show that ambiguity in unlearning targets in existing benchmarks can easily lead to the design of methods that overfit to the given test queries. Based on our findings, we urge the community to be cautious when interpreting benchmark results as reliable measures of progress, and we provide several recommendations to guide future LLM unlearning research.
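The kind of "simple, benign modification" the abstract describes can be illustrated with a small probe: rephrase or prefix a forget-set query and check whether the supposedly removed answer still surfaces. The sketch below is illustrative only; the `generate_fn` callable, the stub model, and the example query/answer are hypothetical placeholders, not queries from any specific benchmark.

```python
# Hypothetical probe: query an "unlearned" model with lightly modified versions
# of a forget-set question and flag cases where the forgotten answer still leaks.

from typing import Callable, List


def build_probe_variants(query: str) -> List[str]:
    """Benign rewrites of a benchmark query that leave its meaning intact."""
    return [
        query,                                         # original benchmark form
        f"Answer briefly: {query}",                    # added instruction prefix
        f"A friend asked me the following. {query}",   # indirect framing
        query.lower(),                                 # trivial surface change
    ]


def leaks_forgotten_answer(
    generate_fn: Callable[[str], str], query: str, forgotten_answer: str
) -> bool:
    """Return True if any benign variant still elicits the forget-set answer."""
    return any(
        forgotten_answer.lower() in generate_fn(variant).lower()
        for variant in build_probe_variants(query)
    )


if __name__ == "__main__":
    # Stub model that "passes" the original query but leaks under a prefix.
    def stub_model(prompt: str) -> str:
        if prompt.startswith("Answer briefly:"):
            return "The author of the series is J. K. Rowling."
        return "I don't know."

    print(leaks_forgotten_answer(stub_model, "Who wrote Harry Potter?", "Rowling"))
```

A benchmark that only scores the original query form would count the stub model above as successfully unlearned, even though the answer remains one prompt prefix away.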


Guardrail Baselines for Unlearning in LLMs

arXiv.org Artificial Intelligence

Recent years have seen two trends emerge simultaneously: large language models (LLMs) trained on increasing amounts of user data (generally scraped indiscriminately from the web), in parallel with increasing legal protections on digital data use, including data revocation ("right to be forgotten") laws. To support data revocation for models that have already been trained on potentially sensitive data, a number of works have proposed approaches for data "unlearning" (Bourtoule et al., 2021; Gupta et al., 2021; Ginart et al., 2019), which aims to remove the influence of specific subsets of training data without entirely retraining a model. Unlearning in LLMs is particularly challenging because individuals' information may not be confined to specific data points (Brown et al., 2022; Tramèr et al., 2022). Nevertheless, recent work has shown that model finetuning is a promising approach to forget, for example, information corresponding to the book series Harry Potter (Eldan and Russinovich, 2023); information about specific individuals in a synthetic dataset (Maini et al., 2024); or knowledge that could give information to malicious agents (Li et al., 2024). While finetuning is a promising approach, a number of recent works have shown that simple modifications to the input prompt or output postprocessing filters (which we collectively call "guardrails") can also be effective for generating a desirable output distribution from a model (Pawelczyk et al., 2023; Brown et al., 2020; Chowdhery et al., 2023; Wei et al., 2021; Kim et al., 2024). Prompt prefixes and postprocessing filters do not update the model weights, so the resulting model itself would not satisfy definitions of unlearning that require the distribution of model weights to match that of a model retrained from scratch (Bourtoule et al., 2021). However, in practical settings where users can only access the model through an API, modifying the output distribution alone can suffice. In fact, most existing unlearning benchmarks (Eldan and Russinovich, 2023; Maini et al., 2024; unl, 2023; Li et al., 2024) examine only the model outputs when evaluating unlearning, which is consistent with a threat model in which users have only API access (see Section 3). In this paper, we investigate how existing benchmarks fare under guardrail-based approaches, and show that in three popular unlearning benchmarks, guardrails not only give strong performance comparable to finetuning baselines, but can also surface weaknesses or inconsistencies in the benchmarks or metrics themselves.
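A minimal sketch of the two guardrail families the abstract contrasts with finetuning: a prompt prefix instructing the model to refuse forget-set topics, and a postprocessing filter that suppresses outputs mentioning forget-set strings. The wrapped `generate_fn`, the refusal text, and the topic/string lists are illustrative assumptions, not the exact guardrails evaluated in the paper; the point is only that neither intervention touches the model weights.

```python
# Hypothetical guardrail wrapper around a black-box text generator:
# a pre-prompt instruction plus a post-hoc output filter, with no weight updates.

from typing import Callable, Iterable

REFUSAL = "I'm sorry, I can't share information about that topic."

PREFIX = (
    "You must not reveal any information about the following topics: {topics}. "
    "If asked about them, refuse.\n\n"
)


def with_guardrails(
    generate_fn: Callable[[str], str],
    forget_topics: Iterable[str],
    forget_strings: Iterable[str],
) -> Callable[[str], str]:
    """Wrap a black-box generator with a prompt prefix and an output filter."""
    topics = ", ".join(forget_topics)

    def guarded(prompt: str) -> str:
        # Pre-prompt guardrail: prepend the refusal instruction.
        output = generate_fn(PREFIX.format(topics=topics) + prompt)
        # Postprocessing guardrail: filter outputs that still mention forget-set strings.
        if any(s.lower() in output.lower() for s in forget_strings):
            return REFUSAL
        return output

    return guarded


if __name__ == "__main__":
    def base_model(prompt: str) -> str:
        return "Harry Potter was written by J. K. Rowling."

    guarded = with_guardrails(base_model, ["Harry Potter"], ["Rowling", "Hogwarts"])
    print(guarded("Who wrote Harry Potter?"))  # refusal returned; weights untouched
```

Under an API-only threat model that inspects outputs alone, a wrapper of this form can score well on output-based unlearning metrics despite leaving the underlying model unchanged.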


ALTO: An Efficient Network Orchestrator for Compound AI Systems

arXiv.org Artificial Intelligence

We present ALTO, a network orchestrator for efficiently serving compound AI systems such as pipelines of language models. ALTO achieves high throughput and low latency by taking advantage of an optimization opportunity specific to generative language models: streaming intermediate outputs. Because language models produce outputs token by token, ALTO can stream intermediate outputs between stages whenever possible. We highlight two new challenges, correctness and load balancing, that emerge when streaming intermediate data across distributed pipeline stage instances. We also motivate the need for an aggregation-aware routing interface and distributed prompt-aware scheduling to address these challenges. We demonstrate the impact of ALTO's partial output streaming on a complex chatbot verification pipeline, increasing throughput by up to 3x for a fixed latency target of 4 seconds per request while also reducing tail latency by 1.8x compared to a baseline serving approach.
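The core idea of partial output streaming can be sketched with two toy pipeline stages: a downstream stage starts processing as soon as enough upstream tokens have arrived, instead of waiting for the complete intermediate output. The stages, token source, and sentence-level flushing rule below are hypothetical stand-ins, not ALTO's actual stages, routing interface, or scheduler.

```python
# Toy illustration of streaming intermediate outputs between pipeline stages:
# the downstream stage consumes tokens incrementally rather than waiting for
# the upstream stage to finish.

from typing import Iterator


def upstream_stage(prompt: str) -> Iterator[str]:
    """Stand-in for a language model emitting an answer token by token."""
    for token in f"Draft answer to: {prompt} .".split():
        yield token  # in a real system, tokens arrive incrementally over the network


def downstream_stage(token_stream: Iterator[str]) -> Iterator[str]:
    """Verifier-like stage that processes each sentence as soon as it is complete."""
    buffer: list[str] = []
    for token in token_stream:
        buffer.append(token)
        if token == ".":  # a complete unit is available; no need to wait for the rest
            yield f"verified: {' '.join(buffer)}"
            buffer.clear()
    if buffer:  # flush any trailing partial sentence
        yield f"verified: {' '.join(buffer)}"


if __name__ == "__main__":
    for result in downstream_stage(upstream_stage("What is ALTO?")):
        print(result)
```

The correctness and load-balancing challenges the abstract mentions arise precisely because such partial outputs must be routed and aggregated consistently across many distributed instances of each stage, which this single-process toy does not capture.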


Leveraging Public Representations for Private Transfer Learning

arXiv.org Machine Learning

Motivated by the recent empirical success of incorporating public data into differentially private learning, we theoretically investigate how a shared representation learned from public data can improve private learning. We explore two common scenarios of transfer learning for linear regression, both of which assume the public and private tasks (regression vectors) share a low-rank subspace in a high-dimensional space. In the first single-task transfer scenario, the goal is to learn a single model shared across all users, each corresponding to a row in a dataset. We provide matching upper and lower bounds showing that our algorithm achieves the optimal excess risk within a natural class of algorithms that search for the linear model within the given subspace estimate. In the second scenario of multitask model personalization, we show that with sufficient public data, users can avoid private coordination, as purely local learning within the given subspace achieves the same utility. Taken together, our results help to characterize the benefits of public data across common regimes of private transfer learning.
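The single-task transfer setup can be illustrated numerically: estimate a shared low-rank subspace from public regression vectors, then fit the private task's model inside that subspace with noise added to its sufficient statistics. This is a generic sketch under stated assumptions; the subspace estimator, clipping/norm assumptions, and noise scale below are arbitrary illustrative choices, not the paper's algorithm or its privacy calibration.

```python
# Hedged sketch: public tasks define a low-rank subspace; the private linear
# regression is solved inside that subspace using noisy sufficient statistics.

import numpy as np

rng = np.random.default_rng(0)
d, k, n_pub_tasks, n_priv = 50, 5, 20, 500

# Shared low-rank structure: all regression vectors lie in a k-dimensional subspace.
U_true, _ = np.linalg.qr(rng.normal(size=(d, k)))
public_thetas = U_true @ rng.normal(size=(k, n_pub_tasks))
theta_priv = U_true @ rng.normal(size=k)

# Step 1: estimate the shared subspace from public task vectors via SVD.
U_hat, _, _ = np.linalg.svd(public_thetas, full_matrices=False)
U_hat = U_hat[:, :k]

# Step 2: private regression restricted to the estimated subspace, with Gaussian
# noise added to the sufficient statistics (assumes rows and labels are norm-bounded
# so the noise could be calibrated to a target (epsilon, delta); sigma is arbitrary here).
X = rng.normal(size=(n_priv, d)) / np.sqrt(d)
y = X @ theta_priv + 0.01 * rng.normal(size=n_priv)
Z = X @ U_hat                                   # project features into the subspace
sigma = 0.1
A = Z.T @ Z + sigma * rng.normal(size=(k, k))   # noisy Z^T Z
b = Z.T @ y + sigma * rng.normal(size=k)        # noisy Z^T y
w = np.linalg.solve(A + 1e-3 * np.eye(k), b)    # small ridge term keeps the solve stable
theta_hat = U_hat @ w                           # lift back to the ambient dimension

print("parameter error:", np.linalg.norm(theta_hat - theta_priv))
```

The benefit of the public subspace in this toy is dimensional: noise is injected into a k x k system rather than a d x d one, which mirrors the intuition that a good public representation shrinks the effective problem the private learner must solve.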


On Noisy Evaluation in Federated Hyperparameter Tuning

arXiv.org Artificial Intelligence

Hyperparameter tuning is critical to the success of federated learning applications. Unfortunately, appropriately selecting hyperparameters is challenging in federated networks. Issues of scale, privacy, and heterogeneity introduce noise in the tuning process and make it difficult to evaluate the performance of various hyperparameters. In this work, we perform the first systematic study on the effect of noisy evaluation in federated hyperparameter tuning. We first identify and rigorously explore key sources of noise, including client subsampling, data and systems heterogeneity, and data privacy. Surprisingly, our results indicate that even small amounts of noise can significantly impact tuning methods, reducing the performance of state-of-the-art approaches to that of naive baselines. To address noisy evaluation in such scenarios, we propose a simple and effective approach that leverages public proxy data to boost the evaluation signal. Our work establishes general challenges, baselines, and best practices for future work in federated hyperparameter tuning.
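A toy sketch of the failure mode and the proxy-data remedy: hyperparameter scores observed through a small client subsample are noisy enough to flip the ranking of configurations, and a cheap, low-noise evaluation on public proxy data can be folded in to stabilize selection. The scoring functions, noise model, and equal weighting below are illustrative assumptions, not the paper's exact procedure.

```python
# Toy model of noisy federated hyperparameter evaluation, with and without a
# public-proxy signal blended into the selection criterion.

import numpy as np

rng = np.random.default_rng(1)

configs = [0.001, 0.01, 0.1, 1.0]                          # candidate learning rates
true_score = {c: -abs(np.log10(c) + 2) for c in configs}   # 0.01 is truly best


def noisy_federated_eval(c: float, n_clients: int = 5) -> float:
    """Average validation score over a small, heterogeneous client subsample."""
    return float(np.mean(true_score[c] + rng.normal(scale=1.0, size=n_clients)))


def proxy_eval(c: float) -> float:
    """Cheap, low-noise evaluation on public proxy data (slightly biased)."""
    return true_score[c] + 0.1 + rng.normal(scale=0.05)


noisy_pick = max(configs, key=noisy_federated_eval)
blended_pick = max(
    configs, key=lambda c: 0.5 * noisy_federated_eval(c) + 0.5 * proxy_eval(c)
)

print("noisy-only selection:", noisy_pick)
print("proxy-boosted selection:", blended_pick)
```

Even in this toy, the noisy-only criterion sometimes selects a clearly suboptimal learning rate, while the blended criterion tends to recover the true best configuration despite the proxy data's small bias.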