Smith, Virginia
NeurIPS 2023 Competition: Privacy Preserving Federated Learning Document VQA
Tobaben, Marlon, Souibgui, Mohamed Ali, Tito, Rubèn, Nguyen, Khanh, Kerkouche, Raouf, Jung, Kangsoo, Jälkö, Joonas, Kang, Lei, Barsky, Andrey, d'Andecy, Vincent Poulain, Joseph, Aurélie, Muhamed, Aashiq, Kuo, Kevin, Smith, Virginia, Yamasaki, Yusuke, Fukami, Takumi, Niwa, Kenta, Tyou, Iifan, Ishii, Hiro, Yokota, Rio, N, Ragul, Kutum, Rintu, Llados, Josep, Valveny, Ernest, Honkela, Antti, Fritz, Mario, Karatzas, Dimosthenis
The Privacy Preserving Federated Learning Document VQA (PFL-DocVQA) competition challenged the community to develop provably private and communication-efficient solutions in a federated setting for a real-life use case: invoice processing. The competition introduced a dataset of real invoice documents, along with associated questions and answers requiring information extraction and reasoning over the document images. In doing so, it brought together researchers and expertise from the document analysis, privacy, and federated learning communities. Participants fine-tuned a pre-trained, state-of-the-art Document Visual Question Answering model provided by the organizers for this new domain, mimicking a typical federated invoice processing setup. The base model is a multi-modal generative language model, and sensitive information could be exposed through either the visual or textual input modality. Participants proposed elegant solutions to reduce communication costs while maintaining a minimum utility threshold (Track 1) and to protect all information from each document provider using differential privacy (Track 2). The competition served as a new testbed for developing and testing private federated learning methods, while also raising awareness about privacy within the document image analysis and recognition community. Finally, the competition analysis provides best practices and recommendations for successfully running privacy-focused federated learning challenges in the future.
Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models
Muhamed, Aashiq, Diab, Mona, Smith, Virginia
Understanding and mitigating the potential risks associated with foundation models (FMs) hinges on developing effective interpretability methods. Sparse Autoencoders (SAEs) have emerged as a promising tool for disentangling FM representations, but they struggle to capture rare, yet crucial, concepts in the data. We introduce Specialized Sparse Autoencoders (SSAEs), designed to illuminate these elusive dark matter features by focusing on specific subdomains. We present a practical recipe for training SSAEs, demonstrating the efficacy of dense retrieval for data selection and the benefits of Tilted Empirical Risk Minimization as a training objective to improve concept recall. Our evaluation of SSAEs on standard metrics, such as downstream perplexity and $L_0$ sparsity, shows that they effectively capture subdomain tail concepts, exceeding the capabilities of general-purpose SAEs. We showcase the practical utility of SSAEs in a case study on the Bias in Bios dataset, where SSAEs achieve a 12.5\% increase in worst-group classification accuracy when applied to remove spurious gender information. SSAEs provide a powerful new lens for peering into the inner workings of FMs in subdomains.
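For readers unfamiliar with Tilted Empirical Risk Minimization, the minimal Python sketch below shows how a tilted objective can be dropped into standard SAE training to up-weight rare, high-loss examples. The function names, the L1-penalized SAE loss, and the choice of tilt parameter are illustrative assumptions, not the paper's exact recipe.

import math
import torch
import torch.nn.functional as F

def tilted_erm_loss(per_example_losses: torch.Tensor, t: float = 1.0) -> torch.Tensor:
    # Tilted ERM: (1/t) * log( mean_i exp(t * loss_i) ).
    # For t > 0 this up-weights high-loss (rare / tail) examples
    # relative to the plain empirical average.
    n = per_example_losses.numel()
    return (torch.logsumexp(t * per_example_losses, dim=0) - math.log(n)) / t

def sae_per_example_loss(x, encoder, decoder, l1_coef=1e-3):
    # Standard SAE objective per example: reconstruction error + L1 sparsity.
    z = F.relu(encoder(x))      # sparse feature activations
    x_hat = decoder(z)          # reconstructed FM activations
    recon = ((x_hat - x) ** 2).sum(dim=-1)
    sparsity = z.abs().sum(dim=-1)
    return recon + l1_coef * sparsity

# Example usage (enc/dec are assumed linear layers over FM activations):
#   loss = tilted_erm_loss(sae_per_example_loss(batch, enc, dec), t=0.5)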
Position: LLM Unlearning Benchmarks are Weak Measures of Progress
Thaker, Pratiksha, Hu, Shengyuan, Kale, Neil, Maurya, Yash, Wu, Zhiwei Steven, Smith, Virginia
Unlearning methods have the potential to improve the privacy and safety of large language models (LLMs) by removing sensitive or harmful information post hoc. The LLM unlearning research community has increasingly turned toward empirical benchmarks to assess the effectiveness of such methods. In this paper, we find that existing benchmarks provide an overly optimistic and potentially misleading view of the effectiveness of candidate unlearning methods. By introducing simple, benign modifications to a number of popular benchmarks, we expose instances where supposedly unlearned information remains accessible, or where the unlearning process has degraded the model's performance on retained information to a much greater extent than indicated by the original benchmark. We identify that existing benchmarks are particularly vulnerable to modifications that introduce even loose dependencies between the forget and retain information. Further, we show that ambiguity in unlearning targets in existing benchmarks can easily lead to the design of methods that overfit to the given test queries. Based on our findings, we urge the community to be cautious when interpreting benchmark results as reliable measures of progress, and we provide several recommendations to guide future LLM unlearning research.
Revisiting Cascaded Ensembles for Efficient Inference
Kolawole, Steven, Dennis, Don, Talwalkar, Ameet, Smith, Virginia
A common approach to making machine learning inference more efficient is to use example-specific adaptive schemes, which route or select models for each example at inference time. In this work we study a simple scheme for adaptive inference. We build a cascade of ensembles (CoE), beginning with resource-efficient models and growing to larger, more expressive models, where ensemble agreement serves as a data-dependent routing criterion. This scheme is easy to incorporate into existing inference pipelines, requires no additional training, and can be used to place models across multiple resource tiers--for instance, serving efficient models at the edge and invoking larger models in the cloud only when necessary. In cases where parallel inference is feasible, we show that CoE can improve accuracy relative to the single best model while reducing the average cost of inference by up to 7x, and provides Pareto-dominant solutions in accuracy and efficiency relative to existing adaptive inference baselines. These savings translate to an over 3x reduction in total monetary cost when performing inference using a heterogeneous cluster of GPUs. Finally, for edge inference scenarios where portions of the cascade reside at the edge vs. in the cloud, CoE can provide a 14x reduction in communication cost and inference latency without sacrificing accuracy.
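A minimal Python sketch of the agreement-based routing idea follows; the tier contents, the agreement threshold, and the majority-vote fallback are assumptions for illustration rather than the paper's exact configuration.

from collections import Counter

def cascade_of_ensembles_predict(x, tiers, agreement_threshold=1.0):
    # `tiers` is a non-empty list of ensembles (each a list of callables
    # returning a class label), ordered from cheapest to most expressive.
    # If the models in a tier agree strongly enough, return early;
    # otherwise defer the example to the next, larger tier.
    for ensemble in tiers:
        preds = [model(x) for model in ensemble]
        label, count = Counter(preds).most_common(1)[0]
        if count / len(preds) >= agreement_threshold:
            return label   # confident: stop early, skip larger models
    return label           # fall back to the last tier's majority vote

# Example usage: earlier tiers could live at the edge, the last in the cloud.
#   y = cascade_of_ensembles_predict(x, [[small_a, small_b], [large_a, large_b, large_c]])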
Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients
Muhamed, Aashiq, Li, Oscar, Woodruff, David, Diab, Mona, Smith, Virginia
Large language model (LLM) training and finetuning are often bottlenecked by limited GPU memory. While existing projection-based optimization methods address this by projecting gradients into a lower-dimensional subspace to reduce optimizer state memory, they typically rely on dense projection matrices, which can introduce computational and memory overheads. In this work, we propose Grass (GRAdient Structured Sparsification), a novel approach that leverages sparse projections to transform gradients into structured sparse updates. This design not only significantly reduces memory usage for optimizer states but also minimizes gradient memory footprint, computation, and communication costs, leading to substantial throughput improvements. Extensive experiments on pretraining and finetuning tasks demonstrate that Grass achieves performance competitive with full-rank training and existing projection-based methods. Notably, Grass enables half-precision pretraining of a 13B parameter LLaMA model on a single 40GB A100 GPU--a feat infeasible for previous methods--and yields up to a $2\times$ throughput improvement on an 8-GPU system. Code can be found at https://github.com/aashiqmuhamed/GRASS.
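As a rough illustration of what a structured sparse gradient projection can look like, the sketch below keeps only a subset of gradient rows so that optimizer state lives in a compact space and the update is scattered back into the full parameter. The row-norm selection rule and function names are assumptions for exposition; Grass's actual projection construction and selection rules differ in their details.

import torch

def structured_sparse_project(grad: torch.Tensor, k: int):
    # Keep only k rows of an (m x n) gradient, chosen here by row norm.
    # Returns the compact (k x n) projected gradient plus the row indices,
    # so optimizer state need only be stored for the selected rows.
    row_norms = grad.norm(dim=1)
    idx = torch.topk(row_norms, k).indices
    return grad[idx], idx

def apply_sparse_update(param: torch.Tensor, update: torch.Tensor, idx, lr=1e-3):
    # Scatter the low-memory update back into the full parameter matrix.
    param.data[idx] -= lr * update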
RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold
Setlur, Amrith, Garg, Saurabh, Geng, Xinyang, Garg, Naman, Smith, Virginia, Kumar, Aviral
Training on model-generated synthetic data is a promising approach for finetuning LLMs, but it remains unclear when it helps or hurts. In this paper, we investigate this question for math reasoning via an empirical study, followed by building a conceptual understanding of our observations. First, we find that while the typical approach of finetuning a model on synthetic correct or positive problem-solution pairs generated by capable models offers modest performance gains, sampling more correct solutions from the finetuned learner itself followed by subsequent finetuning on this self-generated data $\textbf{doubles}$ the efficiency of the same synthetic problems. At the same time, training on model-generated positives can amplify various spurious correlations, resulting in flat or even inverse scaling trends as the amount of data increases. Surprisingly, we find that several of these issues can be addressed if we also utilize negative responses, i.e., model-generated responses that are deemed incorrect by a final answer verifier. Crucially, these negatives must be constructed such that the training can appropriately recover the utility or advantage of each intermediate step in the negative response. With this per-step scheme, we are able to attain consistent gains over positive data alone, attaining performance similar to amplifying the amount of synthetic data by $\mathbf{8\times}$. We show that training on per-step negatives can help to unlearn spurious correlations in the positive data, and is equivalent to advantage-weighted reinforcement learning (RL), implying that it inherits robustness benefits of RL over imitating positive data alone.
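The connection to advantage-weighted RL can be made concrete with a small sketch: each reasoning step's log-probability is reweighted by an estimate of that step's advantage, so low- or negative-advantage steps from incorrect solutions are not simply imitated. The exponential weighting, the temperature beta, and the advantage estimator are illustrative assumptions rather than the paper's exact formulation.

import torch

def per_step_advantage_weighted_loss(step_logprobs: torch.Tensor,
                                     step_advantages: torch.Tensor,
                                     beta: float = 1.0) -> torch.Tensor:
    # step_logprobs:   summed token log-probs for each reasoning step.
    # step_advantages: estimated per-step advantages, e.g., the change in
    #                  the probability that completing from this step
    #                  reaches the verifier-approved final answer.
    # Exponential weighting down-weights (or effectively pushes against)
    # steps with low advantage instead of cloning them.
    weights = torch.exp(step_advantages / beta)
    return -(weights * step_logprobs).mean()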
Jogging the Memory of Unlearned Models Through Targeted Relearning Attacks
Hu, Shengyuan, Fu, Yiwei, Wu, Zhiwei Steven, Smith, Virginia
Machine unlearning is a promising approach to mitigate undesirable memorization of training data in ML models. However, in this work we show that existing approaches for unlearning in LLMs are surprisingly susceptible to a simple set of targeted relearning attacks. With access to only a small and potentially loosely related set of data, we find that we can 'jog' the memory of unlearned models to reverse the effects of unlearning. We formalize this unlearning-relearning pipeline, explore the attack across three popular unlearning benchmarks, and discuss future directions and guidelines that result from our study.
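A hypothetical sketch of the relearning loop, using the Hugging Face Transformers interface: briefly fine-tune a released "unlearned" model on a small, loosely related public corpus, then probe it for the supposedly forgotten content. The dataset, hyperparameters, and the assumption that the tokenizer has a padding token are placeholders, not the benchmark-specific setups studied in the paper.

import torch
from torch.utils.data import DataLoader

def relearning_attack(unlearned_model, tokenizer, related_texts,
                      steps=100, lr=2e-5, device="cuda"):
    # Fine-tune the unlearned model on a small list of loosely related
    # strings (assumes tokenizer.pad_token is set for the causal LM).
    model = unlearned_model.to(device).train()
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loader = DataLoader(related_texts, batch_size=2, shuffle=True)
    step = 0
    while step < steps:
        for texts in loader:
            batch = tokenizer(list(texts), return_tensors="pt",
                              padding=True, truncation=True).to(device)
            loss = model(**batch, labels=batch["input_ids"]).loss
            loss.backward()
            opt.step()
            opt.zero_grad()
            step += 1
            if step >= steps:
                break
    return model  # probe this model with queries about the forgotten data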
Guardrail Baselines for Unlearning in LLMs
Thaker, Pratiksha, Maurya, Yash, Hu, Shengyuan, Wu, Zhiwei Steven, Smith, Virginia
Recent years have seen two trends emerge simultaneously: large language models (LLMs) trained on increasing amounts of user data (generally scraped indiscriminately from the web), in parallel with increasing legal protections on digital data use, including data revocation ("right to be forgotten") laws. In order to support data revocation for models that have already been trained on potentially sensitive data, a number of works have proposed approaches for data "unlearning" (Bourtoule et al., 2021; Gupta et al., 2021; Ginart et al., 2019), which aim to remove the influence of specific subsets of training data without entirely retraining a model. Unlearning in LLMs is particularly challenging because individuals' information may not be confined to specific data points (Brown et al., 2022; Tramèr et al., 2022). Nevertheless, recent work has shown that model finetuning is a promising approach to forget, for example, information corresponding to the book series Harry Potter (Eldan and Russinovich, 2023); information about specific individuals in a synthetic dataset (Maini et al., 2024); or knowledge that could give information to malicious agents (Li et al., 2024). While finetuning is a promising approach, a number of recent works have shown that simple modifications to the input prompt or output postprocessing filters (which we collectively call "guardrails") can also be effective for generating a desirable output distribution from a model (Pawelczyk et al., 2023; Brown et al., 2020; Chowdhery et al., 2023; Wei et al., 2021; Kim et al., 2024). Prompt prefixes and postprocessing filters do not update the model weights, so the resulting model itself would not satisfy definitions of unlearning that require the distribution of model weights to match that of a model retrained from scratch (Bourtoule et al., 2021). However, in practical settings where users can only access the model through an API, modifying the output distribution alone can suffice. In fact, most existing unlearning benchmarks (Eldan and Russinovich, 2023; Maini et al., 2024; unl, 2023; Li et al., 2024) only examine the model outputs when evaluating unlearning, which is consistent with a threat model in which users have only API access (see Section 3). In this paper, we investigate how existing benchmarks fare under guardrail-based approaches, and show that in three popular unlearning benchmarks, guardrails not only give strong performance comparable to finetuning baselines, but can also surface weaknesses or inconsistencies in the benchmarks or metrics themselves.
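To make the guardrail idea concrete, here is a minimal Python sketch of an API-level wrapper combining a prompt prefix with an output postprocessing filter; the forget-topic list, refusal message, and regex filter are illustrative assumptions, not the guardrails evaluated in the paper.

import re

FORGET_TOPICS = ["Harry Potter"]   # placeholder description of the forget set

GUARDRAIL_PREFIX = (
    "You must not reveal any information about the following topics: "
    + ", ".join(FORGET_TOPICS) + ". If asked, say you cannot help.\n\n"
)

def guardrailed_generate(model_generate, user_prompt: str) -> str:
    # model_generate: any callable str -> str (e.g., an API client).
    # No model weights are changed; only the observable output distribution.

    # 1. Prompt guardrail: prepend refusal instructions.
    raw_output = model_generate(GUARDRAIL_PREFIX + user_prompt)

    # 2. Postprocessing guardrail: redact outputs mentioning forget topics.
    for topic in FORGET_TOPICS:
        if re.search(re.escape(topic), raw_output, flags=re.IGNORECASE):
            return "I cannot share information about that topic."
    return raw_output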
Federated LoRA with Sparse Communication
Kuo, Kevin, Raje, Arian, Rajesh, Kousik, Smith, Virginia
Low-rank adaptation (LoRA) is a natural method for finetuning in communication-constrained machine learning settings such as cross-device federated learning. Prior work that has studied LoRA in the context of federated learning has focused on improving LoRA's robustness to heterogeneity and its privacy. In this work, we instead consider techniques for further improving communication efficiency in federated LoRA. Unfortunately, we show that centralized ML methods that improve the efficiency of LoRA through unstructured pruning do not transfer well to federated settings. We instead study a simple approach, \textbf{FLASC}, that applies sparsity to LoRA during communication while allowing clients to locally fine-tune the entire LoRA module. Across four common federated learning tasks, we demonstrate that this method matches the performance of dense LoRA with up to $10\times$ less communication. Additionally, despite being designed primarily to target communication, we find that this approach has benefits in terms of heterogeneity and privacy relative to existing approaches tailored to these specific concerns. Overall, our work highlights the importance of considering system-specific constraints when developing communication-efficient finetuning approaches, and serves as a simple and competitive baseline for future work in federated finetuning.
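A minimal sketch of the communication pattern described above: clients fine-tune the dense LoRA module locally but transmit only a magnitude-sparsified update, which the server averages back into a dense module. The density level, top-k selection, and aggregation rule are assumptions for illustration, not FLASC's exact design.

import torch

def sparsify_for_upload(lora_delta: torch.Tensor, density: float = 0.1):
    # Top-k (by magnitude) sparsification of a client's LoRA update before
    # communication; the client still fine-tunes the dense module locally.
    # Returns the kept values and their flat indices to send to the server.
    flat = lora_delta.flatten()
    k = max(1, int(density * flat.numel()))
    idx = torch.topk(flat.abs(), k).indices
    return flat[idx], idx

def server_aggregate(updates, shape):
    # Average the sparse client updates back into a dense LoRA module.
    dense = torch.zeros(shape).flatten()
    counts = torch.zeros_like(dense)
    for values, idx in updates:
        dense[idx] += values
        counts[idx] += 1
    dense = dense / counts.clamp(min=1)   # positions no client sent stay zero
    return dense.reshape(shape)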
No Free Lunch in LLM Watermarking: Trade-offs in Watermarking Design Choices
Pang, Qi, Hu, Shengyuan, Zheng, Wenting, Smith, Virginia
Advances in generative models have made it possible for AI-generated text, code, and images to mirror human-generated content in many applications. Watermarking, a technique that aims to embed information in the output of a model to verify its source, is useful for mitigating the misuse of such AI-generated content. However, we show that common design choices in LLM watermarking schemes make the resulting systems surprisingly susceptible to attack -- leading to fundamental trade-offs in robustness, utility, and usability. To navigate these trade-offs, we rigorously study a set of simple yet effective attacks on common watermarking systems, and propose guidelines and defenses for LLM watermarking in practice.