AITopics | Vassilvitskii, Sergei

Collaborating Authors

Vassilvitskii, Sergei

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

An Optimization Framework for Differentially Private Sparse Fine-Tuning

Makni, Mehdi, Behdin, Kayhan, Afriat, Gabriel, Xu, Zheng, Vassilvitskii, Sergei, Ponomareva, Natalia, Hazimeh, Hussein, Mazumder, Rahul

arXiv.org Machine LearningMar-17-2025

Differentially private stochastic gradient descent (DP-SGD) is broadly considered to be the gold standard for training and fine-tuning neural networks under differential privacy (DP). With the increasing availability of high-quality pre-trained model checkpoints (e.g., vision and language models), fine-tuning has become a popular strategy. However, despite recent progress in understanding and applying DP-SGD for private transfer learning tasks, significant challenges remain -- most notably, the performance gap between models fine-tuned with DP-SGD and their non-private counterparts. Sparse fine-tuning on private data has emerged as an alternative to full-model fine-tuning; recent work has shown that privately fine-tuning only a small subset of model weights and keeping the rest of the weights fixed can lead to better performance. In this work, we propose a new approach for sparse fine-tuning of neural networks under DP. Existing work on private sparse finetuning often used fixed choice of trainable weights (e.g., updating only the last layer), or relied on public model's weights to choose the subset of weights to modify. Such choice of weights remains suboptimal. In contrast, we explore an optimization-based approach, where our selection method makes use of the private gradient information, while using off the shelf privacy accounting techniques. Our numerical experiments on several computer vision models and datasets show that our selection method leads to better prediction accuracy, compared to full-model private fine-tuning or existing private sparse fine-tuning approaches.

artificial intelligence, fine-tuning, machine learning, (18 more...)

arXiv.org Machine Learning

2503.12822

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)

Add feedback

Escaping Collapse: The Strength of Weak Data for Large Language Model Training

Amin, Kareem, Babakniya, Sara, Bie, Alex, Kong, Weiwei, Syed, Umar, Vassilvitskii, Sergei

arXiv.org Artificial IntelligenceFeb-12-2025

Synthetically-generated data plays an increasingly larger role in training large language models. However, while synthetic data has been found to be useful, studies have also shown that without proper curation it can cause LLM performance to plateau, or even "collapse", after many training iterations. In this paper, we formalize this question and develop a theoretical framework to investigate how much curation is needed in order to ensure that LLM performance continually improves. We find that the requirements are nearly minimal. We describe a training procedure that converges to an optimal LLM even if almost all of the non-synthetic training data is of poor quality. Our analysis is inspired by boosting, a classic machine learning technique that leverages a very weak learning algorithm to produce an arbitrarily good classifier. Our training procedure subsumes many recently proposed methods for training LLMs on synthetic data, and thus our analysis sheds light on why they are successful, and also suggests opportunities for future improvement. We present experiments that validate our theory, and show that dynamically focusing labeling resources on the most challenging examples -- in much the same way that boosting focuses the efforts of the weak learner -- leads to improved performance.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2502.08924

Country:

Asia (0.67)
North America > United States > California (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Binary Search with Distributional Predictions

Dinitz, Michael, Im, Sungjin, Lavastida, Thomas, Moseley, Benjamin, Niaparast, Aidin, Vassilvitskii, Sergei

arXiv.org Artificial IntelligenceNov-24-2024

Algorithms with (machine-learned) predictions is a powerful framework for combining traditional worst-case algorithms with modern machine learning. However, the vast majority of work in this space assumes that the prediction itself is non-probabilistic, even if it is generated by some stochastic process (such as a machine learning system). This is a poor fit for modern ML, particularly modern neural networks, which naturally generate a distribution. We initiate the study of algorithms with distributional predictions, where the prediction itself is a distribution. We focus on one of the simplest yet fundamental settings: binary search (or searching a sorted array). This setting has one of the simplest algorithms with a point prediction, but what happens if the prediction is a distribution? We show that this is a richer setting: there are simple distributions where using the classical prediction-based algorithm with any single prediction does poorly. Motivated by this, as our main result, we give an algorithm with query complexity $O(H(p) + \log \eta)$, where $H(p)$ is the entropy of the true distribution $p$ and $\eta$ is the earth mover's distance between $p$ and the predicted distribution $\hat p$. This also yields the first distributionally-robust algorithm for the classical problem of computing an optimal binary search tree given a distribution over target keys. We complement this with a lower bound showing that this query complexity is essentially optimal (up to constants), and experiments validating the practical usefulness of our algorithm.

information retrieval, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2411.1603

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.89)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.69)

Add feedback

Private prediction for large-scale synthetic text generation

Amin, Kareem, Bie, Alex, Kong, Weiwei, Kurakin, Alexey, Ponomareva, Natalia, Syed, Umar, Terzis, Andreas, Vassilvitskii, Sergei

arXiv.org Artificial IntelligenceJul-16-2024

We present an approach for generating differentially private synthetic text using large language models (LLMs), via private prediction. In the private prediction framework, we only require the output synthetic data to satisfy differential privacy guarantees. This is in contrast to approaches that train a generative model on potentially sensitive user-supplied source data and seek to ensure the model itself is safe to release. We prompt a pretrained LLM with source data, but ensure that next-token predictions are made with differential privacy guarantees. Previous work in this paradigm reported generating a small number of examples (<10) at reasonable privacy levels, an amount of data that is useful only for downstream in-context learning or prompting. In contrast, we make changes that allow us to generate thousands of high-quality synthetic data points, greatly expanding the set of potential applications. Our improvements come from an improved privacy analysis and a better private selection mechanism, which makes use of the equivalence between the softmax layer for sampling tokens in LLMs and the exponential mechanism. Furthermore, we introduce a novel use of public predictions via the sparse vector technique, in which we do not pay privacy costs for tokens that are predictable without sensitive data; we find this to be particularly effective for structured data.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2407.12108

Country: North America > United States > Oregon (0.14)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Warm-starting Push-Relabel

Davies, Sami, Vassilvitskii, Sergei, Wang, Yuyan

arXiv.org Artificial IntelligenceMay-28-2024

Push-Relabel is one of the most celebrated network flow algorithms. Maintaining a pre-flow that saturates a cut, it enjoys better theoretical and empirical running time than other flow algorithms, such as Ford-Fulkerson. In practice, Push-Relabel is even faster than what theoretical guarantees can promise, in part because of the use of good heuristics for seeding and updating the iterative algorithm. However, it remains unclear how to run Push-Relabel on an arbitrary initialization that is not necessarily a pre-flow or cut-saturating. We provide the first theoretical guarantees for warm-starting Push-Relabel with a predicted flow, where our learning-augmented version benefits from fast running time when the predicted flow is close to an optimal flow, while maintaining robust worst-case guarantees. Interestingly, our algorithm uses the gap relabeling heuristic, which has long been employed in practice, even though prior to our work there was no rigorous theoretical justification for why it can lead to run-time improvements. We then provide experiments that show our warm-started Push-Relabel also works well in practice.

algorithm, artificial intelligence, push-relabel, (17 more...)

arXiv.org Artificial Intelligence

2405.18568

Country: Europe > Greece (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.68)

Add feedback

Scaling Laws for Downstream Task Performance of Large Language Models

Isik, Berivan, Ponomareva, Natalia, Hazimeh, Hussein, Paparas, Dimitris, Vassilvitskii, Sergei, Koyejo, Sanmi

arXiv.org Artificial IntelligenceFeb-6-2024

Scaling laws provide important insights that can guide the design of large language models (LLMs). Existing work has primarily focused on studying scaling laws for pretraining (upstream) loss. However, in transfer learning settings, in which LLMs are pretrained on an unsupervised dataset and then finetuned on a downstream task, we often also care about the downstream performance. In this work, we study the scaling behavior in a transfer learning setting, where LLMs are finetuned for machine translation tasks. Specifically, we investigate how the choice of the pretraining data and its size affect downstream performance (translation quality) as judged by two metrics: downstream cross-entropy and BLEU score. Our experiments indicate that the size of the finetuning dataset and the distribution alignment between the pretraining and downstream data significantly influence the scaling behavior. With sufficient alignment, both downstream cross-entropy and BLEU score improve monotonically with more pretraining data. In such cases, we show that it is possible to predict the downstream BLEU score with good accuracy using a log-law. However, there are also cases where moderate misalignment causes the BLEU score to fluctuate or get worse with more pretraining, whereas downstream cross-entropy monotonically improves. By analyzing these observations, we provide new practical insights for choosing appropriate pretraining data.

bleu score, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2402.04177

Country:

Europe > Germany (0.14)
Europe > Denmark (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.88)

Add feedback

Measuring Re-identification Risk

Carey, CJ, Dick, Travis, Epasto, Alessandro, Javanmard, Adel, Karlin, Josh, Kumar, Shankar, Medina, Andres Munoz, Mirrokni, Vahab, Nunes, Gabriel Henrique, Vassilvitskii, Sergei, Zhong, Peilin

arXiv.org Artificial IntelligenceJul-31-2023

In this work, we present a new theoretical framework to measure re-identification risk in such user representations. Our framework, based on hypothesis testing, formally bounds the probability that an attacker may be able to obtain the identity of a user from their representation. As an application, we show how our framework is general enough to model important real-world applications such as the Chrome's Topics API for interest-based advertising. We complement our theoretical bounds by showing provably good attack algorithms for re-identification that we use to estimate the re-identification risk in the Topics API. We believe this work provides a rigorous and interpretable notion of re-identification risk and a framework to measure it that can be used to inform real-world applications.

data mining, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2304.0721

Country:

Asia (0.28)
North America > United States > California (0.14)
Africa > Middle East > Egypt (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

Add feedback

How to DP-fy ML: A Practical Guide to Machine Learning with Differential Privacy

Ponomareva, Natalia, Hazimeh, Hussein, Kurakin, Alex, Xu, Zheng, Denison, Carson, McMahan, H. Brendan, Vassilvitskii, Sergei, Chien, Steve, Thakurta, Abhradeep

arXiv.org Artificial IntelligenceJul-31-2023

ML models are ubiquitous in real world applications and are a constant focus of research. At the same time, the community has started to realize the importance of protecting the privacy of ML training data. Differential Privacy (DP) has become a gold standard for making formal statements about data anonymization. However, while some adoption of DP has happened in industry, attempts to apply DP to real world complex ML models are still few and far between. The adoption of DP is hindered by limited practical guidance of what DP protection entails, what privacy guarantees to aim for, and the difficulty of achieving good privacy-utility-computation trade-offs for ML models. Tricks for tuning and maximizing performance are scattered among papers or stored in the heads of practitioners. Furthermore, the literature seems to present conflicting evidence on how and whether to apply architectural adjustments and which components are "safe" to use with DP. This work is a self-contained guide that gives an in-depth overview of the field of DP ML and presents information about achieving the best possible DP ML model with rigorous privacy guarantees. Our target audience is both researchers and practitioners. Researchers interested in DP for ML will benefit from a clear overview of current advances and areas for improvement. We include theory-focused sections that highlight important topics such as privacy accounting and its assumptions, and convergence. For a practitioner, we provide a background in DP theory and a clear step-by-step guide for choosing an appropriate privacy definition and approach, implementing DP training, potentially updating the model architecture, and tuning hyperparameters. For both researchers and practitioners, consistently and fully reporting privacy guarantees is critical, and so we propose a set of specific best practices for stating guarantees.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.1613/jair.1.14649

2303.00654

Country: North America > United States > California (0.13)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)
(2 more...)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.92)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.92)
(4 more...)

Add feedback

How to DP-fy ML: A Practical Guide to Machine Learning with Differential Privacy

Ponomareva, Natalia (a:1:{s:5:"en_US";s:6:"Google";}) | Hazimeh, Hussein (Google) | Kurakin, Alex | Xu, Zheng | Denison, Carson | McMahan, H. Brendan | Vassilvitskii, Sergei | Chien, Steve | Thakurta, Abhradeep Guha

Journal of Artificial Intelligence ResearchJul-23-2023

Machine Learning (ML) models are ubiquitous in real-world applications and are a constant focus of research. Modern ML models have become more complex, deeper, and harder to reason about. At the same time, the community has started to realize the importance of protecting the privacy of the training data that goes into these models. Differential Privacy (DP) has become a gold standard for making formal statements about data anonymization. However, while some adoption of DP has happened in industry, attempts to apply DP to real world complex ML models are still few and far between. The adoption of DP is hindered by limited practical guidance of what DP protection entails, what privacy guarantees to aim for, and the difficulty of achieving good privacy-utility-computation trade-offs for ML models. Tricks for tuning and maximizing performance are scattered among papers or stored in the heads of practitioners, particularly with respect to the challenging task of hyperparameter tuning. Furthermore, the literature seems to present conflicting evidence on how and whether to apply architectural adjustments and which components are “safe” to use with DP. In this survey paper, we attempt to create a self-contained guide that gives an in-depth overview of the field of DP ML. We aim to assemble information about achieving the best possible DP ML model with rigorous privacy guarantees. Our target audience is both researchers and practitioners. Researchers interested in DP for ML will benefit from a clear overview of current advances and areas for improvement. We also include theory-focused sections that highlight important topics such as privacy accounting and convergence. For a practitioner, this survey provides a background in DP theory and a clear step-by-step guide for choosing an appropriate privacy definition and approach, implementing DP training, potentially updating the model architecture, and tuning hyperparameters. For both researchers and practitioners, consistently and fully reporting privacy guarantees is critical, so we propose a set of specific best practices for stating guarantees. With sufficient computation and a sufficiently large training set or supplemental nonprivate data, both good accuracy (that is, almost as good as a non-private model) and good privacy can often be achievable. And even when computation and dataset size are limited, there are advantages to training with even a weak (but still finite) formal DP guarantee. Hence, we hope this work will facilitate more widespread deployments of DP ML models.

machine learning, mechanism, natural language, (25 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.1.14649

AI Access Foundation

14649

Journal of Artificial Intelligence Research

Country: North America > United States (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.92)
(4 more...)

Add feedback

Learning-Augmented Private Algorithms for Multiple Quantile Release

Khodak, Mikhail, Amin, Kareem, Dick, Travis, Vassilvitskii, Sergei

arXiv.org Artificial IntelligenceMay-8-2023

When applying differential privacy to sensitive data, we can often improve performance using external information such as other sensitive data, public data, or human priors. We propose to use the learning-augmented algorithms (or algorithms with predictions) framework -- previously applied largely to improve time complexity or competitive ratios -- as a powerful way of designing and analyzing privacy-preserving methods that can take advantage of such external information to improve utility. This idea is instantiated on the important task of multiple quantile release, for which we derive error guarantees that scale with a natural measure of prediction quality while (almost) recovering state-of-the-art prediction-independent guarantees. Our analysis enjoys several advantages, including minimal assumptions about the data, a natural way of adding robustness, and the provision of useful surrogate losses for two novel ``meta" algorithms that learn predictions from other (potentially sensitive) data. We conclude with experiments on challenging tasks demonstrating that learning predictions across one or more instances can lead to large error reductions while preserving privacy.

artificial intelligence, data mining, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2210.11222

Country: North America > United States (0.92)

Genre: Research Report > New Finding (0.67)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
Information Technology > Data Science > Data Mining (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback