Collaborating Authors

 Lécuyer, Mathias


Training and Evaluating Causal Forecasting Models for Time-Series

arXiv.org Artificial Intelligence

Deep learning time-series models are often used to make forecasts that inform downstream decisions. Since these decisions can differ from those in the training set, there is an implicit requirement that time-series models will generalize outside of their training distribution. Despite this core requirement, time-series models are typically trained and evaluated on in-distribution predictive tasks. We extend the orthogonal statistical learning framework to train causal time-series models that generalize better when forecasting the effect of actions outside of their training distribution. To evaluate these models, we leverage Regression Discontinuity Designs popular in economics to construct a test set of causal treatment effects.
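The abstract does not spell out how the Regression Discontinuity test set is built, so the following is only a minimal sketch of a generic sharp regression discontinuity estimate: fit a line on each side of a cutoff and take the gap between the two fits at the cutoff as the treatment effect. The function names, the bandwidth parameter, and the local-linear choice are illustrative assumptions, not the authors' method.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rdd_effect(running, outcome, cutoff=0.0, bandwidth=1.0):
    """Sharp RDD sketch: units with running >= cutoff are treated.

    Fits one line per side within `bandwidth` of the cutoff (running
    variable centred at the cutoff) and returns the jump in intercepts.
    """
    left = [(r - cutoff, y) for r, y in zip(running, outcome)
            if cutoff - bandwidth <= r < cutoff]
    right = [(r - cutoff, y) for r, y in zip(running, outcome)
             if cutoff <= r <= cutoff + bandwidth]
    left_intercept, _ = fit_line(*zip(*left))
    right_intercept, _ = fit_line(*zip(*right))
    return right_intercept - left_intercept


# Synthetic, noise-free data: a linear trend plus a jump of 3 at the cutoff.
running = [i / 10 for i in range(-10, 11)]
outcome = [2 * r + (3 if r >= 0 else 0) for r in running]
effect = rdd_effect(running, outcome)
```

On real data the estimate depends heavily on the bandwidth and on whether units can manipulate the running variable, which this sketch ignores.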


Efficient and Adaptive Posterior Sampling Algorithms for Bandits

arXiv.org Machine Learning

We study Thompson Sampling-based algorithms for stochastic bandits with bounded rewards. As the existing problem-dependent regret bound for Thompson Sampling with Gaussian priors [Agrawal and Goyal, 2017] is vacuous when $T \le 288 e^{64}$, we derive a more practical bound that tightens the coefficient of the leading term from $288 e^{64}$ to $1270$. Additionally, motivated by large-scale real-world applications that require scalability, adaptive computational resource allocation, and a balance in utility and computation, we propose two parameterized Thompson Sampling-based algorithms: Thompson Sampling with Model Aggregation (TS-MA-$\alpha$) and Thompson Sampling with Timestamp Duelling (TS-TD-$\alpha$), where $\alpha \in [0,1]$ controls the trade-off between utility and computation. Both algorithms achieve an $O \left(K\ln^{\alpha+1}(T)/\Delta \right)$ regret bound, where $K$ is the number of arms, $T$ is the finite learning horizon, and $\Delta$ denotes the single round performance loss when pulling a sub-optimal arm.
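For orientation, here is a minimal sketch of the baseline the abstract refers to, Thompson Sampling with Gaussian priors, and not of TS-MA-$\alpha$ or TS-TD-$\alpha$, whose details the abstract does not give. It assumes each arm's posterior sample is drawn from a Gaussian with mean (sum of rewards)/(pulls + 1) and variance 1/(pulls + 1), and it uses Bernoulli rewards purely for illustration; the function name and simulation setup are hypothetical.

```python
import math
import random


def gaussian_thompson_sampling(arm_means, horizon, seed=0):
    """Simulate Gaussian-prior Thompson Sampling on Bernoulli arms.

    Returns the pull count per arm and the cumulative pseudo-regret.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    pulls = [0] * n_arms
    reward_sums = [0.0] * n_arms
    best_mean = max(arm_means)
    regret = 0.0
    for _ in range(horizon):
        # One posterior sample per arm: N(sum / (k + 1), 1 / (k + 1)).
        samples = [
            rng.gauss(reward_sums[i] / (pulls[i] + 1),
                      math.sqrt(1.0 / (pulls[i] + 1)))
            for i in range(n_arms)
        ]
        arm = max(range(n_arms), key=samples.__getitem__)
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        pulls[arm] += 1
        reward_sums[arm] += reward
        regret += best_mean - arm_means[arm]
    return pulls, regret


pulls, regret = gaussian_thompson_sampling([0.1, 0.9], horizon=2000)
```

Every round draws one sample per arm, which is the per-round computation that the parameter $\alpha$ in the proposed algorithms is said to trade against utility.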


PANORAMIA: Privacy Auditing of Machine Learning Models without Retraining

arXiv.org Artificial Intelligence

We introduce a privacy auditing scheme for ML models that relies on membership inference attacks using generated data as "non-members". This scheme, which we call PANORAMIA, quantifies the privacy leakage for large-scale ML models without control of the training process or model re-training and only requires access to a subset of the training data. To demonstrate its applicability, we evaluate our auditing scheme across multiple ML domains, ranging from image and tabular data classification to large-scale language models.


DP-AdamBC: Your DP-Adam Is Actually DP-SGD (Unless You Apply Bias Correction)

arXiv.org Artificial Intelligence

The Adam optimizer is a popular choice in contemporary deep learning due to its strong empirical performance. However, we observe that in privacy-sensitive scenarios, the traditional use of Differential Privacy (DP) with the Adam optimizer leads to sub-optimal performance on several tasks. We find that this performance degradation is due to a DP bias in Adam's second moment estimator, introduced by the addition of independent noise in the gradient computation to enforce DP guarantees. This DP bias leads to a different scaling for low-variance parameter updates that is inconsistent with the behavior of non-private Adam. We propose DP-AdamBC, an optimization algorithm which removes the bias in the second moment estimation and retrieves the expected behaviour of Adam. Empirically, DP-AdamBC significantly improves the optimization performance of DP-Adam by up to 3.5% in final accuracy in image, text, and graph node classification tasks.
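To make the bias concrete: if independent zero-mean noise with variance $s^2$ is added to a gradient $g$, then $E[(g + z)^2] = g^2 + s^2$, so the second moment estimate is inflated by $s^2$. The sketch below assumes the correction simply subtracts the known noise variance from the bias-corrected second moment and clamps the result at a small floor. The function names, the `floor` parameter, and the scalar setting are illustrative assumptions rather than the paper's exact algorithm.

```python
import math


def new_state():
    """Fresh optimizer state for a single scalar parameter."""
    return {"t": 0, "m": 0.0, "v": 0.0}


def dp_adam_bc_step(param, noisy_grad, state, lr=1e-3, beta1=0.9,
                    beta2=0.999, noise_std=0.0, floor=1e-8):
    """One Adam step with the DP noise variance removed from v.

    `noise_std` is the standard deviation of the noise already present
    in `noisy_grad` (assumed known from the DP mechanism's parameters).
    """
    state["t"] += 1
    t = state["t"]
    state["m"] = beta1 * state["m"] + (1 - beta1) * noisy_grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * noisy_grad ** 2
    m_hat = state["m"] / (1 - beta1 ** t)
    v_hat = state["v"] / (1 - beta2 ** t)
    # Subtract the expected contribution of the DP noise, then clamp.
    corrected_v = max(v_hat - noise_std ** 2, floor)
    return param - lr * m_hat / math.sqrt(corrected_v)


plain = dp_adam_bc_step(1.0, 0.5, new_state(), noise_std=0.0)
corrected = dp_adam_bc_step(1.0, 0.5, new_state(), noise_std=0.3)
```

With `noise_std=0.0` the step reduces to standard Adam without the usual epsilon in the denominator; the floor plays a similar stabilising role when the corrected estimate would otherwise be near zero or negative.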


DP-Adam: Correcting DP Bias in Adam's Second Moment Estimation

arXiv.org Artificial Intelligence

We observe that the traditional use of DP with the Adam optimizer introduces a bias in the second moment estimation, due to the addition of independent noise in the gradient computation. This bias leads to a different scaling for low-variance parameter updates that is inconsistent with both the behavior of non-private Adam and Adam's sign descent interpretation. Empirically, correcting the bias introduced by DP noise significantly improves the optimization performance of DP-Adam.


Packing Privacy Budget Efficiently

arXiv.org Artificial Intelligence

Machine learning (ML) models can leak information about users, and differential privacy (DP) provides a rigorous way to bound that leakage under a given budget. This DP budget can be regarded as a new type of compute resource in workloads of multiple ML models training on user data. Once it is used, the DP budget is forever consumed. Therefore, it is crucial to allocate it most efficiently to train as many models as possible. This paper presents the scheduler for privacy that optimizes for efficiency. We formulate privacy scheduling as a new type of multidimensional knapsack problem, called privacy knapsack, which maximizes DP budget efficiency. We show that privacy knapsack is NP-hard, hence practical algorithms are necessarily approximate. We develop an approximation algorithm for privacy knapsack, DPK, and evaluate it on microbenchmarks and on a new, synthetic private-ML workload we developed from the Alibaba ML cluster trace. We show that DPK: (1) often approaches the efficiency-optimal schedule, (2) consistently schedules more tasks compared to a state-of-the-art privacy scheduling algorithm that focused on fairness (1.3-1.7x in Alibaba, 1.0-2.6x in microbenchmarks), but (3) sacrifices some level of fairness for efficiency. Therefore, using DPK, DP ML operators should be able to train more models on the same amount of user data while offering the same privacy guarantee to their users.
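As a rough illustration of the scheduling problem, and explicitly not of DPK itself, the sketch below treats each data block as having a scalar epsilon capacity under basic additive composition, treats each task as demanding some epsilon from a set of blocks, and greedily admits tasks in order of their total normalized demand. The function name, the data layout, and the ordering heuristic are all assumptions made for this example.

```python
def greedy_privacy_schedule(capacity, tasks):
    """Greedy heuristic for a multidimensional privacy knapsack.

    capacity: dict mapping block name -> available epsilon.
    tasks: dict mapping task name -> dict of block name -> epsilon demand.
    Returns the list of admitted task names and the remaining capacity.
    """
    remaining = dict(capacity)

    def normalized_demand(name):
        # Cheaper tasks (relative to block capacity) are tried first.
        return sum(eps / capacity[block]
                   for block, eps in tasks[name].items())

    scheduled = []
    for name in sorted(tasks, key=normalized_demand):
        demand = tasks[name]
        if all(remaining[block] >= eps for block, eps in demand.items()):
            for block, eps in demand.items():
                remaining[block] -= eps  # consumed budget never returns
            scheduled.append(name)
    return scheduled, remaining


capacity = {"b1": 1.0, "b2": 1.0}
tasks = {
    "big": {"b1": 0.75, "b2": 0.75},
    "small_a": {"b1": 0.5},
    "small_b": {"b2": 0.5},
    "small_c": {"b1": 0.5, "b2": 0.25},
}
scheduled, remaining = greedy_privacy_schedule(capacity, tasks)
# scheduled == ["small_a", "small_b", "small_c"]; "big" no longer fits
```

Here the greedy order admits three small tasks instead of the single large one, which mirrors the efficiency-over-fairness trade-off the abstract describes.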


GlueFL: Reconciling Client Sampling and Model Masking for Bandwidth Efficient Federated Learning

arXiv.org Artificial Intelligence

Federated learning (FL) is an effective technique to directly involve edge devices in machine learning training while preserving client privacy. However, the substantial communication overhead of FL makes training challenging when edge devices have limited network bandwidth. Existing work to optimize FL bandwidth overlooks downstream transmission and does not account for FL client sampling. In this paper we propose GlueFL, a framework that incorporates new client sampling and model compression algorithms to mitigate low download bandwidths of FL clients. GlueFL prioritizes recently used clients and bounds the number of changed positions in compression masks in each round. Across three popular FL datasets and three state-of-the-art strategies, GlueFL reduces downstream client bandwidth by 27% on average and reduces training time by 29% on average. One important strategy is client sampling, which limits the number of clients that perform training in each round (McMahan et al., 2017; Luo et al., 2022). Client sampling reduces both upstream and downstream bandwidth.


Sayer: Using Implicit Feedback to Optimize System Policies

arXiv.org Machine Learning

We observe that many system policies that make threshold decisions involving a resource (e.g., time, memory, cores) naturally reveal additional, or implicit, feedback. For example, if a system waits X min for an event to occur, then it automatically learns what would have happened if it waited


Practical Privacy Filters and Odometers with Rényi Differential Privacy and Applications to Differentially Private Deep Learning

arXiv.org Machine Learning

Differential Privacy (DP) is the leading approach to privacy preserving deep learning. As such, there are multiple efforts to provide drop-in integration of DP into popular frameworks. These efforts, which add noise to each gradient computation to make it DP, rely on composition theorems to bound the total privacy loss incurred over this sequence of DP computations. However, existing composition theorems present a tension between efficiency and flexibility. Most theorems require all computations in the sequence to have a predefined DP parameter, called the privacy budget. This prevents the design of training algorithms that adapt the privacy budget on the fly, or that terminate early to reduce the total privacy loss. Alternatively, the few existing composition results for adaptive privacy budgets provide complex bounds on the privacy loss, with constants too large to be practical. In this paper, we study DP composition under adaptive privacy budgets through the lens of Rényi Differential Privacy, proving a simpler composition theorem with smaller constants, making it practical enough to use in algorithm design. We demonstrate two applications of this theorem for DP deep learning: adapting the noise or batch size online to improve a model's accuracy within a fixed total privacy loss, and stopping early when fine-tuning a model to reduce total privacy loss.
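The idea of a privacy filter can be sketched as follows, under simplifying assumptions that are not taken from the paper: a single fixed Rényi order $\alpha$, the plain Gaussian mechanism with sensitivity 1 (whose RDP cost at order $\alpha$ is $\alpha / (2\sigma^2)$), and no subsampling amplification. The filter lets each step pick its own noise level and admits the step only while the running sum of RDP costs stays within the budget. The class and method names are hypothetical.

```python
import math


class RenyiPrivacyFilter:
    """Admit adaptively chosen Gaussian steps under a fixed RDP budget."""

    def __init__(self, alpha, rdp_budget):
        self.alpha = alpha
        self.rdp_budget = rdp_budget
        self.spent = 0.0

    def try_gaussian_step(self, sigma):
        """Return True and charge the step if it fits, else False."""
        cost = self.alpha / (2.0 * sigma ** 2)
        if self.spent + cost > self.rdp_budget:
            return False
        self.spent += cost
        return True

    def epsilon(self, delta):
        """Standard RDP to (epsilon, delta)-DP conversion of the spend."""
        return self.spent + math.log(1.0 / delta) / (self.alpha - 1.0)


privacy_filter = RenyiPrivacyFilter(alpha=8, rdp_budget=1.0)
# Noise levels chosen on the fly; sigma=2 costs 1.0 and is rejected
# once part of the budget has already been spent.
decisions = [privacy_filter.try_gaussian_step(s) for s in (4, 4, 2, 4, 4, 4)]
# decisions == [True, True, False, True, True, False]
```

Stopping early simply means making fewer calls, in which case `epsilon` reports the smaller loss actually incurred rather than the full budget.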