Fallah, Alireza
Enhancing Feature-Specific Data Protection via Bayesian Coordinate Differential Privacy
Aliakbarpour, Maryam, Chaudhuri, Syomantak, Courtade, Thomas A., Fallah, Alireza, Jordan, Michael I.
Local Differential Privacy (LDP) offers strong privacy guarantees without requiring users to trust external parties. However, LDP applies uniform protection to all data features, including less sensitive ones, which degrades the performance of downstream tasks. To overcome this limitation, we propose a Bayesian framework, Bayesian Coordinate Differential Privacy (BCDP), that enables feature-specific privacy quantification. This more nuanced approach complements LDP by adjusting privacy protection according to the sensitivity of each feature, improving the performance of downstream tasks without compromising privacy. We characterize the properties of BCDP and articulate its connections with standard non-Bayesian privacy frameworks. We further apply our BCDP framework to the problems of private mean estimation and ordinary least-squares regression. The BCDP-based approach obtains improved accuracy compared to a purely LDP-based approach, without compromising on privacy.
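To make the idea concrete, here is a minimal sketch (mine, not the paper's BCDP mechanism) of feature-specific local privacy for mean estimation: each coordinate j gets its own budget eps[j], so more sensitive features receive more Laplace noise. It assumes data lies in [0, 1]^d, so each entry has sensitivity 1.

    import numpy as np

    rng = np.random.default_rng(0)

    def coordinatewise_private_mean(X, eps):
        # X: (n, d) array with entries in [0, 1]; eps: length-d per-coordinate budgets.
        scale = 1.0 / np.asarray(eps)            # Laplace scale for eps_j-DP per coordinate
        noisy = X + rng.laplace(scale=scale, size=X.shape)
        return noisy.mean(axis=0)                # unbiased estimate of the mean

    X = rng.random((10_000, 3))
    eps = np.array([0.5, 2.0, 8.0])              # coordinate 0 is the most sensitive
    print(coordinatewise_private_mean(X, eps))   # estimate is noisiest in coordinate 0

The per-coordinate error scales as 1/(eps[j] * sqrt(n)), so relaxing the budget on less sensitive features directly buys downstream accuracy.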
Fair Allocation in Dynamic Mechanism Design
Fallah, Alireza, Jordan, Michael I., Ulichney, Annie
We consider a dynamic mechanism design problem where an auctioneer sells an indivisible good to two groups of buyers in every round, for a total of $T$ rounds. The auctioneer aims to maximize their discounted overall revenue while adhering to a fairness constraint that guarantees a minimum average allocation for each group. We begin by studying the static case ($T=1$) and establish that the optimal mechanism involves two types of subsidization: one that increases the overall probability of allocation to all buyers, and another that favors the group that otherwise has a lower probability of winning the item. We then extend our results to the dynamic case by characterizing a set of recursive functions that determine the optimal allocation and payments in each round. Notably, our results establish that in the dynamic case the seller, on the one hand, commits to a participation reward to incentivize truth-telling and, on the other hand, charges an entry fee for every round. Moreover, the optimal allocation once more involves subsidization in favor of one group, where the extent of subsidization depends on the difference in future utilities for both the seller and the buyers when the item is allocated to one group versus the other. Finally, we present an approximation scheme that solves the recursive equations and determines an approximately optimal and fair allocation efficiently.
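As a toy illustration of the static ($T=1$) result (my simplification, not the paper's construction): with one buyer per group, expected revenue of an incentive-compatible mechanism equals expected virtual surplus (Myerson's identity), so one can grid-search a reserve shift s (subsidizing all buyers) and a head start delta for the weaker group, subject to a minimum win rate for each group.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200_000
    v1 = rng.uniform(0.0, 1.0, n)                # group 1 values, virtual value 2v - 1
    v2 = rng.uniform(0.0, 0.5, n)                # group 2 values, virtual value 2v - 0.5
    phi1, phi2 = 2 * v1 - 1.0, 2 * v2 - 0.5

    best = None
    for s in np.linspace(-0.3, 0.0, 16):         # s < 0: subsidize the overall allocation
        for delta in np.linspace(0.0, 0.5, 26):  # delta > 0: head start for group 2
            w2 = phi2 + delta > np.maximum(phi1, s)
            w1 = ~w2 & (phi1 > s)
            if w1.mean() < 0.3 or w2.mean() < 0.3:
                continue                         # fairness: each group wins >= 30%
            rev = (phi1 * w1 + phi2 * w2).mean() # expected virtual surplus = revenue
            if best is None or rev > best[0]:
                best = (rev, s, delta, w1.mean(), w2.mean())
    print(best)                                  # revenue, reserve shift, head start, win rates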
Optimal and Differentially Private Data Acquisition: Central and Local Mechanisms
Fallah, Alireza, Makhdoumi, Ali, Malekian, Azarakhsh, Ozdaglar, Asuman
We consider a platform's problem of collecting data from privacy-sensitive users to estimate an underlying parameter of interest. We formulate this question as a Bayesian-optimal mechanism design problem, in which an individual can share her (verifiable) data in exchange for a monetary reward or services, but at the same time has a (private) heterogeneous privacy cost which we quantify using differential privacy. We consider two popular differential privacy settings for providing privacy guarantees to the users: central and local. In both settings, we establish minimax lower bounds for the estimation error and derive (near) optimal estimators for given heterogeneous privacy loss levels for users. Building on this characterization, we pose the mechanism design problem as the optimal selection of an estimator and payments that will elicit truthful reporting of users' privacy sensitivities. Under a regularity condition on the distribution of privacy sensitivities, we develop efficient algorithmic mechanisms to solve this problem in both privacy settings. Our mechanism in the central setting can be implemented in time $\mathcal{O}(n \log n)$, where $n$ is the number of users, and our mechanism in the local setting admits a Polynomial Time Approximation Scheme (PTAS).
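A hedged sketch of the local-setting estimation step (not the paper's optimal estimator): each user perturbs her data with Laplace noise calibrated to her own privacy level eps_i, and the platform combines reports with inverse-variance weights, so noisier (more private) reports count for less. The data variance is assumed known here for simplicity.

    import numpy as np

    rng = np.random.default_rng(2)
    n, theta = 5_000, 0.7
    x = np.clip(rng.normal(theta, 0.1, n), 0.0, 1.0)   # user data in [0, 1]
    eps = rng.choice([0.2, 1.0, 5.0], size=n)          # heterogeneous privacy levels

    reports = x + rng.laplace(scale=1.0 / eps)         # local DP via the Laplace mechanism
    w = 1.0 / (0.1**2 + 2.0 / eps**2)                  # inverse of (data + noise) variance
    estimate = np.sum(w * reports) / np.sum(w)
    print(estimate, abs(estimate - theta))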
Private Adaptive Gradient Methods for Convex Optimization
Asi, Hilal, Duchi, John, Fallah, Alireza, Javidbakht, Omid, Talwar, Kunal
While the success of stochastic gradient methods for solving empirical risk minimization has motivated their adoption across much of machine learning, increasing privacy risks in data-intensive tasks have made applying them more challenging [DMNS06]: gradients can leak users' data, intermediate models can compromise individuals, and even final trained models may be non-private without substantial care. This motivates a growing line of work developing private variants of stochastic gradient descent (SGD), where algorithms guarantee differential privacy by perturbing individual gradients with random noise [DJW13; ST13b; ACGMMTZ16; DJW18; BFTT19; FKT20]. Yet these noise-addition procedures typically fail to reflect the geometry underlying the optimization problem, which in non-private settings is essential: for high-dimensional problems with sparse parameters, mirror descent and its variants [BT03; NJLS09] are the methods of choice, while in the large-scale stochastic settings prevalent in deep learning, AdaGrad and other adaptive variants [DHS11] provide stronger theoretical and practical performance. Moreover, methods that do not adapt (or do not leverage geometry) can be provably sub-optimal: there exist problems on which their convergence is much slower than that of adaptive variants reflecting the appropriate geometry [LD19]. To address these challenges, we introduce Pagan (Private AdaGrad with Adaptive Noise), a new differentially private variant of stochastic gradient descent and AdaGrad.
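A minimal sketch in the spirit of the method described (not Pagan itself): per-example gradient clipping, Gaussian noise for differential privacy, and an AdaGrad-style diagonal preconditioner built from the noisy gradients, on a toy logistic regression problem.

    import numpy as np

    rng = np.random.default_rng(3)
    n, d = 2_000, 20
    X = rng.normal(size=(n, d))
    y = (X @ rng.normal(size=d) + rng.normal(size=n) > 0).astype(float)

    def clipped_noisy_grad(w, batch, clip=1.0, sigma=1.0):
        Xb, yb = X[batch], y[batch]
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))             # logistic predictions
        g = (p - yb)[:, None] * Xb                      # per-example gradients
        g /= np.maximum(np.linalg.norm(g, axis=1, keepdims=True) / clip, 1.0)
        noise = rng.normal(scale=sigma * clip, size=d)  # Gaussian mechanism
        return (g.sum(axis=0) + noise) / len(batch)

    w, acc, lr = np.zeros(d), np.full(d, 1e-8), 0.5
    for t in range(300):
        g = clipped_noisy_grad(w, rng.choice(n, size=100, replace=False))
        acc += g**2                                     # AdaGrad accumulator
        w -= lr * g / np.sqrt(acc)                      # coordinate-wise adaptive step
    print("train accuracy:", np.mean((X @ w > 0) == y))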
A Wasserstein Minimax Framework for Mixed Linear Regression
Diamandis, Theo, Eldar, Yonina C., Fallah, Alireza, Farnia, Farzan, Ozdaglar, Asuman
Multi-modal distributions are commonly used to model clustered data in statistical learning tasks. In this paper, we consider the Mixed Linear Regression (MLR) problem. We propose an optimal transport-based framework for MLR problems, Wasserstein Mixed Linear Regression (WMLR), which minimizes the Wasserstein distance between the learned and target mixture regression models. Through a model-based duality analysis, WMLR reduces the underlying MLR task to a nonconvex-concave minimax optimization problem, which can be provably solved to find a minimax stationary point by the Gradient Descent Ascent (GDA) algorithm. In the special case of mixtures of two linear regression models, we show that WMLR enjoys global convergence and generalization guarantees. We prove that WMLR's sample complexity grows linearly with the dimension of data. Finally, we discuss the application of WMLR to the federated learning task where the training samples are collected by multiple agents in a network. Unlike the Expectation Maximization algorithm, WMLR directly extends to the distributed, federated learning setting. We support our theoretical results through several numerical experiments, which highlight our framework's ability to handle the federated learning setting with mixture models.
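Since the reduction hands the MLR task to Gradient Descent Ascent, here is a generic two-timescale GDA sketch on a toy nonconvex-concave objective of my own choosing (not the WMLR dual): min over w, max over y of sum(sin(w)) + y^T(Aw - b) - 0.5||y||^2.

    import numpy as np

    rng = np.random.default_rng(4)
    d = 5
    A, b = rng.normal(size=(d, d)), rng.normal(size=d)
    w, y = rng.normal(size=d), np.zeros(d)
    eta_w, eta_y = 1e-3, 1e-1                  # slow descent, fast ascent

    for t in range(20_000):
        grad_w = np.cos(w) + A.T @ y           # gradient of f in w
        grad_y = A @ w - b - y                 # gradient of f in y (strongly concave)
        w -= eta_w * grad_w                    # descent step for the min player
        y += eta_y * grad_y                    # ascent step for the max player

    # stationarity of the primal function max_y f(w, y), whose maximizer is y* = Aw - b
    print(np.linalg.norm(np.cos(w) + A.T @ (A @ w - b)))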
Generalization of Model-Agnostic Meta-Learning Algorithms: Recurring and Unseen Tasks
Fallah, Alireza, Mokhtari, Aryan, Ozdaglar, Asuman
In this paper, we study the generalization properties of Model-Agnostic Meta-Learning (MAML) algorithms for supervised learning problems. We focus on the setting in which we train the MAML model over $m$ tasks, each with $n$ data points, and characterize its generalization error from two points of view: first, we assume the new task at test time is one of the training tasks, and we show that, for strongly convex objective functions, the expected excess population loss is bounded by $\mathcal{O}(1/mn)$. Second, we consider the MAML algorithm's generalization to an unseen task and show that the resulting generalization error depends on the total variation distance between the underlying distributions of the new task and the tasks observed during the training process. Our proof techniques rely on the connections between algorithmic stability and generalization bounds of algorithms. In particular, we propose a new definition of stability for meta-learning algorithms, which allows us to capture the role of both the number of tasks $m$ and the number of samples per task $n$ on the generalization error of MAML.
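For reference, a sketch of the training object the bounds apply to: MAML over $m$ tasks with $n$ samples each, here on strongly convex quadratic losses of my own construction (a toy instance, not the paper's experiments). The meta-parameter w is trained so that one inner gradient step on a task's own data performs well.

    import numpy as np

    rng = np.random.default_rng(5)
    m, n, d, alpha, beta = 30, 50, 5, 0.1, 0.05
    centers = rng.normal(size=(m, d))                   # per-task optima
    data = centers[:, None, :] + rng.normal(scale=0.3, size=(m, n, d))

    def grad(w, samples):                               # loss = 0.5 * mean ||w - z||^2
        return w - samples.mean(axis=0)

    w = np.zeros(d)
    for step in range(500):
        i = rng.integers(m)                             # sample a training task
        s1, s2 = data[i, :n // 2], data[i, n // 2:]     # inner/outer data split
        w_adapted = w - alpha * grad(w, s1)             # one inner gradient step
        # outer gradient through the inner step; the Hessian is I here, so
        # (I - alpha * Hessian) reduces to the scalar (1 - alpha)
        w -= beta * (1 - alpha) * grad(w_adapted, s2)
    print(np.linalg.norm(w - centers.mean(axis=0)))     # near the task-average optimum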
Provably Convergent Policy Gradient Methods for Model-Agnostic Meta-Reinforcement Learning
Fallah, Alireza, Mokhtari, Aryan, Ozdaglar, Asuman
We consider Model-Agnostic Meta-Learning (MAML) methods for Reinforcement Learning (RL) problems, where the goal is to find a policy (using data from several tasks represented by Markov Decision Processes (MDPs)) that can be updated by one step of stochastic policy gradient for the realized MDP. In particular, using stochastic gradients in the MAML update step is crucial for RL problems, since computing exact gradients requires access to a large number of possible trajectories. For this formulation, we propose a variant of the MAML method, named Stochastic Gradient Meta-Reinforcement Learning (SG-MRL), and study its convergence properties. We derive the iteration and sample complexity of SG-MRL for finding an $\epsilon$-first-order stationary point, which, to the best of our knowledge, provides the first convergence guarantee for model-agnostic meta-reinforcement learning algorithms. We further show how our results extend to the case where more than one step of stochastic policy gradient is used in the update at test time.
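A hedged sketch of the inner adaptation SG-MRL analyzes, on a toy one-step MDP of my own (a Gaussian policy whose reward peak depends on the task); the meta-update below uses a first-order simplification that drops the adaptation Jacobian, unlike the full SG-MRL update.

    import numpy as np

    rng = np.random.default_rng(6)

    def policy_grad(theta, goal, n_traj=64):
        # REINFORCE estimate for policy a ~ N(theta, 1), reward -(a - goal)^2
        a = rng.normal(theta, 1.0, n_traj)             # sampled actions ("trajectories")
        r = -(a - goal) ** 2                           # returns
        score = a - theta                              # d log pi(a) / d theta
        return np.mean(r * score)

    theta, alpha, beta = 0.0, 0.05, 0.02               # meta-parameter and stepsizes
    goals = [-1.0, 0.5, 2.0]                           # tasks (one-step MDPs)
    for it in range(2_000):
        goal = goals[rng.integers(len(goals))]
        adapted = theta + alpha * policy_grad(theta, goal)   # one stochastic PG step
        theta += beta * policy_grad(adapted, goal)           # simplified meta-update
    print(theta)   # settles near a point from which one PG step adapts well to any task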
On the Convergence Theory of Gradient-Based Model-Agnostic Meta-Learning Algorithms
Fallah, Alireza, Mokhtari, Aryan, Ozdaglar, Asuman
In this paper, we study the convergence of a class of gradient-based Model-Agnostic Meta-Learning (MAML) methods and characterize their overall computational complexity, as well as their best achievable level of accuracy in terms of gradient norm, for nonconvex loss functions. In particular, we start with the MAML algorithm and its first-order approximation (FO-MAML) and highlight the challenges that emerge in their analysis. By overcoming these challenges, we not only provide the first theoretical guarantees for MAML and FO-MAML in nonconvex settings but also answer some open questions about the implementation of these algorithms, including how to choose their learning rate (stepsize) and the batch sizes for both tasks and the datasets corresponding to each task. In particular, we show that MAML can find an $\epsilon$-first-order stationary point for any positive $\epsilon$ after at most $\mathcal{O}(1/\epsilon^2)$ iterations, at the expense of requiring second-order information. We also show that the FO-MAML method, which ignores the second-order information required in the MAML update, cannot achieve any arbitrarily small desired level of accuracy, i.e., FO-MAML cannot find an $\epsilon$-first-order stationary point for every positive $\epsilon$. We further propose a new variant of the MAML algorithm, called Hessian-free MAML (HF-MAML), which preserves all theoretical guarantees of MAML without requiring access to second-order information about the loss functions.
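The Hessian-free idea can be sketched directly: the MAML meta-gradient has the form (I - alpha H(w)) g, and the Hessian-vector product H(w) g can be approximated by a finite difference of gradients, so no second-order information is ever formed. The toy loss below is mine; the gradient-difference approximation is the standard Hessian-free trick the abstract alludes to.

    import numpy as np

    def grad(w):                                       # gradient of a toy nonconvex loss
        return np.cos(w) + 0.1 * w

    def hessian_vec(w, v, delta=1e-4):
        # central difference: (grad(w + delta v) - grad(w - delta v)) / (2 delta) ~ H(w) v
        return (grad(w + delta * v) - grad(w - delta * v)) / (2 * delta)

    w, alpha = np.array([0.3, -1.2]), 0.1
    g_adapted = grad(w - alpha * grad(w))              # gradient at the adapted point
    meta_grad = g_adapted - alpha * hessian_vec(w, g_adapted)
    print(meta_grad)                                   # MAML direction without forming a Hessian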
Robust Accelerated Gradient Methods for Smooth Strongly Convex Functions
Aybat, Necdet Serhat, Fallah, Alireza, Gurbuzbalaban, Mert, Ozdaglar, Asuman
We study the trade-offs between convergence rate and robustness to gradient errors in designing a first-order algorithm. We focus on gradient descent (GD) and accelerated gradient (AG) methods for minimizing strongly convex functions when the gradient has random errors in the form of additive white noise. With gradient errors, the function values of the iterates need not converge to the optimal value; hence, we define the robustness of an algorithm to noise as the ratio of the asymptotic expected suboptimality of its iterate sequence to the input noise power. For this robustness measure, we provide exact expressions in the quadratic case using tools from robust control theory, and tight upper bounds in the smooth strongly convex case using Lyapunov functions certified through matrix inequalities. We use these characterizations within an optimization problem that selects the parameters of each algorithm to achieve a particular trade-off between rate and robustness. Our results show that AG can achieve acceleration while being more robust to random gradient errors. This behavior is quite different from what has previously been reported in the deterministic gradient-noise setting. We also establish some connections between the robustness of an algorithm and how quickly it can converge back to the optimal solution if it is perturbed from the optimal point with deterministic noise. Our framework also leads to practical algorithms that can perform better than other state-of-the-art methods in the presence of random gradient noise.
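A quick simulation of the trade-off being quantified (with textbook parameter choices, not the paper's optimized ones): GD versus Nesterov's AG on a strongly convex quadratic with additive white gradient noise, comparing the asymptotic expected suboptimality each method settles at.

    import numpy as np

    rng = np.random.default_rng(7)
    d, kappa = 20, 100.0
    eig = np.linspace(1.0, kappa, d)                   # Hessian spectrum: mu = 1, L = kappa
    f = lambda x: 0.5 * np.sum(eig * x**2)

    def run(momentum, iters=5_000, sigma=0.1):
        x = np.ones(d); x_prev = x.copy(); tail = []
        for k in range(iters):
            y = x + momentum * (x - x_prev)            # AG extrapolation (0 for GD)
            g = eig * y + rng.normal(scale=sigma, size=d)   # noisy gradient oracle
            x_prev, x = x, y - (1.0 / kappa) * g       # stepsize 1/L
            if k >= iters - 1_000:
                tail.append(f(x))
        return np.mean(tail)                           # asymptotic expected suboptimality

    print("GD:", run(0.0))
    print("AG:", run((np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)))

With these standard parameters AG converges faster but sits at a higher noise floor; the paper's point is that tuning the parameters traces out this rate-robustness frontier.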
A Universally Optimal Multistage Accelerated Stochastic Gradient Method
Aybat, Necdet Serhat, Fallah, Alireza, Gurbuzbalaban, Mert, Ozdaglar, Asuman
We study the problem of minimizing a strongly convex and smooth function when we have noisy estimates of its gradient. We propose a novel multistage accelerated algorithm that is universally optimal in the sense that it achieves the optimal rate in both the deterministic and the stochastic case, and operates without knowledge of the noise characteristics. The algorithm consists of stages that use a stochastic version of Nesterov's accelerated algorithm with a specific restart, and with parameters selected to achieve the fastest reduction in the bias-variance terms in the convergence rate bounds.
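A hedged sketch of the multistage pattern (schedule constants are illustrative, not the paper's): run stochastic Nesterov in stages, restarting the momentum and shrinking the stepsize while lengthening each stage, so the bias decays at the accelerated rate early and the noise is averaged down later.

    import numpy as np

    rng = np.random.default_rng(8)
    d, mu, L, sigma = 10, 1.0, 50.0, 0.5
    eig = np.linspace(mu, L, d)                       # quadratic test function spectrum
    beta = (np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))
    x = np.ones(d)

    for stage in range(6):
        step = (1.0 / L) / 2**stage                   # geometrically shrinking stepsize
        y = x.copy()                                  # momentum restart
        for _ in range(100 * 2**stage):               # geometrically longer stages
            g = eig * y + rng.normal(scale=sigma, size=d)   # noisy gradient
            x_new = y - step * g
            y = x_new + beta * (x_new - x)            # Nesterov extrapolation
            x = x_new
        print(f"stage {stage}: f = {0.5 * np.sum(eig * x**2):.3e}")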