blanchet
Reviews: Distributionally Robust Optimization and Generalization in Kernel Methods
I raised my score from 4 to 6 after reading the author's feedback, mainly due to the novelty of the framework. However, I would expect the author can provide a thorough discussion of the limitation of the result in the camera-ready version. Weakness: Due to the intractbility of the MMD DRO problem, the submission did not find an exact reformulation as much other literature in DRO did for other probability metrics. Instead, the author provides several layers of approximation. The reason why I emphasize the importance of a tight bound, if not an exact reformulation, is that one of the major criticism about (distributionally) robust optimization is that it is sometimes too conservative, and thus a loose upper bound might not be sufficient to mitigate the over-conservativeness and demonstrate the power of distributionally robust optimization. When a new distance is introduced into the DRO framework, a natural question is why it should be used compared with other existing approaches.
Generative Learning for Simulation of Vehicle Faults
Kuiper, Patrick, Lin, Sirui, Blanchet, Jose, Tarokh, Vahid
We focus this analysis on the United States' Department of Defense (DoD), where the US Army alone is projected to spend an estimated $5 billion per year (in 2020 dollar terms through 2050), developing and acquiring ground vehicles, where ground vehicles are any vehicles other than aircraft and ships (CBO 2021). Maintaining this enormous investment is critical to ensuring combat readiness across the DoD, where the department spent $90 billion in 2022 on maintaining vehicles across domains: ground, air, and sea (GAO 2022). Predicting requirements is critical to an effective maintenance program. The application of statistics towards vehicle maintenance prediction is often referred to as predictive maintenance. Recognizing the importance of predictive maintenance, in the 2022 National Defense Authorization Act (NDAA) Congress required the DoD Inspector General Office to review predictive maintenance practices, originally established by DoD directives in 2002 and 2007 (DoDIG 2023).
A Finite Sample Complexity Bound for Distributionally Robust Q-learning
Wang, Shengbo, Si, Nian, Blanchet, Jose, Zhou, Zhengyuan
We consider a reinforcement learning setting in which the deployment environment is different from the training environment. Applying a robust Markov decision processes formulation, we extend the distributionally robust $Q$-learning framework studied in Liu et al. [2022]. Further, we improve the design and analysis of their multi-level Monte Carlo estimator. Assuming access to a simulator, we prove that the worst-case expected sample complexity of our algorithm to learn the optimal robust $Q$-function within an $\epsilon$ error in the sup norm is upper bounded by $\tilde O(|S||A|(1-\gamma)^{-5}\epsilon^{-2}p_{\wedge}^{-6}\delta^{-4})$, where $\gamma$ is the discount rate, $p_{\wedge}$ is the non-zero minimal support probability of the transition kernels and $\delta$ is the uncertainty size. This is the first sample complexity result for the model-free robust RL problem. Simulation studies further validate our theoretical results.
A Class of Geometric Structures in Transfer Learning: Minimax Bounds and Optimality
Zhang, Xuhui, Blanchet, Jose, Ghosh, Soumyadip, Squillante, Mark S.
We study the problem of transfer learning, observing that previous efforts to understand its information-theoretic limits do not fully exploit the geometric structure of the source and target domains. In contrast, our study first illustrates the benefits of incorporating a natural geometric structure within a linear regression model, which corresponds to the generalized eigenvalue problem formed by the Gram matrices of both domains. We next establish a finite-sample minimax lower bound, propose a refined model interpolation estimator that enjoys a matching upper bound, and then extend our framework to multiple source domains and generalized linear models. Surprisingly, as long as information is available on the distance between the source and target parameters, negative-transfer does not occur. Simulation studies show that our proposed interpolation estimator outperforms state-of-the-art transfer learning methods in both moderate- and high-dimensional settings.
Statistical Analysis of Wasserstein Distributionally Robust Estimators
Blanchet, Jose, Murthy, Karthyek, Nguyen, Viet Anh
We consider statistical methods which invoke a min-max distributionally robust formulation to extract good out-of-sample performance in data-driven optimization and learning problems. Acknowledging the distributional uncertainty in learning from limited samples, the min-max formulations introduce an adversarial inner player to explore unseen covariate data. The resulting Distributionally Robust Optimization (DRO) formulations, which include Wasserstein DRO formulations (our main focus), are specified using optimal transportation phenomena. Upon describing how these infinite-dimensional min-max problems can be approached via a finite-dimensional dual reformulation, the tutorial moves into its main component, namely, explaining a generic recipe for optimally selecting the size of the adversary's budget. This is achieved by studying the limit behavior of an optimal transport projection formulation arising from an inquiry on the smallest confidence region that includes the unknown population risk minimizer. Incidentally, this systematic prescription coincides with those in specific examples in high-dimensional statistics and results in error bounds that are free from the curse of dimensions. Equipped with this prescription, we present a central limit theorem for the DRO estimator and provide a recipe for constructing compatible confidence regions that are useful for uncertainty quantification. The rest of the tutorial is devoted to insights into the nature of the optimizers selected by the min-max formulations and additional applications of optimal transport projections.
A Statistical Test for Probabilistic Fairness
Taskesen, Bahar, Blanchet, Jose, Kuhn, Daniel, Nguyen, Viet Anh
Algorithms are now routinely used to make consequential decisions that affect human lives. Examples include college admissions, medical interventions or law enforcement. While algorithms empower us to harness all information hidden in vast amounts of data, they may inadvertently amplify existing biases in the available datasets. This concern has sparked increasing interest in fair machine learning, which aims to quantify and mitigate algorithmic discrimination. Indeed, machine learning models should undergo intensive tests to detect algorithmic biases before being deployed at scale. In this paper, we use ideas from the theory of optimal transport to propose a statistical hypothesis test for detecting unfair classifiers. Leveraging the geometry of the feature space, the test statistic quantifies the distance of the empirical distribution supported on the test samples to the manifold of distributions that render a pre-trained classifier fair. We develop a rigorous hypothesis testing mechanism for assessing the probabilistic fairness of any pre-trained logistic classifier, and we show both theoretically as well as empirically that the proposed test is asymptotically correct. In addition, the proposed framework offers interpretability by identifying the most favorable perturbation of the data so that the given classifier becomes fair.
Confidence Regions in Wasserstein Distributionally Robust Estimation
Blanchet, Jose, Murthy, Karthyek, Si, Nian
Wasserstein distributionally robust optimization (DRO) estimators are obtained as solutions of min-max problems in which the statistician selects a parameter minimizing the worst-case loss among all probability models within a certain distance (in a Wasserstein sense) from the underlying empirical measure. While motivated by the need to identify model parameters (or) decision choices that are robust to model uncertainties and misspecification, the Wasserstein DRO estimators recover a wide range of regularized estimators, including square-root LASSO and support vector machines, among others, as particular cases. This paper studies the asymptotic normality of underlying DRO estimators as well as the properties of an optimal (in a suitable sense) confidence region induced by the Wasserstein DRO formulation.
A Distributionally Robust Boosting Algorithm
Blanchet, Jose, Kang, Yang, Zhang, Fan, Hu, Zhangyi
Distributionally Robust Optimization (DRO) has been shown to provide a flexible framework for decision making under uncertainty and statistical estimation. For example, recent works in DRO have shown that popular statistical estimators can be interpreted as the solutions of suitable formulated data-driven DRO problems. In turn, this connection is used to optimally select tuning parameters in terms of a principled approach informed by robustness considerations. This paper contributes to this growing literature, connecting DRO and statistics, by showing how boosting algorithms can be studied via DRO. We propose a boosting type algorithm, named DRO-Boosting, as a procedure to solve our DRO formulation. Our DRO-Boosting algorithm recovers Adaptive Boosting (AdaBoost) in particular, thus showing that AdaBoost is effectively solving a DRO problem. We apply our algorithm to a financial dataset on credit card default payment prediction. We find that our approach compares favorably to alternative boosting methods which are widely used in practice.
Data-driven Optimal Transport Cost Selection for Distributionally Robust Optimizatio
Blanchet, Jose, Kang, Yang, Zhang, Fan, Murthy, Karthyek
Recently, (Blanchet, Kang, and Murhy 2016) showed that several machine learning algorithms, such as square-root Lasso, Support Vector Machines, and regularized logistic regression, among many others, can be represented exactly as distributionally robust optimization (DRO) problems. The distributional uncertainty is defined as a neighborhood centered at the empirical distribution. We propose a methodology which learns such neighborhood in a natural data-driven way. We show rigorously that our framework encompasses adaptive regularization as a particular case. Moreover, we demonstrate empirically that our proposed methodology is able to improve upon a wide range of popular machine learning estimators.