saa
The Bias-Variance Tradeoff in Data-Driven Optimization: A Local Misspecification Perspective
Data-driven stochastic optimization is ubiquitous in machine learning and operational decision-making problems. Sample average approximation (SAA) and model-based approaches such as estimate-then-optimize (ETO) or integrated estimation-optimization (IEO) are all popular, with model-based approaches being able to circumvent some of the issues with SAA in complex context-dependent problems. Yet the relative performance of these methods is poorly understood, with most results confined to the dichotomous cases of the model-based approach being either well-specified or misspecified. We develop the first results that allow for a more granular analysis of the relative performance of these methods under a local misspecification setting, which models the scenario where the model-based approach is nearly well-specified. By leveraging tools from contiguity theory in statistics, we show that there is a bias-variance tradeoff between SAA, IEO, and ETO under local misspecification, and that the relative importance of the bias and the variance depends on the degree of local misspecification. Moreover, we derive explicit expressions for the decision bias, which allows us to characterize (un)impactful misspecification directions, and provide further geometric understanding of the variance.
Boosting Adversarial Transferability with Spatial Adversarial Alignment
Deep neural networks are vulnerable to adversarial examples that exhibit transferability across various models. Numerous approaches are proposed to enhance the transferability of adversarial examples, including advanced optimization, data augmentation, and model modifications. However, these methods still show limited transferability, partiovovocularly in cross-architecture scenarios, such as from CNN to ViT. To achieve high transferability, we propose a technique termed Spatial Adversarial Alignment (SAA), which employs an alignment loss and leverages a witness model to fine-tune the surrogate model. Specifically, SAA consists of two key parts: spatial-aware alignment and adversarial-aware alignment.
The Bias-Variance Tradeoff in Data-Driven Optimization: A Local Misspecification Perspective
Lan, Haixiang, Liao, Luofeng, Elmachtoub, Adam N., Kroer, Christian, Lam, Henry, Zhang, Haofeng
Data-driven stochastic optimization is ubiquitous in machine learning and operational decision-making problems. Sample average approximation (SAA) and model-based approaches such as estimate-then-optimize (ETO) or integrated estimation-optimization (IEO) are all popular, with model-based approaches being able to circumvent some of the issues with SAA in complex context-dependent problems. Yet the relative performance of these methods is poorly understood, with most results confined to the dichotomous cases of the model-based approach being either well-specified or misspecified. We develop the first results that allow for a more granular analysis of the relative performance of these methods under a local misspecification setting, which models the scenario where the model-based approach is nearly well-specified. By leveraging tools from contiguity theory in statistics, we show that there is a bias-variance tradeoff between SAA, IEO, and ETO under local misspecification, and that the relative importance of the bias and the variance depends on the degree of local misspecification. Moreover, we derive explicit expressions for the decision bias, which allows us to characterize (un)impactful misspecification directions, and provide further geometric understanding of the variance.
Fast Adversarial Training against Sparse Attacks Requires Loss Smoothing
Zhong, Xuyang, Huang, Yixiao, Liu, Chen
This paper studies fast adversarial training against sparse adversarial perturbations bounded by $l_0$ norm. We demonstrate the challenges of employing $1$-step attacks on $l_0$ bounded perturbations for fast adversarial training, including degraded performance and the occurrence of catastrophic overfitting (CO). We highlight that CO in $l_0$ adversarial training is caused by sub-optimal perturbation locations of $1$-step attack. Theoretical and empirical analyses reveal that the loss landscape of $l_0$ adversarial training is more craggy compared to its $l_\infty$, $l_2$ and $l_1$ counterparts. Moreover, we corroborate that the craggy loss landscape can aggravate CO. To address these issues, we propose Fast-LS-$l_0$ that incorporates soft labels and the trade-off loss function to smooth the adversarial loss landscape. Extensive experiments demonstrate our method can overcome the challenge of catastrophic overfitting, achieve state-of-the-art performance, and narrow down the performance gap between $1$-step and multi-step adversarial training against sparse attacks.
Proactive and Reactive Constraint Programming for Stochastic Project Scheduling with Maximal Time-Lags
Houten, Kim van den, Planken, Lรฉon, Freydell, Esteban, Tax, David M. J., de Weerdt, Mathijs
This study investigates scheduling strategies for the stochastic resource-constrained project scheduling problem with maximal time lags (SRCPSP/max)). Recent advances in Constraint Programming (CP) and Temporal Networks have reinvoked interest in evaluating the advantages and drawbacks of various proactive and reactive scheduling methods. First, we present a new, CP-based fully proactive method. Second, we show how a reactive approach can be constructed using an online rescheduling procedure. A third contribution is based on partial order schedules and uses Simple Temporal Networks with Uncertainty (STNUs). Our statistical analysis shows that the STNU-based algorithm performs best in terms of solution quality, while also showing good relative offline and online computation time.
Closing the Gaps: Optimality of Sample Average Approximation for Data-Driven Newsvendor Problems
Lyu, Jiameng, Yuan, Shilin, Zhou, Bingkun, Zhou, Yuan
We study the regret performance of Sample Average Approximation (SAA) for data-driven newsvendor problems with general convex inventory costs. In literature, the optimality of SAA has not been fully established under both \alpha-global strong convexity and (\alpha,\beta)-local strong convexity (\alpha-strongly convex within the \beta-neighborhood of the optimal quantity) conditions. This paper closes the gaps between regret upper and lower bounds for both conditions. Under the (\alpha,\beta)-local strong convexity condition, we prove the optimal regret bound of \Theta(\log T/\alpha + 1/ (\alpha\beta)) for SAA. This upper bound result demonstrates that the regret performance of SAA is only influenced by \alpha and not by \beta in the long run, enhancing our understanding about how local properties affect the long-term regret performance of decision-making strategies. Under the \alpha-global strong convexity condition, we demonstrate that the worst-case regret of any data-driven method is lower bounded by \Omega(\log T/\alpha), which is the first lower bound result that matches the existing upper bound with respect to both parameter \alpha and time horizon T. Along the way, we propose to analyze the SAA regret via a new gradient approximation technique, as well as a new class of smooth inverted-hat-shaped hard problem instances that might be of independent interest for the lower bounds of broader data-driven problems.
New Sample Complexity Bounds for (Regularized) Sample Average Approximation in Several Heavy-Tailed, Non-Lipschitzian, and High-Dimensional Cases
We study the sample complexity of sample average approximation (SAA) and its simple variations, referred to as the regularized SAA (RSAA), in solving convex and strongly convex stochastic programming (SP) problems under heavy-tailed-ness, non-Lipschitz-ness, and/or high dimensionality. The presence of such irregularities underscores critical vacua in the literature. In response, this paper presents three sets of results: First, we show that the (R)SAA is effective even if the objective function is not necessarily Lipschitz and the underlying distribution admits some bounded central moments only at (near-)optimal solutions. Second, when the SP's objective function is the sum of a smooth term and a Lipschitz term, we prove that the (R)SAA's sample complexity is completely independent from any complexity measures (e.g., the covering number) of the feasible region. Third, we explicate the (R)SAA's sample complexities with regard to the dependence on dimensionality $d$: When some $p$th ($p\geq 2$) central moment of the underlying distribution is bounded, we show that the required sample size grows at a rate no worse than $\mathcal O\left(p d^{2/p}\right)$ under any one of the three structural assumptions: (i) strong convexity w.r.t. the $q$-norm ($q\geq 1$); (ii) the combination of restricted strong convexity and sparsity; and (iii) a dimension-insensitive $q$-norm of an optimal solution. In both cases of (i) and (iii), it is further required that $p\leq q/(q-1)$. As a direct implication, the (R)SAA's complexity becomes (poly-)logarithmic in $d$, whenever $p\geq c\cdot \ln d$ is admissible for some constant $c>0$. These new results deviate from the SAA's typical sample complexities that grow polynomially with $d$. Part of our proof is based on the average-replace-one (RO) stability, which appears to be novel for the (R)SAA's analyses.
Black Box Variational Inference with a Deterministic Objective: Faster, More Accurate, and Even More Black Box
Giordano, Ryan, Ingram, Martin, Broderick, Tamara
Automatic differentiation variational inference (ADVI) offers fast and easy-to-use posterior approximation in multiple modern probabilistic programming languages. However, its stochastic optimizer lacks clear convergence criteria and requires tuning parameters. Moreover, ADVI inherits the poor posterior uncertainty estimates of mean-field variational Bayes (MFVB). We introduce "deterministic ADVI" (DADVI) to address these issues. DADVI replaces the intractable MFVB objective with a fixed Monte Carlo approximation, a technique known in the stochastic optimization literature as the "sample average approximation" (SAA). By optimizing an approximate but deterministic objective, DADVI can use off-the-shelf second-order optimization, and, unlike standard mean-field ADVI, is amenable to more accurate posterior covariances via linear response (LR). In contrast to existing worst-case theory, we show that, on certain classes of common statistical problems, DADVI and the SAA can perform well with relatively few samples even in very high dimensions, though we also show that such favorable results cannot extend to variational approximations that are too expressive relative to mean-field ADVI. We show on a variety of real-world problems that DADVI reliably finds good solutions with default settings (unlike ADVI) and, together with LR covariances, is typically faster and more accurate than standard ADVI.