Multistage Conditional Compositional Optimization
Şen, Buse, Hu, Yifan, Kuhn, Daniel
We introduce Multistage Conditional Compositional Optimization (MCCO) as a new paradigm for decision-making under uncertainty that combines aspects of multistage stochastic programming and conditional stochastic optimization. MCCO minimizes a nest of conditional expectations and nonlinear cost functions. It has numerous applications and arises, for example, in optimal stopping, linear-quadratic regulator problems, distributionally robust contextual bandits, as well as in problems involving dynamic risk measures. The naïve nested sampling approach for MCCO suffers from the curse of dimensionality familiar from scenario tree-based multistage stochastic programming, that is, its scenario complexity grows exponentially with the number of nests. We develop new multilevel Monte Carlo techniques for MCCO whose scenario complexity grows only polynomially with the desired accuracy.
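The abstract contrasts naive nested sampling with multilevel Monte Carlo for nested conditional expectations. The sketch below is purely illustrative and not the paper's MCCO algorithm: it assumes a toy model X ~ N(0,1), Y | X ~ N(X,1), targets E[max(E[Y|X], 0)], and uses 2^l inner samples at level l, with the telescoping corrections coupling fine and coarse estimates on shared inner samples.

```python
import random


def inner_mean(x, m, rng):
    # Monte Carlo estimate of the conditional expectation E[Y | X = x],
    # in the toy model Y | X ~ N(X, 1), so the true value is x itself
    return sum(rng.gauss(x, 1.0) for _ in range(m)) / m


def nested_estimate(n_outer, n_inner, rng):
    # naive nested sampling for E[ f(E[Y | X]) ] with f = max(., 0);
    # the cost grows as n_outer * n_inner
    return sum(
        max(inner_mean(rng.gauss(0.0, 1.0), n_inner, rng), 0.0)
        for _ in range(n_outer)
    ) / n_outer


def mlmc_estimate(n_per_level, rng):
    # multilevel Monte Carlo: level l averages 2**l inner samples, and
    # the telescoping sum
    #   E[f(S_L)] = E[f(S_0)] + sum_l E[f(S_l) - f(S_{l-1})]
    # reuses the first half of each level's inner samples for the
    # coarse estimate, so the corrections have small variance
    f = lambda s: max(s, 0.0)
    total = 0.0
    for level, n in enumerate(n_per_level):
        m = 2 ** level
        acc = 0.0
        for _ in range(n):
            x = rng.gauss(0.0, 1.0)
            ys = [rng.gauss(x, 1.0) for _ in range(m)]
            fine = f(sum(ys) / m)
            if level == 0:
                acc += fine
            else:
                coarse = f(sum(ys[: m // 2]) / (m // 2))
                acc += fine - coarse
        total += acc / n
    return total
```

With decreasing sample counts per level, the multilevel estimator spends most of its budget on the cheap coarse level, which is where the polynomial (rather than exponential) scenario complexity comes from. The true value in this toy model is 1/sqrt(2*pi) ≈ 0.399.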
Demographic Parity Tails for Regression
Le, Naht Sinh, Denis, Christophe, Hebiri, Mohamed
Demographic parity (DP) is a widely studied fairness criterion in regression, enforcing independence between the predictions and sensitive attributes. However, constraining the entire distribution can degrade predictive accuracy and may be unnecessary for many applications, where fairness concerns are localized to specific regions of the distribution. To overcome this issue, we propose a new framework for regression under DP that focuses on the tails of the target distribution across sensitive groups. Our methodology builds on optimal transport theory. By enforcing fairness constraints only over targeted regions of the distribution, our approach enables more nuanced and context-sensitive interventions. Building on recent advances, we develop an interpretable and flexible algorithm that leverages the geometric structure of optimal transport. We provide theoretical guarantees, including risk bounds and fairness properties, and validate the method through experiments in regression settings.
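In one dimension, optimal transport between distributions reduces to quantile matching, which makes a tail-restricted DP adjustment easy to illustrate. The sketch below is a hypothetical construction in that spirit, not the paper's algorithm: predictions below a group's tau-quantile are left untouched, while tail predictions are pushed forward to the pooled quantile function, so tail predictions at the same within-group rank coincide across groups.

```python
import bisect


def quantile(sorted_vals, u):
    # empirical quantile with linear interpolation, u in [0, 1]
    n = len(sorted_vals)
    pos = u * (n - 1)
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def cdf(sorted_vals, x):
    # empirical CDF: fraction of values <= x
    return bisect.bisect_right(sorted_vals, x) / len(sorted_vals)


def tail_fair_predict(pred, group, group_preds, tau=0.8):
    # leave predictions below the group's tau-quantile untouched;
    # above it, apply the 1-D optimal transport map to the pooled
    # distribution, i.e. keep the within-group rank u and read off
    # the pooled quantile at u
    g_sorted = sorted(group_preds[group])
    u = cdf(g_sorted, pred)
    if u <= tau:
        return pred
    pooled = sorted(v for vs in group_preds.values() for v in vs)
    return quantile(pooled, u)
```

Two tail predictions from different groups with the same within-group rank are mapped to the same pooled quantile, enforcing DP on the tail only, while the bulk of each group's distribution is left unchanged.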
Self-Regularized Learning Methods
Schölpple, Max, Fanghui, Liu, Steinwart, Ingo
We introduce a general framework for analyzing learning algorithms based on the notion of self-regularization, which captures implicit complexity control without requiring explicit regularization. This is motivated by previous observations that many algorithms, such as gradient-descent-based learning, exhibit implicit regularization. In a nutshell, for a self-regularized algorithm the complexity of the predictor is inherently controlled by that of the simplest comparator achieving the same empirical risk. This framework is sufficiently rich to cover both classical regularized empirical risk minimization and gradient descent. Building on self-regularization, we provide a thorough statistical analysis of such algorithms, including minimax-optimal rates, where it suffices to show that the algorithm is self-regularized -- all further requirements stem from the learning problem itself. Finally, we discuss the problem of data-dependent hyperparameter selection, providing a general result which yields minimax-optimal rates up to a double logarithmic factor and covers data-driven early stopping for RKHS-based gradient descent.
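Data-driven early stopping, mentioned at the end of the abstract, can be illustrated on plain least squares: the stopping time plays the role of the regularization parameter and is chosen on a hold-out set. This is a minimal sketch under those simplifying assumptions, not the paper's RKHS setting.

```python
import random


def gd_early_stopping(X, y, Xval, yval, lr=0.1, max_steps=500, patience=10):
    # full-batch gradient descent on the empirical squared loss
    # (1/n) sum_i (w . x_i - y_i)^2; the number of steps acts as the
    # implicit regularization parameter, selected via hold-out risk
    d = len(X[0])
    n = len(X)
    w = [0.0] * d
    best_w, best_risk, since_improve = w[:], float("inf"), 0
    for _ in range(max_steps):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            resid = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            for j in range(d):
                grad[j] += 2.0 * resid * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
        # hold-out risk of the current iterate
        risk = sum(
            (sum(wj * xj for wj, xj in zip(w, xi)) - yi) ** 2
            for xi, yi in zip(Xval, yval)
        ) / len(Xval)
        if risk < best_risk - 1e-9:
            best_risk, best_w, since_improve = risk, w[:], 0
        else:
            since_improve += 1
            if since_improve >= patience:
                break  # stop once validation risk stops improving
    return best_w
```

Stopping earlier keeps the iterate closer to the (simple) initialization, so the complexity of the returned predictor is bounded by how far gradient descent had to travel, which is the flavor of implicit complexity control the abstract describes.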
3d779cae2d46cf6a8a99a35ba4167977-AuthorFeedback.pdf
Our approach is purely based on 2D convolutions. Nevertheless, it outperforms or performs comparably to many more costly 3D models. We thank the reviewers for pointing out some related (or missing) references. The Timeception layers involve group convolutions at different time scales while our TAM layers only use depthwise convolution. As a result, the Timeception has significantly more parameters than the TAM (10% vs. 0.1% of the total model parameters).
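The parameter gap between group and depthwise temporal convolutions follows from a simple count: a grouped 1-D convolution with C channels, kernel size k, and G groups has C^2 k / G weights, while a depthwise convolution (the G = C special case) has only C k. A quick illustrative calculation with generic formulas, not the exact TAM/Timeception configurations:

```python
def grouped_conv1d_params(channels, kernel, groups):
    # a grouped 1-D (temporal) convolution splits the channels into
    # `groups` groups, each mapping channels/groups -> channels/groups,
    # giving groups * (channels/groups)^2 * kernel weights in total
    per_group = channels // groups
    return groups * per_group * per_group * kernel


def depthwise_conv1d_params(channels, kernel):
    # depthwise convolution: one temporal kernel per channel,
    # i.e. the grouped case with groups == channels
    return grouped_conv1d_params(channels, kernel, channels)
```

For 64 channels and kernel size 3, a 4-group convolution needs 3072 weights while a depthwise one needs 192, a 16x reduction, which is the kind of disparity the parameter comparison above refers to.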
6 Supplementary Material
The original CLUTRR data generation framework made sure that each test proof is not in the training set, in order to test whether a model is able to generalize to unseen proofs. Initial results on the original CLUTRR test sets showed strong model performance (≈99%) on levels seen during training (2, 4, 6) but no generalization at all (≈0%) to other levels. The models are given as input "