

 Optimization



SoLar: Sinkhorn Label Refinery for Imbalanced Partial-Label Learning (Appendix)
Haobo Wang

Neural Information Processing Systems

A.1 Proof of Theorem 1. First, we provide a lemma showing the consistency of the standard cross-entropy loss. We then give the main proof sketch for Theorem 1. Note that we always seek an optimal joint probability matrix before model training, and this matrix is defined over the empirical measure of the data samples. At the population level, we instead search for an optimal probability measure that satisfies both the marginal constraints and the candidate constraints. Recall that Eq. (1) is a standard linear programming (LP) problem and can be solved in polynomial time; however, in practice LP solvers are typically time-consuming.
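Eq. (1) is not reproduced in this excerpt, so the following is only a minimal sketch under stated assumptions. It replaces the exact LP with its entropy-regularized relaxation and solves that with Sinkhorn-Knopp scaling, masking out non-candidate labels so the candidate constraint holds exactly. The names `masked_sinkhorn`, `P` (model class probabilities), `Y` (0/1 candidate-label mask), `r` (an assumed class-prior marginal) and `eps` (regularization strength) are hypothetical, as is the uniform per-sample row marginal $1/n$.

```python
import numpy as np

def masked_sinkhorn(P, Y, r, eps=1.0, n_iters=500):
    """Hypothetical sketch: entropic OT restricted to candidate labels.

    P: (n, k) model class probabilities (assumed strictly positive).
    Y: (n, k) 0/1 candidate-label mask; every row needs at least one 1.
    r: (k,) assumed class marginal, summing to 1.
    Returns Q (n, k) whose rows sum to ~1/n, whose columns sum to r,
    and which is exactly zero wherever Y is zero.
    """
    P = np.asarray(P, dtype=float)
    Y = np.asarray(Y, dtype=float)
    r = np.asarray(r, dtype=float)
    n, k = P.shape
    c = np.full(n, 1.0 / n)          # assumed uniform mass per sample
    K = Y * P ** (1.0 / eps)         # Gibbs kernel exp(-cost/eps) with cost = -log P
    u, v = np.ones(n), np.ones(k)
    for _ in range(n_iters):
        u = c / (K @ v)              # rescale rows toward the marginal c
        v = r / (K.T @ u)            # rescale columns to the marginal r
    return u[:, None] * K * v[None, :]
```

Refined pseudo-labels can then be read off as `Q.argmax(axis=1)`; because `Q` is zero outside the mask, each one falls inside its sample's candidate set.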


Non-Convex Bilevel Games with Critical Point Selection Maps

Neural Information Processing Systems

Bilevel optimization problems involve two nested objectives, where an upper-level objective depends on a solution to a lower-level problem. When the latter is non-convex, multiple critical points may be present, leading to an ambiguous definition of the problem. In this paper, we introduce a key ingredient for resolving this ambiguity through the concept of a selection map, which allows one to choose a particular solution to the lower-level problem. Using such maps, we define a class of hierarchical games between two agents that resolve the ambiguity in bilevel problems. This new class of games requires new analytical tools from Morse theory to extend implicit differentiation, a technique used in bilevel optimization that follows from the implicit function theorem. In particular, we establish the validity of such a method even when the latter theorem is inapplicable due to degenerate critical points. Finally, we show that algorithms for solving bilevel problems based on unrolled optimization solve these games up to approximation errors due to finite computational power. A simple correction to these algorithms is then proposed for removing these errors.
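The contrast between implicit differentiation and unrolled optimization can be seen on a toy problem. This is a minimal sketch assuming a strongly convex scalar lower level, so the implicit function theorem applies directly and none of the paper's selection maps or Morse-theoretic tools are needed. The lower level $g(x,y)=\tfrac{a}{2}y^2 - xy$ has the unique minimizer $y^*(x)=x/a$, and the upper level is $F(y)=\tfrac12(y-1)^2$; both functions and all constants are hypothetical. The unrolled hypergradient only approaches the implicit one as the number of inner steps `T` grows.

```python
def ift_hypergrad(x, a):
    """Hypergradient dF/dx via the implicit function theorem."""
    y_star = x / a                       # minimizer of g(x, y) = 0.5*a*y**2 - x*y
    dy_dx = -(1.0 / a) * (-1.0)          # -[d2g/dy2]^{-1} * d2g/dy dx
    return (y_star - 1.0) * dy_dx        # dF/dy * dy*/dx with F = 0.5*(y - 1)**2

def unrolled_hypergrad(x, a, lr, T, y0=0.0):
    """Hypergradient obtained by differentiating T gradient-descent steps on g."""
    y, dy = y0, 0.0                      # dy tracks d y_t / d x through the unrolled steps
    for _ in range(T):
        # step y <- y - lr*(a*y - x); its derivative in x follows the same recursion
        y, dy = y - lr * (a * y - x), (1.0 - lr * a) * dy + lr
    return (y - 1.0) * dy
```

With `x=3.0`, `a=2.0`, `lr=0.1`, the implicit hypergradient is 0.25; the unrolled estimate is about 0.0029 after `T=5` steps and matches 0.25 to high precision after `T=200`, illustrating an approximation error caused by finite computation.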




Regret minimization in Linear Bandits with offline data via extended D-optimal exploration

arXiv.org Machine Learning

We consider the problem of online regret minimization in linear bandits with access to prior observations (offline data) from the underlying bandit model. There are numerous applications where extensive offline data is available, such as recommendation systems and online advertising. Consequently, this problem has been studied intensively in recent literature. Our algorithm, Offline-Online Phased Elimination (OOPE), effectively incorporates the offline data to substantially reduce the online regret compared to prior work. To leverage offline information prudently, OOPE uses an extended D-optimal design within each exploration phase. OOPE achieves an online regret of $\tilde{O}(\sqrt{d_{\mathrm{eff}} T \log(|\mathcal{A}|T)} + d^2)$, where $d_{\mathrm{eff}} \leq d$ is the effective problem dimension, which measures the number of poorly explored directions in the offline data and depends on the eigen-spectrum $(\lambda_k)_{k \in [d]}$ of the Gram matrix of the offline data. The eigen-spectrum $(\lambda_k)_{k \in [d]}$ is a quantitative measure of the \emph{quality} of offline data. If the offline data is poorly explored ($d_{\mathrm{eff}} \approx d$), we recover the established regret bounds for the purely online setting, while when offline data is abundant ($T_{\mathrm{off}} \gg T$) and well explored ($d_{\mathrm{eff}} = o(1)$), the online regret is reduced substantially. Additionally, we provide the first known minimax regret lower bounds in this setting that depend explicitly on the quality of the offline data. These lower bounds establish the optimality of our algorithm in regimes where offline data is either well explored or poorly explored. Finally, by using a Frank-Wolfe approximation to the extended optimal design, we further improve the $O(d^{2})$ term to $O\left(\frac{d^{2}}{d_{\mathrm{eff}}} \min\{d_{\mathrm{eff}}, 1\}\right)$, which can be a substantial improvement in high dimensions with moderate-quality offline data ($d_{\mathrm{eff}} = \Omega(1)$).
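OOPE's extended D-optimal design is not specified in this abstract, so the sketch below shows only the classical D-optimal design it builds on, computed with the Frank-Wolfe (Fedorov-Wynn) method. It maximizes $\log\det V(\pi)$ with $V(\pi)=\sum_a \pi_a a a^\top$ over distributions $\pi$ on a finite action set; how offline data enters the extended version is not modeled, and the function names are hypothetical. By the Kiefer-Wolfowitz theorem, the optimum satisfies $\max_a a^\top V(\pi)^{-1} a = d$, which serves as the stopping test.

```python
import numpy as np

def leverage(A, pi):
    """g_a = a^T V(pi)^{-1} a for every action (row) a of A."""
    V = A.T @ (pi[:, None] * A)              # design matrix sum_a pi_a a a^T
    return np.einsum("ij,jk,ik->i", A, np.linalg.inv(V), A)

def d_optimal_fw(A, n_iters=10_000, tol=1e-3):
    """Frank-Wolfe for the classical D-optimal design on the rows of A.

    Assumes the rows of A span R^d, so V(pi) is invertible for the
    uniform starting design and remains invertible after each update.
    """
    A = np.asarray(A, dtype=float)
    n, d = A.shape
    pi = np.full(n, 1.0 / n)                 # start from the uniform design
    g = leverage(A, pi)
    for _ in range(n_iters):
        i = int(np.argmax(g))                # FW vertex: least-explored action
        if g[i] <= d * (1.0 + tol):          # Kiefer-Wolfowitz: optimum has max g = d
            break
        # exact line search for log det along the segment toward vertex i
        gamma = (g[i] / d - 1.0) / (g[i] - 1.0)
        pi = (1.0 - gamma) * pi
        pi[i] += gamma
        g = leverage(A, pi)
    return pi, g
```

Since $\sum_a \pi_a g_a = \operatorname{tr}(V^{-1}V) = d$, the maximum leverage is never below $d$, so `g.max()` directly measures the remaining gap to optimality.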


Objective Soups: Multilingual Multi-Task Modeling for Speech Processing

arXiv.org Machine Learning

Training a single model for multilingual, multi-task speech processing (MSP) is severely hampered by conflicting objectives between tasks like speech recognition and translation. While multi-objective optimization (MOO) aims to align gradient updates, its effectiveness diminishes as the number of tasks grows, making it difficult to find a common descent direction. This raises a fundamental question: should highly conflicting objectives be optimized jointly or separated into a hierarchical structure? To address this question, this paper investigates three multi-objective MSP formulations, which we refer to as \textbf{objective soup recipes}. These formulations apply multi-objective optimization at different optimization levels to mitigate potential conflicts among all objectives. To ensure efficiency, we introduce a lightweight layer-selection mechanism that computes the conflict-avoiding gradient using only the most problematic layers, minimizing computational and memory overhead. Extensive experiments on CoVoST v2, LibriSpeech, and AISHELL-1 reveal that a bi-level recipe separating recognition and translation tasks consistently outperforms standard flat optimization. Our work demonstrates that hierarchical MOO is a more effective and scalable approach for building state-of-the-art MSP models. Our code has been released at https://github.com/afmsaif/Objective_Soups.
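The notion of a common descent direction can be illustrated with the two-objective case of the multiple-gradient descent algorithm (MGDA). MGDA is swapped in here purely for illustration; it is not the paper's conflict-avoiding gradient or its layer-selection mechanism, and the function name is hypothetical. The minimum-norm point of the segment between task gradients $g_1, g_2$ has the closed form $\alpha^* = \mathrm{clip}\big((g_2-g_1)^\top g_2 / \lVert g_1-g_2\rVert^2, 0, 1\big)$. When that point is nonzero, it has a positive inner product with both gradients, so a small step along its negative decreases both losses.

```python
import numpy as np

def min_norm_direction(g1, g2):
    """Two-task MGDA: minimum-norm point on the segment between g1 and g2."""
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    diff = g1 - g2
    denom = float(diff @ diff)
    if denom == 0.0:                     # identical gradients: nothing to resolve
        return g1.copy(), 1.0
    # minimize ||g2 + alpha*(g1 - g2)||^2 over alpha in [0, 1]
    alpha = float(np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0))
    return alpha * g1 + (1.0 - alpha) * g2, alpha
```

For the conflicting pair `[1, 0]` and `[-1, 1]` (negative inner product), it returns `alpha = 0.6` and the direction `[0.2, 0.4]`, which has inner product 0.2 with each gradient. With many tasks, such a direction shrinks toward zero more often, which is consistent with the abstract's observation that MOO loses effectiveness as the number of tasks grows.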


Distributional Sensitivity Analysis: Enabling Differentiability in Sample-Based Inference

arXiv.org Machine Learning

We present two analytical formulae for estimating the sensitivity (namely, the gradient or Jacobian) at given realizations of an arbitrary-dimensional random vector with respect to its distributional parameters. The first formula interprets this sensitivity as partial derivatives of the inverse mapping associated with the vector of 1-D conditional distributions. The second formula, intended for optimization methods that tolerate inexact gradients, introduces a diagonal approximation that reduces computational cost while sacrificing some accuracy. We additionally provide four second-order numerical algorithms to approximate both formulae when closed forms are unavailable. We performed verification and validation studies to demonstrate the correctness of these numerical algorithms and the effectiveness of the proposed formulae. A nuclear physics application showcases how our work enables uncertainty quantification and parameter inference for quantum correlation functions. Our approach differs from existing methods by avoiding the need for model fitting, knowledge of sampling algorithms, and evaluation of high-dimensional integrals. It is therefore particularly useful for sample-based inverse problems when the sampler operates as a black box or requires expensive physics simulations. Moreover, our method renders arbitrary sampling subroutines differentiable, facilitating their integration into programming frameworks for deep learning and automatic differentiation. Algorithmic details and code implementations are provided in this paper and in our open-source software DistroSA to enable reproducibility and further development.
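In one dimension, the idea behind the first formula reduces to implicit differentiation of the CDF: if $x = F^{-1}(u;\theta)$ for a fixed uniform draw $u$, then differentiating $F(x;\theta)=u$ gives $\partial x/\partial\theta = -(\partial F/\partial\theta)/(\partial F/\partial x)$. The sketch below evaluates that ratio with central finite differences, so it needs only CDF evaluations. It is an illustrative assumption, not the paper's multivariate formula (which chains 1-D conditional distributions), its numerical algorithms, or the DistroSA API; `sample_sensitivity` and `exp_cdf` are hypothetical helpers.

```python
import math

def sample_sensitivity(cdf, x, theta, h=1e-6):
    """dx/dtheta at realization x, holding its quantile level u = cdf(x, theta) fixed.

    cdf(x, theta) is any 1-D CDF that is differentiable in both arguments.
    """
    dF_dtheta = (cdf(x, theta + h) - cdf(x, theta - h)) / (2 * h)
    dF_dx = (cdf(x + h, theta) - cdf(x - h, theta)) / (2 * h)   # the density at x
    return -dF_dtheta / dF_dx

def exp_cdf(x, lam):
    """CDF of the exponential distribution with rate lam."""
    return 1.0 - math.exp(-lam * x)
```

For the exponential distribution the closed form is available as a check: $x = -\log(1-u)/\lambda$ gives $\partial x/\partial\lambda = -x/\lambda$, so at `x=2.0`, `lam=1.5` the sketch should return approximately $-4/3$.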


FairPOT: Balancing AUC Performance and Fairness with Proportional Optimal Transport

arXiv.org Machine Learning

Fairness metrics utilizing the area under the receiver operating characteristic curve (AUC) have gained increasing attention in high-stakes domains such as healthcare, finance, and criminal justice. In these domains, fairness is often evaluated over risk scores rather than binary outcomes, and a common challenge is that enforcing strict fairness can significantly degrade AUC performance. To address this challenge, we propose Fair Proportional Optimal Transport (FairPOT), a novel, model-agnostic post-processing framework that strategically aligns risk score distributions across different groups using optimal transport, but does so selectively by transforming a controllable proportion, i.e., the top-lambda quantile, of scores within the disadvantaged group. By varying lambda, our method allows for a tunable trade-off between reducing AUC disparities and maintaining overall AUC performance. Furthermore, we extend FairPOT to the partial AUC setting, enabling fairness interventions to concentrate on the highest-risk regions. Extensive experiments on synthetic, public, and clinical datasets show that FairPOT consistently outperforms existing post-processing techniques in both global and partial AUC scenarios, often achieving improved fairness with slight AUC degradation or even positive gains in utility. The computational efficiency and practical adaptability of FairPOT make it a promising solution for real-world deployment.
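In one dimension, the optimal transport map between two score distributions (for convex costs such as squared distance) is the monotone quantile map $T(s) = F_{\mathrm{ref}}^{-1}(F_{\mathrm{dis}}(s))$. The sketch below applies that map only to the top-$\lambda$ fraction of disadvantaged-group scores and leaves the rest unchanged. It is a hypothetical reading of the abstract, not the authors' implementation: the mid-point quantile levels, the use of `np.quantile`'s default linear interpolation, and the function name are assumptions, and the partial-AUC extension is not covered.

```python
import numpy as np

def partial_quantile_transport(s_dis, s_ref, lam):
    """Move the top-lam fraction of s_dis onto the matching quantiles of s_ref.

    s_dis: risk scores of the disadvantaged group.
    s_ref: risk scores of the other (reference) group.
    lam:   proportion in [0, 1] of disadvantaged scores to transform.
    """
    s_dis = np.asarray(s_dis, dtype=float)
    n = s_dis.size
    ranks = np.argsort(np.argsort(s_dis, kind="stable"), kind="stable")
    q = (ranks + 0.5) / n                    # mid-point empirical quantile levels
    top = q >= 1.0 - lam                     # top-lam fraction by score
    out = s_dis.copy()
    if top.any():
        # 1-D OT (quantile) map, applied only to the selected fraction
        out[top] = np.quantile(s_ref, q[top])
    return out
```

Setting `lam=0` returns the scores unchanged, while `lam=1` maps the entire disadvantaged distribution onto the reference group's quantiles; intermediate values interpolate between these extremes in the spirit of the tunable trade-off described above.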


Online Safety under Multiple Constraints and Input Bounds using gatekeeper: Theory and Applications

arXiv.org Artificial Intelligence

The increasing use of robotic systems in real-world applications necessitates advanced controllers that ensure safety, robustness, and effectiveness in human-machine teaming [1]. This letter formalizes and builds upon our recent work on online safety verification and control [2], which introduces gatekeeper as a novel algorithmic component between the planner and the controller of the autonomous system. To briefly illustrate the principle behind gatekeeper, consider an Unmanned Aerial Vehicle (UAV) navigating an unknown environment. The UAV follows a nominal trajectory, generated by its planner and tracked by its controller. At each iteration, gatekeeper performs two key steps: (i) it evaluates the currently known safe set (derived from onboard sensing) and a backup set, which represents a region the UAV can retreat to if the nominal trajectory is predicted to exit the safe set in the future; (ii) it constructs a candidate trajectory by stitching together the nominal trajectory (up to a future time horizon) and a backup trajectory that leads safely into the backup set.

The authors would like to acknowledge the support of the National Science Foundation (NSF) under grant no.
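The two gatekeeper steps can be sketched on a hypothetical 1-D example: the state is (position, velocity), the safe set is a wall constraint checked by `is_safe`, the backup set is "stopped" (checked by `in_backup`), and the backup trajectory brakes at a fixed deceleration. Everything in the code is an illustrative assumption. In particular, this excerpt does not say what gatekeeper does after constructing the candidate; here we assume it commits the longest-horizon candidate that stays in the safe set and ends in the backup set, and otherwise keeps the previously committed trajectory.

```python
State = tuple[float, float]  # hypothetical state: (position, velocity)

def backup_trajectory(s: State, a_max: float = 1.0, dt: float = 1.0) -> list[State]:
    """Hypothetical backup maneuver: brake at a_max until the velocity reaches zero."""
    x, v = s
    traj = []
    while v > 0.0:
        v = max(v - a_max * dt, 0.0)
        x += v * dt                              # semi-implicit Euler position update
        traj.append((x, v))
    return traj

def gatekeeper_step(nominal, committed, is_safe, in_backup):
    """Stitch nominal[:k] with a backup tail; return the first verified candidate."""
    for k in range(len(nominal), 0, -1):         # try the longest nominal horizon first
        candidate = nominal[:k] + backup_trajectory(nominal[k - 1])
        if all(is_safe(s) for s in candidate) and in_backup(candidate[-1]):
            return candidate
    return committed                             # assumed fallback: keep the old plan
```

With nominal positions 0, 2, 4, … at speed 2 toward a wall at position 10, the returned candidate follows the nominal path up to position 8 and then brakes to a stop at 9.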