BOOOM: Loss-Function-Agnostic Black-Box Optimization over Orthonormal Manifolds for Machine Learning and Statistical Inference
Kim, Beomchang, Roy, Subhrajyoty, Das, Priyam
Optimization over the Stiefel manifold $\mathrm{St}(p,d)$, the set of $p \times d$ column-orthonormal matrices, is fundamental in statistics, machine learning, and scientific computing, yet remains challenging in the presence of non-convex, non-smooth, or black-box objectives. Existing methods largely rely on either convex relaxations or gradient-based Riemannian optimization, limiting applicability in derivative-free and highly multimodal settings. We propose \textsc{BOOOM} (Black-box Optimization Over Orthonormal Manifolds), a general-purpose framework for loss-function-agnostic optimization on $\mathrm{St}(p,d)$. The key idea is a global Givens rotation-based parametrization that maps the manifold to an unconstrained Euclidean angle space while preserving feasibility exactly. Building on this representation, BOOOM employs a structured, parallelizable, derivative-free search based on Recursive Modified Pattern Search, enabling systematic exploration through plane-wise rotations without requiring gradient information and facilitating escape from poor local optima. We establish a unified theoretical framework showing equivalence between angle-space and manifold optimization, transfer of stationarity, and global convergence in probability under mild conditions. Empirical results across diverse problems, including heterogeneous quadratic optimization, low-rank and sparse matrix decomposition, independent component analysis, and orthogonal joint diagonalization, among other widely studied settings, demonstrate strong performance relative to state-of-the-art methods, particularly in non-smooth and highly multimodal regimes. We further illustrate its practical utility through a novel supervised PCA formulation applied to metabolomics data in colorectal cancer.
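The feasibility-preserving property of a rotation-based parametrization can be illustrated with a minimal sketch. It assumes a simple scheme in which a dictionary of plane angles is applied, one Givens rotation at a time, to the first $d$ columns of the $p \times p$ identity. The function names, the ordering of the rotations, and the choice of starting point are illustrative assumptions and are not taken from BOOOM itself. The point is only that any real-valued angles yield a column-orthonormal matrix, so a derivative-free search can move freely in angle space.

```python
import math

def identity_columns(p, d):
    """First d columns of the p x p identity: a trivially column-orthonormal start."""
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(p)]

def apply_givens(X, i, j, theta):
    """Left-multiply X by a Givens rotation acting in the (i, j) coordinate plane."""
    c, s = math.cos(theta), math.sin(theta)
    for col in range(len(X[0])):
        xi, xj = X[i][col], X[j][col]
        X[i][col] = c * xi - s * xj
        X[j][col] = s * xi + c * xj
    return X

def angles_to_stiefel(angles, p, d):
    """Map unconstrained angles {(i, j): theta} to a p x d column-orthonormal matrix.

    Assumption: rotations are applied in dictionary order; this is a hypothetical
    parametrization chosen for illustration only.
    """
    X = identity_columns(p, d)
    for (i, j), theta in angles.items():
        apply_givens(X, i, j, theta)
    return X

def gram_error(X):
    """Largest absolute entry of X^T X - I, i.e. the deviation from feasibility."""
    p, d = len(X), len(X[0])
    err = 0.0
    for a in range(d):
        for b in range(d):
            g = sum(X[r][a] * X[r][b] for r in range(p))
            err = max(err, abs(g - (1.0 if a == b else 0.0)))
    return err

# Arbitrary angles, including values outside [-pi, pi], remain feasible.
angles = {(0, 1): 0.3, (1, 2): -1.2, (0, 3): 2.5, (2, 3): 7.0}
X = angles_to_stiefel(angles, p=4, d=2)
feasibility_gap = gram_error(X)
```

Because each rotation is an orthogonal $p \times p$ matrix, the product $X^\top X$ stays equal to the identity up to floating-point error, whichever angles a search procedure proposes.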
Consider a two-player Markov game in which both players affect the transition. We will effectively show that the problem of best-responding to a correlated policy $\sigma$ is equivalent to best-responding to the marginal policy of $\sigma$ for the opponent. The proof follows from the equivalence of the two MDPs.
Before that, given a (possibly correlated) joint policy $\sigma$, we define a nonlinear program, $(\mathrm{P}_{\mathrm{BR}})$, whose optimal solutions are best-response policies of each agent $k$ to $\sigma_{-k}$ and the values for each state $s$ and timestep $h$:
A.2 Proof of Theorem 3.2
The best-response program. First, we state the following lemma, which will prove useful for several of our arguments.
Lemma A.1 (Best-response LP).
Supplementary Information: TARTARUS: Practical and Realistic Benchmarks for Inverse Molecular Design
S1. INTRODUCTION
Traditionally, property-guided optimization has relied on expert intuition [1] and several rounds of trial, error, and human-inspired optimization, occasionally supported by computer simulations. Alternatively, computer-assisted approaches have employed virtual screening [2] or optimization algorithms such as genetic algorithms (GAs) [3-5]. More recently, with the surge of deep learning, deep generative models have emerged, specifically designed to operate in chemical space and tackle inverse molecular design [6-8]. This has led to the development of numerous algorithmic approaches for this purpose, with the most popular including variational autoencoders (VAEs) [9, 10], generative adversarial networks (GANs) [11, 12], and reinforcement learning (RL) [13, 14].
METHODS OVERVIEW
In this section, we provide an overview of the molecular generative models employed throughout this work and summarize the associated design choices we needed to make during their implementation. The molecular design algorithms we considered are VAEs, long short-term memory hill climbing (LSTM-HC) models [15-17], REINVENT [18], JANUS [19], and a graph-based genetic algorithm (GB-GA) [20]. At the core of the majority of these approaches are molecular string representations, the most commonly used of which is the Simplified Molecular Input Line Entry System (SMILES) [21]. Accordingly, many of the algorithms tested rely on predicting subsequent characters from partial strings to propose structures. However, algorithms based on SMILES can regularly produce invalid strings that do not represent molecules, which is problematic both in terms of robustness and interpretability of the corresponding methodologies [22, 23]. Consequently, this issue was addressed systematically by introducing Self-Referencing Embedded Strings (SELFIES) [22], a molecular string representation that guarantees validity.
Thus, unlike with SMILES, any combination of SELFIES characters represents a molecule. Nevertheless, the impact of SELFIES on structure optimization has not yet been evaluated systematically [23]. To address this gap, we modify some of the existing SMILES-based generative models to also be compatible with SELFIES and test how their performance depends on the representation, as has been done recently [24]. Among the models tested, REINVENT, the VAEs, and the LSTM-HC models use recurrent neural networks (RNNs) [25] to learn the conditional probability distributions of the characters that represent molecules. RNNs are a class of artificial neural networks (ANNs) that utilize sequential information from their previous predictions and states.
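To make the idea of a conditional character distribution concrete, the following is a minimal sketch of a plain (Elman-style) recurrent step that reads a partial string and returns a probability for each possible next character. It is not the architecture of any model tested in this work: the toy alphabet, the hidden size, and the untrained random weights are all assumptions made for illustration, so the resulting probabilities carry no chemical meaning.

```python
import math
import random

VOCAB = ["C", "O", "N", "=", "$"]  # hypothetical toy alphabet; "$" marks end of string
HIDDEN = 4                          # hypothetical hidden-state size

def init_matrix(rows, cols, rng):
    """Small random weights; a real model would learn these from data."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
W_xh = init_matrix(HIDDEN, len(VOCAB), rng)  # input character -> hidden
W_hh = init_matrix(HIDDEN, HIDDEN, rng)      # previous hidden -> hidden
W_hy = init_matrix(len(VOCAB), HIDDEN, rng)  # hidden -> next-character scores

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def one_hot(ch):
    return [1.0 if ch == token else 0.0 for token in VOCAB]

def next_char_distribution(prefix):
    """Return P(next character | prefix) by carrying the hidden state along the prefix."""
    h = [0.0] * HIDDEN
    for ch in prefix:
        pre = [a + b for a, b in zip(matvec(W_xh, one_hot(ch)), matvec(W_hh, h))]
        h = [math.tanh(v) for v in pre]  # the state summarizes everything read so far
    return softmax(matvec(W_hy, h))

probs = next_char_distribution("C=O")
```

Sampling a character from `probs`, appending it to the prefix, and repeating until the end token is drawn gives the character-by-character generation scheme described above.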
The Condition-Number Principle for Prototype Clustering
We develop a geometric framework that links objective accuracy to structural recovery in prototype-based clustering. The analysis is algorithm-agnostic and applies to a broad class of admissible loss functions. We define a clustering condition number that compares within-cluster scale to the minimum loss increase required to move a point across a cluster boundary. When this quantity is small, any solution with a small suboptimality gap must also have a small misclassification error relative to a benchmark partition. The framework also clarifies a fundamental trade-off between robustness and sensitivity to cluster imbalance, leading to sharp phase transitions for exact recovery under different objectives. The guarantees are deterministic and non-asymptotic, and they separate the role of algorithmic accuracy from the intrinsic geometric difficulty of the instance. We further show that errors concentrate near cluster boundaries and that sufficiently deep cluster cores are recovered exactly under strengthened local margins. Together, these results provide a geometric principle for interpreting low objective values as reliable evidence of meaningful clustering structure.