 Optimization


Learning Dynamics from Infrequent Output Measurements for Uncertainty-Aware Optimal Control

arXiv.org Artificial Intelligence

Abstract: Reliable optimal control is challenging when the dynamics of a nonlinear system are unknown and only infrequent, noisy output measurements are available. This work addresses this setting of limited sensing by formulating a Bayesian prior over the continuous-time dynamics and latent state trajectory in state-space form and updating it through a targeted marginal Metropolis-Hastings sampler equipped with a numerical ODE integrator. The resulting posterior samples are used to formulate a scenario-based optimal control problem that accounts for both model and measurement uncertainty and is solved using standard nonlinear programming methods. The approach is validated in a numerical case study on glucose regulation using a Type 1 diabetes model.

Keywords: Probabilistic and Bayesian methods for system identification, Nonlinear system identification, Time series modeling, Statistical inference, Learning methods for optimal control, Model predictive control, Data-driven control theory

1. INTRODUCTION

Accurate dynamical models are fundamental for the predictive and optimal control of nonlinear systems. Although first-principles models may describe the general structure of many systems, important parameters or effects often remain unknown, limiting their direct use for control.
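As a rough illustration of sampling dynamics parameters from infrequent, noisy measurements, the sketch below runs a plain random-walk Metropolis-Hastings sampler whose likelihood is evaluated through an RK4 integrator. It is not the targeted marginal sampler described in the abstract. The scalar model dx/dt = -theta * x, the Exp(1) prior, the noise level, the measurement times, and every function name are assumptions chosen only for this example.

```python
import math
import random

def rk4_trajectory(theta, x0, dt, n_steps):
    """Integrate the assumed toy model dx/dt = -theta * x with classical RK4."""
    f = lambda x: -theta * x
    xs, x = [x0], x0
    for _ in range(n_steps):
        k1 = f(x)
        k2 = f(x + 0.5 * dt * k1)
        k3 = f(x + 0.5 * dt * k2)
        k4 = f(x + dt * k3)
        x += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        xs.append(x)
    return xs

def log_posterior(theta, obs, x0, dt, sigma):
    """Gaussian measurement log-likelihood plus an assumed Exp(1) log-prior."""
    if theta <= 0:
        return -math.inf
    xs = rk4_trajectory(theta, x0, dt, max(i for i, _ in obs))
    log_lik = -sum((y - xs[i]) ** 2 for i, y in obs) / (2 * sigma ** 2)
    return log_lik - theta

def metropolis_hastings(obs, x0, dt, sigma, n_samples, rng, step=0.05):
    """Random-walk Metropolis-Hastings over the scalar parameter theta."""
    theta = 1.0
    lp = log_posterior(theta, obs, x0, dt, sigma)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        lp_prop = log_posterior(proposal, obs, x0, dt, sigma)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            theta, lp = proposal, lp_prop
        samples.append(theta)
    return samples

rng = random.Random(0)
true_theta, x0, dt, sigma = 0.5, 5.0, 0.1, 0.1
truth = rk4_trajectory(true_theta, x0, dt, 80)
# Only four noisy measurements over 80 integration steps (infrequent sensing).
obs = [(i, truth[i] + rng.gauss(0.0, sigma)) for i in (10, 30, 50, 80)]
samples = metropolis_hastings(obs, x0, dt, sigma, 4000, rng)
kept = samples[1000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
```

In a scenario-based formulation, each retained sample in `kept` could serve as one model scenario in the downstream control problem; that step is not shown here.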


Sparse Variable Projection in Robotic Perception: Exploiting Separable Structure for Efficient Nonlinear Optimization

arXiv.org Artificial Intelligence

Robotic perception often requires solving large nonlinear least-squares (NLS) problems. While sparsity has been well-exploited to scale solvers, a complementary and underexploited structure is separability, where some variables (e.g., visual landmarks) appear linearly in the residuals and, for any estimate of the remaining variables (e.g., poses), have a closed-form solution. Variable projection (VarPro) methods are a family of techniques that exploit this structure by analytically eliminating the linear variables and presenting a reduced problem in the remaining variables that has favorable properties. However, VarPro has seen limited use in robotic perception; a major challenge arises from gauge symmetries (e.g., cost invariance to global shifts and rotations), which are common in perception and induce specific computational challenges in standard VarPro approaches. We present a VarPro scheme designed for problems with gauge symmetries that jointly exploits separability and sparsity. Our method can be applied as a one-time preprocessing step to construct a matrix-free Schur complement operator. This operator allows efficient evaluation of costs, gradients, and Hessian-vector products of the reduced problem and readily integrates with standard iterative NLS solvers. We provide precise conditions under which our method applies, and describe extensions when these conditions are only partially met. Across synthetic and real benchmarks in SLAM, SNL, and SfM, our approach achieves up to 2× to 35× faster runtimes than state-of-the-art methods while maintaining accuracy. We release an open-source C++ implementation and all datasets from our experiments.
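The idea of eliminating linear variables can be seen on a toy separable problem. The sketch below fits y = c * exp(-a * t): for any fixed a, the best c has a closed form, so the search runs only over a. This is a minimal illustration of variable projection in general, not the paper's sparse, gauge-aware scheme; the model, the data, the golden-section search, and all names are assumptions made for the example, and the search presumes the reduced cost is unimodal on the chosen interval.

```python
import math

def basis(a, ts):
    """Nonlinear basis function exp(-a * t) evaluated at each sample time."""
    return [math.exp(-a * t) for t in ts]

def best_linear_coeff(a, ts, ys):
    """Closed-form least-squares solution for the linear variable c given a."""
    phi = basis(a, ts)
    return sum(p * y for p, y in zip(phi, ys)) / sum(p * p for p in phi)

def reduced_cost(a, ts, ys):
    """Cost in a alone, with c analytically eliminated (projected out)."""
    c = best_linear_coeff(a, ts, ys)
    return sum((y - c * p) ** 2 for p, y in zip(basis(a, ts), ys))

def golden_section(f, lo, hi, tol=1e-10):
    """Minimize a unimodal scalar function on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        x1 = hi - inv_phi * (hi - lo)
        x2 = lo + inv_phi * (hi - lo)
        if f(x1) < f(x2):
            hi = x2
        else:
            lo = x1
    return (lo + hi) / 2

ts = [0.1 * i for i in range(30)]
ys = [2.5 * math.exp(-1.3 * t) for t in ts]  # noise-free synthetic data
a_hat = golden_section(lambda a: reduced_cost(a, ts, ys), 0.0, 5.0)
c_hat = best_linear_coeff(a_hat, ts, ys)
```

The reduced problem has one unknown instead of two; in perception problems the eliminated block (e.g., landmarks) is typically far larger than the block that remains.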


Towards symbolic regression for interpretable clinical decision scores

arXiv.org Artificial Intelligence

Medical decision-making makes frequent use of algorithms that combine risk equations with rules, providing clear and standardized treatment pathways. Symbolic regression (SR) traditionally limits its search space to continuous function forms and their parameters, making it difficult to model this decision-making. However, due to its ability to derive data-driven, interpretable models, SR holds promise for developing data-driven clinical risk scores. To that end we introduce Brush, an SR algorithm that combines decision-tree-like splitting algorithms with non-linear constant optimization, allowing for seamless integration of rule-based logic into symbolic regression and classification models. Brush achieves Pareto-optimal performance on SRBench, and was applied to recapitulate two widely used clinical scoring systems, achieving high accuracy and interpretable models. Compared to decision trees, random forests, and other SR methods, Brush achieves comparable or superior predictive performance while producing simpler models.


Batch Acquisition Function Evaluations and Decouple Optimizer Updates for Faster Bayesian Optimization

arXiv.org Artificial Intelligence

Bayesian optimization (BO) efficiently finds high-performing parameters by maximizing an acquisition function, which models the promise of parameters. A major computational bottleneck arises in acquisition function optimization, where multi-start optimization (MSO) with quasi-Newton (QN) methods is required due to the non-convexity of the acquisition function. BoTorch, a widely used BO library, currently optimizes the summed acquisition function over multiple points, which speeds up MSO through PyTorch batching. Nevertheless, this paper empirically demonstrates the suboptimality of this approach in terms of off-diagonal approximation errors in the inverse Hessian of a QN method, which slow down its convergence. To address this problem, we propose to decouple QN updates using a coroutine while batching the acquisition function calls. Our approach not only yields convergence theoretically identical to that of sequential MSO but also drastically reduces the wall-clock time compared to the previous approaches. Our approach is available in GPSampler in Optuna, effectively reducing its computational overhead.
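The decoupling idea can be sketched with Python generators: each start point runs its own optimizer as a coroutine that yields a query and receives a gradient, while a driver evaluates all pending queries in one batched call. This is a hedged toy, not the Optuna or BoTorch implementation. The objective x^4/4 - x^2/2, the one-dimensional secant update (the scalar case of a quasi-Newton inverse-Hessian estimate), and all names are assumptions for illustration.

```python
def secant_minimizer(x0, h0=0.1, tol=1e-10, max_iter=50):
    """Coroutine: yields query points, receives gradients, returns the optimum.

    h is this start's private inverse-curvature estimate, so no information
    leaks between starts (no off-diagonal coupling).
    """
    x = x0
    g = yield x
    h = h0
    for _ in range(max_iter):
        if abs(g) < tol:
            break
        x_new = x - h * g
        g_new = yield x_new
        if g_new != g:
            h = (x_new - x) / (g_new - g)  # secant (1-D quasi-Newton) update
        x, g = x_new, g_new
    return x

def batched_grad(xs):
    """Stand-in for one batched call: gradient of x^4/4 - x^2/2 at every query."""
    return [x ** 3 - x for x in xs]

def run_multistart(starts):
    workers = {i: secant_minimizer(x0) for i, x0 in enumerate(starts)}
    queries = {i: next(w) for i, w in workers.items()}  # first query per start
    results, n_batches = {}, 0
    while queries:
        ids = list(queries)
        grads = batched_grad([queries[i] for i in ids])  # one call for all starts
        n_batches += 1
        for i, g in zip(ids, grads):
            try:
                queries[i] = workers[i].send(g)
            except StopIteration as stop:  # this start has converged
                results[i] = stop.value
                del queries[i]
    return [results[i] for i in range(len(starts))], n_batches

optima, n_batches = run_multistart([-2.0, -1.5, 1.5, 2.0])
```

Each coroutine follows exactly the iterates it would produce if run alone, so batching changes only how the function calls are grouped, not the optimization path of any start.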


Permutation-Invariant Representation Learning for Robust and Privacy-Preserving Feature Selection

arXiv.org Artificial Intelligence

Abstract--Feature selection eliminates redundancy among features to improve downstream task performance while reducing computational overhead. Existing methods often struggle to capture intricate feature interactions and adapt across diverse application scenarios. Recent advances employ generative intelligence to alleviate these drawbacks. However, these methods remain constrained by permutation sensitivity in embedding and reliance on convexity assumptions in gradient-based search. To address these limitations, our initial work introduces a novel framework that integrates permutation-invariant embedding with policy-guided search. Although effective, it still left opportunities to adapt to realistic distributed scenarios. In practice, data across local clients is highly imbalanced, heterogeneous and constrained by strict privacy regulations, limiting direct sharing. These challenges highlight the need for a framework that can integrate feature selection knowledge across clients without exposing sensitive information. In this extended journal version, we advance the framework from two perspectives: 1) developing a privacy-preserving knowledge fusion strategy to derive a unified representation space without sharing sensitive raw data. The results further demonstrate its strong generalization ability in federated learning scenarios. The code and data are publicly available at https://anonymous.4open.science/r/FedCAPS-08BF.

Index Terms--Automated Feature Selection; Representation Learning; Reinforcement Learning; Federated Learning.

Feature selection removes redundant and irrelevant features to improve both predictive performance and computational efficiency in downstream tasks. Despite the growing dominance of deep learning, feature selection remains indispensable in scenarios characterized by high-dimensional data, the need for interpretability, and resource constraints.


Universal Representation of Generalized Convex Functions and their Gradients

arXiv.org Artificial Intelligence

A wide range of optimization problems can often be written in terms of generalized convex functions (GCFs). When this structure is present, it can convert certain nested bilevel objectives into single-level problems amenable to standard first-order optimization methods. We provide a new differentiable layer with a convex parameter space and show (Theorems 5.1 and 5.2) that it and its gradient are universal approximators for GCFs and their gradients. We demonstrate how this parameterization can be leveraged in practice by (i) learning optimal transport maps with general cost functions and (ii) learning optimal auctions of multiple goods. In both these cases, we show how our layer can be used to convert the existing bilevel or min-max formulations into single-level problems that can be solved efficiently with first-order methods.


A Dynamically Weighted ADMM Framework for Byzantine Resilience

arXiv.org Artificial Intelligence

The alternating direction method of multipliers (ADMM) is a popular method for solving distributed consensus optimization problems through efficient communication among the nodes of a network. However, in the presence of faulty or attacked nodes, even a small perturbation (or sharing false data) during the communication can lead to divergence of the solution. To address this issue, in this work we consider ADMM under the effect of Byzantine threat, where an unknown subset of nodes is subject to Byzantine attacks or faults. We propose Dynamically Weighted ADMM (DW-ADMM), a novel variant of ADMM that uses dynamic weights on the edges of the network, thus promoting resilient distributed optimization. We establish that the proposed method (i) produces a nearly identical solution to conventional ADMM in the error-free case, and (ii) guarantees a bounded solution with respect to the global minimizer, even under Byzantine threat. Finally, we demonstrate the effectiveness of our proposed algorithm using an illustrative numerical simulation.
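For reference, conventional consensus ADMM in the error-free case can be sketched as follows. This is the textbook global-consensus form with a central averaging step, not DW-ADMM and not an edge-based network variant. The local objectives f_i(x) = 0.5 * (x - a_i)^2, the penalty rho, and the function name are assumptions chosen so that the consensus solution is simply the mean of the a_i.

```python
def consensus_admm(targets, rho=1.0, n_iter=100):
    """Scaled-form consensus ADMM for f_i(x) = 0.5 * (x - a_i)^2."""
    n = len(targets)
    x = [0.0] * n   # local copies held by each node
    u = [0.0] * n   # scaled dual variables
    z = 0.0         # consensus variable
    for _ in range(n_iter):
        # Local step: argmin_x f_i(x) + (rho / 2) * (x - z + u_i)^2, in closed form.
        x = [(a + rho * (z - ui)) / (1.0 + rho) for a, ui in zip(targets, u)]
        # Consensus step: average of the messages x_i + u_i.
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # Dual step: accumulate each node's disagreement with the consensus.
        u = [ui + xi - z for xi, ui in zip(x, u)]
    return z, x

targets = [1.0, 2.0, 6.0, 3.0]
z, x = consensus_admm(targets)
```

The consensus step averages whatever the nodes send, which is why a node reporting false values can shift the result; the abstract's dynamic edge weights target exactly that step.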


Optimizing Optimizers for Fast Gradient-Based Learning

arXiv.org Machine Learning

We lay the theoretical foundation for automating optimizer design in gradient-based learning. Based on the greedy principle, we formulate the problem of designing optimizers as maximizing the instantaneous decrease in loss. By treating an optimizer as a function that translates loss gradient signals into parameter motions, the problem reduces to a family of convex optimization problems over the space of optimizers. Solving these problems under various constraints not only recovers a wide range of popular optimizers as closed-form solutions, but also produces the optimal hyperparameters of these optimizers with respect to the problems at hand. This enables a systematic approach to design optimizers and tune their hyperparameters according to the gradient statistics that are collected during the training process. Furthermore, this optimization of optimization can be performed dynamically during training.

Just as optimizers train their models by feeding them parameter velocities $\dot{\theta}$, models can also fit the optimizers to the underlying tasks by feeding gradients $g$.

We are interested in the problem of designing optimizers that maximize the utility of gradient-based learning for a given task. The process of learning manifests as the parameter motion $\dot{\theta}$ driven by the gradient force $g$ applied at each step $t$. Physics requires a constitutive law that relates kinematic motion to its motive force. In gradient-based learning, optimizers take that role. We can represent an optimizer as a positive semidefinite operator $Q \succeq 0$ that linearly translates the gradients into the parameter updates,

$\dot{\theta} = Q\,g$. (1)

Later sections will reveal that many existing optimizers fall into this category. [...] $Q\,g$. (2)

Adhering to the greedy paradigm, we turn our original problem of maximizing the utility of learning into a different optimization problem that maximizes this loss drop with respect to the optimizer $Q$: maximize [...]. Problem P1 reveals two design options that bound this maximum: (1) the trust region implied by the feasible set $Q \in \mathcal{Q}$, and (2) the gradient distribution under the expectation $\mathbb{E}$. Our main focus is on how these two factors determine the optimal optimizer $Q$. Optimizers and their hyperparameters can be dynamically tuned or even be replaced by better ones according to the intermediate probes from the gradients in the middle of training. By reverse engineering commonly used optimizers, we draw the landscape of optimizers that have driven the success of machine learning (Robbins & Monro, 1951; Kingma & Ba, 2015; Loshchilov & Hutter, 2019; Gupta et al., 2018; Martens & Grosse, 2015) into a single picture. This lets us better use the well-studied optimizers in practice and also suggest extensions to them. Note that $\Sigma$ is a symmetric and positive semidefinite (PSD) matrix of shape $d \times d$.


Contextual Strongly Convex Simulation Optimization: Optimize then Predict with Inexact Solutions

arXiv.org Machine Learning

In this work, we study contextual strongly convex simulation optimization and adopt an "optimize then predict" (OTP) approach for real-time decision making. In the offline stage, simulation optimization is conducted across a set of covariates to approximate the optimal-solution function; in the online stage, decisions are obtained by evaluating this approximation at the observed covariate. The central theoretical challenge is to understand how the inexactness of solutions generated by simulation-optimization algorithms affects the optimality gap, which is overlooked in existing studies. To address this, we develop a unified analysis framework that explicitly accounts for both solution bias and variance. Using Polyak-Ruppert averaging SGD as an illustrative simulation-optimization algorithm, we analyze the optimality gap of OTP under four representative smoothing techniques: $k$-nearest neighbors, kernel smoothing, linear regression, and kernel ridge regression. We establish convergence rates, derive the optimal allocation of the computational budget $Γ$ between the number of design covariates and the per-covariate simulation effort, and demonstrate that the convergence rate can approximately achieve $Γ^{-1}$ under an appropriate smoothing technique and sample-allocation rule. Finally, through a numerical study, we validate the theoretical findings and demonstrate the effectiveness and practical value of the proposed approach.
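The two-stage OTP workflow can be sketched on a toy problem. Offline, Polyak-Ruppert averaged SGD produces an inexact solution at each design covariate; online, a k-nearest-neighbor average of those solutions gives the decision at a new covariate. The simulated objective 0.5 * (x - (sin(c) + noise))^2, the step-size schedule, the design grid, the budget split, and all names are assumptions made for this example and are not taken from the paper.

```python
import math
import random

def pr_sgd(c, n_steps, rng, x0=0.0):
    """Polyak-Ruppert averaged SGD for the assumed toy objective at covariate c."""
    x, avg = x0, 0.0
    for t in range(1, n_steps + 1):
        noisy_target = math.sin(c) + rng.gauss(0.0, 1.0)  # one simulation draw
        grad = x - noisy_target     # stochastic gradient of 0.5 * (x - target)^2
        x -= grad / t ** 0.75       # decaying step size
        avg += (x - avg) / t        # running average of the iterates
    return avg

def knn_predict(c, design, solutions, k=3):
    """Online stage: average the offline solutions at the k nearest covariates."""
    nearest = sorted(range(len(design)), key=lambda j: abs(design[j] - c))[:k]
    return sum(solutions[j] for j in nearest) / k

rng = random.Random(0)
design = [0.1 * j for j in range(32)]   # number of design covariates
steps_per_covariate = 2000              # per-covariate simulation effort
solutions = [pr_sgd(c, steps_per_covariate, rng) for c in design]

x_online = knn_predict(1.234, design, solutions)
x_exact = math.sin(1.234)               # true optimum of the toy problem
```

Here the total budget is `len(design) * steps_per_covariate`; trading one factor against the other changes the balance between smoothing error across covariates and solution error at each covariate.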


OptMap: Geometric Map Distillation via Submodular Maximization

arXiv.org Artificial Intelligence

Abstract--Autonomous robots rely on geometric maps to inform a diverse set of perception and decision-making algorithms. As autonomy requires reasoning and planning on multiple scales of the environment, each algorithm may require a different map for optimal performance. Light Detection And Ranging (LiDAR) sensors generate an abundance of geometric data to satisfy these diverse requirements, but selecting informative, size-constrained maps is computationally challenging as it requires solving an NP-hard combinatorial optimization. In this work we present OptMap: a geometric map distillation algorithm which achieves real-time, application-specific map generation via multiple theoretical and algorithmic innovations. A central feature is the maximization of set functions that exhibit diminishing returns, i.e., submodularity, using polynomial-time algorithms with provably near-optimal solutions. We formulate a novel submodular reward function which quantifies informativeness, reduces input set sizes, and minimizes bias in sequentially collected datasets. Further, we propose a dynamically reordered streaming submodular algorithm which improves empirical solution quality and addresses input order bias via an online approximation of the value of all scans. Testing was conducted on open-source and custom datasets with an emphasis on long-duration mapping sessions, highlighting OptMap's minimal computation requirements. Open-source ROS1 and ROS2 packages are available and can be used alongside any LiDAR SLAM algorithm.

Modern autonomous systems use a modular software architecture with separate algorithms for perceiving the environment, planning collision-free paths, estimating vehicle motion, and making higher-level decisions to complete their tasks. Many of these algorithms depend on geometric information about the environment to function properly. As a result, their performance and processing time can vary greatly depending on the quality of the geometric data. For example, trajectory planners use geometric maps to plan collision-free paths, but the density of geometric data is critical for balancing real-time replanning requirements against reliable collision detection. This trade-off is best served by dense geometric maps that specifically capture the intended trajectory corridor (Figure 1a). In contrast, localization entails aligning a source and reference point cloud, a process best served by using a sparse and global reference point cloud to minimize computation time while maximizing alignment accuracy (Figure 1b).

Distribution Statement A: Approved for public release; distribution is unlimited.

Map is dense while remaining efficient as only points near the intended trajectory are returned.
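The diminishing-returns structure mentioned in the abstract can be illustrated with the classical greedy algorithm for a coverage objective under a size budget. This is the textbook offline greedy method, which is known to achieve a (1 - 1/e) approximation for monotone submodular functions under a cardinality constraint; it is not OptMap's reward function or its reordered streaming algorithm. Representing each scan as a set of occupied cell IDs, and all names below, are assumptions for the example.

```python
def greedy_select(scans, budget):
    """Pick up to `budget` scans, each time taking the largest marginal coverage gain."""
    selected, covered = [], set()
    for _ in range(budget):
        best, best_gain = None, 0
        for i, cells in enumerate(scans):
            if i in selected:
                continue
            gain = len(cells - covered)  # new cells this scan would add
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:                 # no remaining scan adds anything
            break
        selected.append(best)
        covered |= scans[best]
    return selected, covered

# Hypothetical scans, each given as the set of map cells it observes.
scans = [{1, 2, 3, 4}, {3, 4, 5}, {5, 6, 7, 8, 9}, {1, 9}, {10}]
selected, covered = greedy_select(scans, budget=2)
```

The marginal gain of a scan can only shrink as more scans are selected, which is the submodularity property that makes the greedy choice provably near-optimal.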