 Optimization


Feature Optimization for Time Series Forecasting via Novel Randomized Uphill Climbing

arXiv.org Artificial Intelligence

Randomized Uphill Climbing (RUC) is a lightweight, stochastic search heuristic that has delivered state-of-the-art equity "alpha" factors for quantitative hedge funds. I propose to generalize RUC into a model-agnostic feature optimization framework for multivariate time-series forecasting. The core idea is to (i) synthesize candidate feature programs by randomly composing operators from a domain-specific grammar, (ii) score candidates rapidly with inexpensive surrogate models (OLS/Poisson) on rolling windows, and (iii) filter instability via nested cross-validation and information-theoretic shrinkage. By decoupling feature discovery from GPU-heavy deep learning, the method promises faster iteration cycles, lower energy consumption, and greater interpretability. Societal relevance: accurate, transparent forecasting tools empower resource-constrained institutions, such as energy regulators and climate-risk NGOs, to make data-driven decisions without proprietary black-box models.
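Steps (i) and (ii) above can be illustrated with a minimal sketch. It is not the author's implementation: the operator grammar (`lag`, `diff`, `rolling_mean`, `negate`), the mutation rule, and the window sizes are all hypothetical choices made for illustration. The surrogate score is the R^2 of a univariate OLS fit on rolling windows, and step (iii), the nested cross-validation and shrinkage filter, is omitted.

```python
import random
from statistics import mean

# Hypothetical operator grammar: each operator maps a series to an
# equal-length series. These four operators are illustrative assumptions.
def lag(x, k=1):
    return [x[0]] * k + x[:-k]

def diff(x):
    return [0.0] + [b - a for a, b in zip(x, x[1:])]

def rolling_mean(x, w=3):
    return [mean(x[max(0, i - w + 1): i + 1]) for i in range(len(x))]

def negate(x):
    return [-v for v in x]

OPERATORS = [lag, diff, rolling_mean, negate]

def apply_program(program, x):
    """Apply a feature program (a list of operators) left to right."""
    for op in program:
        x = op(x)
    return x

def r_squared(f, y):
    """R^2 of a univariate OLS fit of y on f (squared Pearson correlation)."""
    mf, my = mean(f), mean(y)
    sff = sum((a - mf) ** 2 for a in f)
    syy = sum((b - my) ** 2 for b in y)
    if sff == 0 or syy == 0:
        return 0.0
    sfy = sum((a - mf) * (b - my) for a, b in zip(f, y))
    return sfy * sfy / (sff * syy)

def score(program, x, y, window=20, step=10):
    """Mean surrogate R^2 over rolling windows; feature at t predicts y at t+1.

    Requires len(x) > window so that at least one window exists.
    """
    f = apply_program(program, x)
    scores = []
    for s in range(0, len(x) - window, step):
        scores.append(r_squared(f[s:s + window - 1], y[s + 1:s + window]))
    return mean(scores)

def mutate(program, rng, max_len=4):
    """Randomly replace one operator or append a new one."""
    program = list(program)
    if program and (len(program) >= max_len or rng.random() < 0.5):
        program[rng.randrange(len(program))] = rng.choice(OPERATORS)
    else:
        program.append(rng.choice(OPERATORS))
    return program

def randomized_uphill_climb(x, y, iterations=200, seed=0):
    """Keep a random mutation only when it strictly improves the score."""
    rng = random.Random(seed)
    best = [rng.choice(OPERATORS)]
    best_score = score(best, x, y)
    for _ in range(iterations):
        candidate = mutate(best, rng)
        candidate_score = score(candidate, x, y)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score
```

Because a candidate is accepted only on strict improvement, the returned score never falls below that of the initial random program. Searching on in-sample windows alone invites overfitting, which is presumably what the instability filter in step (iii) is meant to address.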


Learning based convex approximation for constrained parametric optimization

arXiv.org Artificial Intelligence

We propose an input convex neural network (ICNN)-based self-supervised learning framework to solve continuous constrained optimization problems. By integrating the augmented Lagrangian method (ALM) with the constraint correction mechanism, our framework ensures non-strict constraint feasibility, better optimality gap, and best convergence rate with respect to the state-of-the-art learning-based methods. We provide a rigorous convergence analysis, showing that the algorithm converges to a Karush-Kuhn-Tucker (KKT) point of the original problem even when the internal solver is a neural network, and the approximation error is bounded. We test our approach on a range of benchmark tasks including quadratic programming (QP), nonconvex programming, and large-scale AC optimal power flow problems. The results demonstrate that compared to existing solvers (e.g., OSQP, IPOPT) and the latest learning-based methods (e.g., DC3, PDL), our approach achieves a superior balance among accuracy, feasibility, and computational efficiency.
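For readers unfamiliar with the augmented Lagrangian method mentioned above, the following is a minimal sketch of the classical ALM loop on a toy equality-constrained QP. It does not reproduce the paper's framework: the inner solver here is plain gradient descent rather than an ICNN, there is no constraint correction step, and the problem, penalty `rho`, and iteration counts are assumptions chosen for illustration.

```python
def alm_toy_qp(rho=10.0, outer_iters=20, inner_iters=200):
    """Classical ALM on: minimize x1^2 + x2^2 subject to x1 + x2 = 1.

    Augmented Lagrangian: f(x) + lam * h(x) + (rho / 2) * h(x)^2,
    with h(x) = x1 + x2 - 1. The known solution is x = (0.5, 0.5), lam = -1.
    """
    x1, x2 = 2.0, -1.0          # arbitrary starting point
    lam = 0.0                   # multiplier estimate
    step = 1.0 / (2.0 + 2.0 * rho)  # 1 / largest Hessian eigenvalue
    for _ in range(outer_iters):
        # Inner loop: approximately minimize the augmented Lagrangian in x.
        for _ in range(inner_iters):
            h = x1 + x2 - 1.0
            g1 = 2.0 * x1 + lam + rho * h
            g2 = 2.0 * x2 + lam + rho * h
            x1 -= step * g1
            x2 -= step * g2
        # Outer step: first-order multiplier update.
        lam += rho * (x1 + x2 - 1.0)
    return x1, x2, lam
```

On this toy problem the multiplier error shrinks by a factor of 1 / (1 + rho) per outer iteration when the inner problem is solved exactly, which is why a moderate penalty already gives fast convergence.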


Theoretical Guarantees for LT-TTD: A Unified Transformer-based Architecture for Two-Level Ranking Systems

arXiv.org Machine Learning

Modern recommendation and search systems typically employ multi-stage ranking architectures to efficiently handle billions of candidates. The conventional approach uses distinct L1 (candidate retrieval) and L2 (re-ranking) models with different optimization objectives, introducing critical limitations including irreversible error propagation and suboptimal ranking. This paper identifies and analyzes the fundamental limitations of this decoupled paradigm and proposes LT-TTD (Listwise Transformer with Two-Tower Distillation), a novel unified architecture that bridges retrieval and ranking phases. Our approach combines the computational efficiency of two-tower models with the expressivity of transformers in a unified listwise learning framework. We provide a comprehensive theoretical analysis of our architecture and establish formal guarantees regarding error propagation mitigation, ranking quality improvements, and optimization convergence. We derive theoretical bounds showing that LT-TTD reduces the upper limit on irretrievable relevant items by a factor that depends on the knowledge distillation strength, and prove that our multi-objective optimization framework achieves a provably better global optimum than disjoint training. Additionally, we analyze the computational complexity of our approach, demonstrating that the asymptotic complexity remains within practical bounds for real-world applications. We also introduce UPQE, a novel evaluation metric specifically designed for unified ranking architectures that holistically captures retrieval quality, ranking performance, and computational efficiency.


A Two-Timescale Primal-Dual Framework for Reinforcement Learning via Online Dual Variable Guidance

arXiv.org Artificial Intelligence

We study reinforcement learning by combining recent advances in regularized linear programming formulations with the classical theory of stochastic approximation. Motivated by the challenge of designing algorithms that leverage off-policy data while maintaining on-policy exploration, we propose PGDA-RL, a novel primal-dual Projected Gradient Descent-Ascent algorithm for solving regularized Markov Decision Processes (MDPs). PGDA-RL integrates experience replay-based gradient estimation with a two-timescale decomposition of the underlying nested optimization problem. The algorithm operates asynchronously, interacts with the environment through a single trajectory of correlated data, and updates its policy online in response to the dual variable associated with the occupation measure of the underlying MDP. We prove that PGDA-RL converges almost surely to the optimal value function and policy of the regularized MDP. Our convergence analysis relies on tools from stochastic approximation theory and holds under weaker assumptions than those required by existing primal-dual RL approaches, notably removing the need for a simulator or a fixed behavioral policy.
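The two-timescale projected gradient descent-ascent idea can be sketched on a deterministic toy saddle problem. This is not PGDA-RL: there is no MDP, no experience replay, and no stochastic gradient. The objective, the step-size schedules, the projection interval, and the choice of which variable sits on the fast timescale are all assumptions made for illustration.

```python
def clip(v, lo=-1.0, hi=1.0):
    """Euclidean projection onto the interval [lo, hi]."""
    return max(lo, min(hi, v))

def two_timescale_pgda(x0=0.8, y0=-0.5, iterations=5000):
    """Projected gradient descent-ascent on f(x, y) = 0.5*x^2 + x*y - 0.5*y^2.

    x is minimized on the slow timescale (step 1/n); y is maximized on the
    fast timescale (step n^-0.6). The unique saddle point is (0, 0).
    """
    x, y = x0, y0
    for n in range(1, iterations + 1):
        fast = n ** -0.6
        slow = 1.0 / n
        grad_x = x + y      # df/dx
        grad_y = x - y      # df/dy
        # Simultaneous update: descend in x, ascend in y, then project.
        x, y = clip(x - slow * grad_x), clip(y + fast * grad_y)
    return x, y
```

Since the slow step size vanishes relative to the fast one, the fast variable effectively sees the slow variable as fixed and tracks its best response, while the slow variable follows the resulting reduced problem. This separation is the standard device behind two-timescale stochastic approximation arguments.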


Clust-Splitter - an Efficient Nonsmooth Optimization-Based Algorithm for Clustering Large Datasets

arXiv.org Artificial Intelligence

Clustering is a fundamental task in data mining and machine learning, particularly for analyzing large-scale data. In this paper, we introduce Clust-Splitter, an efficient algorithm based on nonsmooth optimization, designed to solve the minimum sum-of-squares clustering problem in very large datasets. The clustering task is approached through a sequence of three nonsmooth optimization problems: two auxiliary problems used to generate suitable starting points, followed by a main clustering formulation. To solve these problems effectively, the limited memory bundle method is combined with an incremental approach to develop the Clust-Splitter algorithm. We evaluate Clust-Splitter on real-world datasets characterized by both a large number of attributes and a large number of data points and compare its performance with several state-of-the-art large-scale clustering algorithms. Experimental results demonstrate the efficiency of the proposed method for clustering very large datasets, as well as the high quality of its solutions, which are on par with those of the best existing methods.
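The minimum sum-of-squares clustering objective referred to above can be written as a small function. This sketch shows only the objective, not the Clust-Splitter algorithm, its auxiliary problems, or the limited memory bundle method; the normalization by the number of points is an assumption, as conventions differ.

```python
def mssc_objective(points, centers):
    """Mean over points of the squared distance to the nearest center.

    The inner min over centers is what makes the objective nonsmooth
    (and nonconvex) as a function of the center coordinates.
    """
    total = 0.0
    for p in points:
        total += min(
            sum((pi - ci) ** 2 for pi, ci in zip(p, c)) for c in centers
        )
    return total / len(points)

# Example: four points forming two well-separated pairs.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
two_centers = mssc_objective(points, [(0.0, 1.0), (10.0, 1.0)])
one_center = mssc_objective(points, [(5.0, 1.0)])
```

With one center per pair every point lies at squared distance 1 from its center, whereas a single central center leaves every point at squared distance 26.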


A Graphical Global Optimization Framework for Parameter Estimation of Statistical Models with Nonconvex Regularization Functions

arXiv.org Artificial Intelligence

Optimization problems with norm-bounding constraints arise in a variety of applications, including portfolio optimization, machine learning, and feature selection. A common approach to these problems involves relaxing the norm constraint via Lagrangian relaxation, transforming it into a regularization term in the objective function. A particularly challenging class includes the zero-norm function, which promotes sparsity in statistical parameter estimation. Most existing exact methods for solving these problems introduce binary variables and artificial bounds to reformulate them as higher-dimensional mixed-integer programs, solvable by standard solvers. Other exact approaches exploit specific structural properties of the objective, making them difficult to generalize across different problem types. Alternative methods employ nonconvex penalties with favorable statistical characteristics, but these are typically addressed using heuristic or local optimization techniques due to their structural complexity. In this paper, we propose a novel graph-based method to globally solve optimization problems involving generalized norm-bounding constraints. Our approach encompasses standard $\ell_p$-norms for $p \in [0, \infty)$ and nonconvex penalties such as SCAD and MCP. We leverage decision diagrams to construct strong convex relaxations directly in the original variable space, eliminating the need for auxiliary variables or artificial bounds. Integrated into a spatial branch-and-cut framework, our method guarantees convergence to the global optimum. We demonstrate its effectiveness through preliminary computational experiments on benchmark sparse linear regression problems involving complex nonconvex penalties, which are not tractable using existing global optimization techniques.
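For reference, the penalties named above have simple closed forms. The sketch below uses the standard textbook definitions of the SCAD and MCP penalties for a scalar argument, together with the zero-"norm" count; the default parameter values (`a=3.7`, `gamma=3.0`) are common conventions and are assumptions here, not values taken from the paper. It does not implement the decision-diagram relaxations or the branch-and-cut method.

```python
def l0_norm(v):
    """Number of nonzero entries (the zero-"norm")."""
    return sum(1 for t in v if t != 0)

def scad(t, lam=1.0, a=3.7):
    """SCAD penalty for a scalar t, with lam > 0 and a > 2."""
    t = abs(t)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2

def mcp(t, lam=1.0, gamma=3.0):
    """Minimax concave penalty for a scalar t, with lam > 0 and gamma > 1."""
    t = abs(t)
    if t <= gamma * lam:
        return lam * t - t * t / (2 * gamma)
    return gamma * lam * lam / 2
```

Both penalties behave like a scaled absolute value near zero and flatten to a constant for large arguments, so large coefficients are not shrunk further. That flattening is also what makes them nonconvex.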


Large Language Model Compression with Global Rank and Sparsity Optimization

arXiv.org Artificial Intelligence

Low-rank and sparse composite approximation is a natural idea to compress Large Language Models (LLMs). However, such an idea faces two primary challenges that adversely affect the performance of existing methods. The first challenge relates to the interaction and cooperation between low-rank and sparse matrices, while the second involves determining weight allocation across different layers, as redundancy varies considerably among them. To address these challenges, we propose a novel two-stage LLM compression method with the capability of global rank and sparsity optimization. It is noteworthy that the overall optimization space is vast, making comprehensive optimization computationally prohibitive. Therefore, to reduce the optimization space, our first stage utilizes robust principal component analysis to decompose the weight matrices of LLMs into low-rank and sparse components, which span the low dimensional and sparse spaces containing the resultant low-rank and sparse matrices, respectively. In the second stage, we propose a probabilistic global optimization technique to jointly identify the low-rank and sparse structures within the above two spaces. The appealing feature of our approach is its ability to automatically detect the redundancy across different layers and to manage the interaction between the sparse and low-rank components. Extensive experimental results indicate that our method significantly surpasses state-of-the-art techniques for sparsification and composite approximation.


Soft yet Effective Robots via Holistic Co-Design

arXiv.org Artificial Intelligence

Soft robots promise inherent safety via their material compliance for seamless interactions with humans or delicate environments. Yet, their development is challenging because it requires integrating materials, geometry, actuation, and autonomy into complex mechatronic systems. Despite progress, the field struggles to balance task-specific performance with broader factors like durability and manufacturability - a difficulty that we find is compounded by traditional sequential design processes with their lack of feedback loops. In this perspective, we review emerging co-design approaches that simultaneously optimize the body and brain, enabling the discovery of unconventional designs highly tailored to the given tasks. We then identify three key shortcomings that limit the broader adoption of such co-design methods within the soft robotics domain. First, many rely on simulation-based evaluations focusing on a single metric, while real-world designs must satisfy diverse criteria. Second, current methods emphasize computational modeling without ensuring feasible realization, risking sim-to-real performance gaps. Third, high computational demands limit the exploration of the complete design space. Finally, we propose a holistic co-design framework that addresses these challenges by incorporating a broader range of design values, integrating real-world prototyping to refine evaluations, and boosting efficiency through surrogate metrics and model-based control strategies. This holistic framework, by simultaneously optimizing functionality, durability, and manufacturability, has the potential to enhance reliability and foster broader acceptance of soft robotics, transforming human-robot interactions.


PyRoki: A Modular Toolkit for Robot Kinematic Optimization

arXiv.org Artificial Intelligence

We unify problems like inverse kinematics, trajectory optimization, and motion retargeting using composable kinematic variables and costs. PyRoki aims to support a broad variety of robots and tasks, and runs on CPU, GPU, and TPU. Abstract: Robot motion can have many goals. Depending on the task, we might optimize for pose error, speed, collision, or similarity to a human demonstration. Unlike existing tools, it is also cross-platform: optimization runs natively on CPU, GPU, and TPU. In this paper, we present (i) the design and implementation of PyRoki, (ii) motion retargeting and planning case studies that highlight the advantages of PyRoki's modularity, and (iii) optimization benchmarking, where PyRoki can be 1.4-1.7x ... I. Introduction: Numerical optimization is the standard solution for many tasks in robot kinematics. Using objectives like pose error [8], smoothness [9], and similarity to a human demonstration [6, 10], the robotics community has built diverse optimization software for tasks such as inverse kinematics (IK) [1, 3, 11-13], trajectory optimization [4, 5, 14-19], and ...
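To make "inverse kinematics as numerical optimization" concrete, here is a minimal sketch for a two-link planar arm that minimizes squared pose error by gradient descent. It does not use PyRoki or its API; the arm geometry, step size, iteration count, and function names are hypothetical choices for illustration.

```python
import math

def forward_kinematics(theta1, theta2, l1=1.0, l2=1.0):
    """End-effector position of a two-link planar arm."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y

def solve_ik(target, theta=(0.3, 0.3), step=0.1, iterations=2000,
             l1=1.0, l2=1.0):
    """Minimize 0.5 * ||fk(theta) - target||^2 by gradient descent.

    The gradient is J^T e, where J is the position Jacobian and e the
    position error.
    """
    t1, t2 = theta
    for _ in range(iterations):
        x, y = forward_kinematics(t1, t2, l1, l2)
        ex, ey = x - target[0], y - target[1]
        # Jacobian columns with respect to theta1 and theta2.
        j11 = -l1 * math.sin(t1) - l2 * math.sin(t1 + t2)
        j21 = l1 * math.cos(t1) + l2 * math.cos(t1 + t2)
        j12 = -l2 * math.sin(t1 + t2)
        j22 = l2 * math.cos(t1 + t2)
        t1 -= step * (j11 * ex + j21 * ey)
        t2 -= step * (j12 * ex + j22 * ey)
    return t1, t2

angles = solve_ik((1.2, 0.8))
reached = forward_kinematics(*angles)
```

Additional goals such as smoothness or collision avoidance would enter as further cost terms added to the same objective, which is the sense in which such costs compose.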


A Modal-Space Formulation for Momentum Observer Contact Estimation and Effects of Uncertainty for Continuum Robots

arXiv.org Artificial Intelligence

Contact detection for continuum and soft robots has been limited in past works to statics or kinematics-based methods with assumed circular bending curvature or known bending profiles. In this paper, we adapt the generalized momentum observer contact estimation method to continuum robots. This is made possible by leveraging recent results for real-time shape sensing of continuum robots along with a modal-space representation of the robot dynamics. In addition to presenting an approach for estimating the generalized forces due to contact via a momentum observer, we present a constrained optimization method to identify the wrench imparted on the robot during contact. We further present an approach for investigating the effects of unmodeled deviations in the robot's dynamic state on the contact detection method, and we validate our algorithm through simulations and experiments. We compare the performance of the momentum observer to the joint force deviation method, a direct estimation approach using the robot's full dynamic model, and demonstrate a basic extension of the method to multisegment continuum robots. Results presented in this work extend dynamic contact detection to the domain of continuum and soft robots and can be used to improve the safety of large-scale continuum robots for human-robot collaboration.
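The momentum observer idea can be sketched in its simplest setting: a single point mass with no gravity or Coriolis terms, so that the generalized momentum is p = m * v. This is not the paper's modal-space formulation for continuum robots; the mass, observer gain, applied force, and contact profile are assumptions for illustration.

```python
def simulate_momentum_observer(mass=2.0, gain=50.0, dt=1e-3, steps=2000,
                               contact_step=1000, contact_force=-3.0):
    """Simulate a 1-DOF mass and a momentum-observer residual.

    Residual: r = gain * (p - p0 - integral of (tau + r) dt).
    In this scalar case r behaves as a first-order low-pass filter of the
    unknown external force, with rate set by the gain.
    """
    v = 0.0
    p0 = mass * v
    integral = 0.0
    r = 0.0
    residuals = []
    for k in range(steps):
        tau = 1.0  # known actuation force
        tau_ext = contact_force if k >= contact_step else 0.0
        # True dynamics (explicit Euler); tau_ext is hidden from the observer.
        v += dt * (tau + tau_ext) / mass
        # Observer uses only measured velocity and the known actuation.
        integral += dt * (tau + r)
        r = gain * (mass * v - p0 - integral)
        residuals.append(r)
    return residuals

residuals = simulate_momentum_observer()
```

Before contact the residual stays near zero, and after contact it converges to the applied external force, so thresholding the residual gives a simple contact detector that needs no acceleration measurement.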