Goto

Collaborating Authors

 Optimization


Riemannian Multiplicative Update for Sparse Simplex constraint using oblique rotation manifold

arXiv.org Artificial Intelligence

We propose a new manifold optimization method to solve low-rank problems with sparse simplex constraints (variables are simultaneous nonnegativity, sparsity, and sum-to-1) that are beneficial in applications. The proposed approach exploits oblique rotation manifolds, rewrite the problem, and introduce a new Riemannian optimization method. Experiments on synthetic datasets compared to the standard Euclidean method show the effectiveness of the proposed method.


An extrapolated and provably convergent algorithm for nonlinear matrix decomposition with the ReLU function

arXiv.org Machine Learning

Nonlinear matrix decomposition (NMD) with the ReLU function, denoted ReLU-NMD, is the following problem: given a sparse, nonnegative matrix $X$ and a factorization rank $r$, identify a rank-$r$ matrix $\Theta$ such that $X\approx \max(0,\Theta)$. This decomposition finds application in data compression, matrix completion with entries missing not at random, and manifold learning. The standard ReLU-NMD model minimizes the least squares error, that is, $\|X - \max(0,\Theta)\|_F^2$. The corresponding optimization problem is nondifferentiable and highly nonconvex. This motivated Saul to propose an alternative model, Latent-ReLU-NMD, where a latent variable $Z$ is introduced and satisfies $\max(0,Z)=X$ while minimizing $\|Z - \Theta\|_F^2$ (``A nonlinear matrix decomposition for mining the zeros of sparse data'', SIAM J. Math. Data Sci., 2022). Our first contribution is to show that the two formulations may yield different low-rank solutions $\Theta$; in particular, we show that Latent-ReLU-NMD can be ill-posed when ReLU-NMD is not, meaning that there are instances in which the infimum of Latent-ReLU-NMD is not attained while that of ReLU-NMD is. We also consider another alternative model, called 3B-ReLU-NMD, which parameterizes $\Theta=WH$, where $W$ has $r$ columns and $H$ has $r$ rows, allowing one to get rid of the rank constraint in Latent-ReLU-NMD. Our second contribution is to prove the convergence of a block coordinate descent (BCD) applied to 3B-ReLU-NMD and referred to as BCD-NMD. Our third contribution is a novel extrapolated variant of BCD-NMD, dubbed eBCD-NMD, which we prove is also convergent under mild assumptions. We illustrate the significant acceleration effect of eBCD-NMD compared to BCD-NMD, and also show that eBCD-NMD performs well against the state of the art on synthetic and real-world data sets.


Solving the Best Subset Selection Problem via Suboptimal Algorithms

arXiv.org Machine Learning

Best subset selection in linear regression is well known to be nonconvex and computationally challenging to solve, as the number of possible subsets grows rapidly with increasing dimensionality of the problem. As a result, finding the global optimal solution via an exact optimization method for a problem with dimensions of 1000s may take an impractical amount of CPU time. This suggests the importance of finding suboptimal procedures that can provide good approximate solutions using much less computational effort than exact methods. In this work, we introduce a new procedure and compare it with other popular suboptimal algorithms to solve the best subset selection problem. Extensive computational experiments using synthetic and real data have been performed. The results provide insights into the performance of these methods in different data settings. The new procedure is observed to be a competitive suboptimal algorithm for solving the best subset selection problem for high-dimensional data.


Tree-Guided $L_1$-Convex Clustering

arXiv.org Artificial Intelligence

Convex clustering is a modern clustering framework that guarantees globally optimal solutions and performs comparably to other advanced clustering methods. However, obtaining a complete dendrogram (clusterpath) for large-scale datasets remains computationally challenging due to the extensive costs associated with iterative optimization approaches. To address this limitation, we develop a novel convex clustering algorithm called Tree-Guided $L_1$-Convex Clustering (TGCC). We first focus on the fact that the loss function of $L_1$-convex clustering with tree-structured weights can be efficiently optimized using a dynamic programming approach. We then develop an efficient cluster fusion algorithm that utilizes the tree structure of the weights to accelerate the optimization process and eliminate the issue of cluster splits commonly observed in convex clustering. By combining the dynamic programming approach with the cluster fusion algorithm, the TGCC algorithm achieves superior computational efficiency without sacrificing clustering performance. Remarkably, our TGCC algorithm can construct a complete clusterpath for $10^6$ points in $\mathbb{R}^2$ within 15 seconds on a standard laptop without the need for parallel or distributed computing frameworks. Moreover, we extend the TGCC algorithm to develop biclustering and sparse convex clustering algorithms.


Remarks on the Polyak-Lojasiewicz inequality and the convergence of gradient systems

arXiv.org Artificial Intelligence

-- This work explores generalizations of the Polyak- ลojasiewicz inequality (PลI) and their implications for the convergence behavior of gradient flows in optimization problems. Motivated by the continuous-time linear quadratic regulator (CT -LQR) policy optimization problem - where only a weaker version of the PลI is characterized in the literature - this work shows that while weaker conditions are sufficient for global convergence to, and optimality of the set of critical points of the cost function, the "profile" of the gradient flow solution can change significantly depending on which "flavor" of inequality the cost satisfies. After a general theoretical analysis, we focus on fitting the CT -LQR policy optimization problem to the proposed framework, showing that, in fact, it can never satisfy a PลI in its strongest form. Recent advances in Artificial Intelligence (AI) and Machine Learning (ML) have rekindled interest in optimization theory, with many traditional results being revisited in light of the proposed techniques [1]-[8]. In particular, the typical model-free formulation of many successful learning techniques motivates the study of gradient-based optimization methods, which are invaluable in understanding the training of neural networks and similar architectures, typically done through back-propagation algorithms. Gradient descent or, in continuous time, gradient flow, consists in searching for the argument x that minimizes the value of a given function f (x) by "moving along" the direction of steepest descent of the cost function. Theoretical guarantees are typically desirable, and in search of balancing generality and good properties, often in the optimization literature one deals with specific classes of optimization problems, such as convex optimization [9] or linear programming [10]. In this paper, we will focus on optimization problems that satisfy (to different degrees) a Polyak-ลojasiewisc inequality (PลI), also known as the gradient dominance condition [11], [12]. Furthermore, satisfying a PลI globally (g PลI) also guarantees strong robustness properties [2]. However, characterizing such a condition might not be possible for every optimization problem. In [1] the authors noticed that more general conditions than the gPลI can be proposed by using different classes of comparison functions so that different robustness results can be guaranteed. For the discrete-time version of the problem, in [14]- [16] the authors show that it satisfies a gPลI, guaranteeing exponential convergence to the optimal feedback law for initialization in the stabilizing set of feedback matrices.


Partial Transportability for Domain Generalization

arXiv.org Machine Learning

A fundamental task in AI is providing performance guarantees for predictions made in unseen domains. In practice, there can be substantial uncertainty about the distribution of new data, and corresponding variability in the performance of existing predictors. Building on the theory of partial identification and transportability, this paper introduces new results for bounding the value of a functional of the target distribution, such as the generalization error of a classifier, given data from source domains and assumptions about the data generating mechanisms, encoded in causal diagrams. Our contribution is to provide the first general estimation technique for transportability problems, adapting existing parameterization schemes such Neural Causal Models to encode the structural constraints necessary for cross-population inference. We demonstrate the expressiveness and consistency of this procedure and further propose a gradient-based optimization scheme for making scalable inferences in practice. Our results are corroborated with experiments.


Multi-Objective Optimization and Hyperparameter Tuning With Desirability Functions

arXiv.org Artificial Intelligence

The goal of this article is to provide an introduction to the desirability function approach to multi-objective optimization (direct and surrogate model-based), and multi-objective hyperparameter tuning. This work is based on the paper by Kuhn (2016). It presents a `Python` implementation of Kuhn's `R` package `desirability`. The `Python` package `spotdesirability` is available as part of the `sequential parameter optimization` framework. After a brief introduction to the desirability function approach is presented, three examples are given that demonstrate how to use the desirability functions for classical optimization, surrogate-model based optimization, and hyperparameter tuning.


PartialLoading: User Scheduling and Bandwidth Allocation for Parameter-sharing Edge Inference

arXiv.org Artificial Intelligence

By provisioning inference offloading services, edge inference drives the rapid growth of AI applications at the network edge. However, achieving high task throughput with stringent latency requirements remains a significant challenge. To address this issue, we develop a parameter-sharing AI model loading (PartialLoading) framework for multi-user edge inference, which exploits two key insights: 1) the majority of latency arises from loading AI models into server GPU memory, and 2) different AI models can share a significant number of parameters, for which redundant loading should be avoided. Towards this end, we formulate a joint multi-user scheduling and spectrum bandwidth allocation problem to maximize task throughput by exploiting shared parameter blocks across models. The intuition is to judiciously schedule user requests to reuse the shared parameter blocks between consecutively loaded models, thereby reducing model loading time substantially. To facilitate solution finding, we decouple the problem into two sub-problems, i.e., user scheduling and bandwidth allocation, showing that solving them sequentially is equivalent to solving the original problem. Due to the NP-hardness of the problem, we first study an important special case called the "bottom-layer-sharing" case, where AI models share some bottom layers within clusters, and design a dynamic programming-based algorithm to obtain the optimal solution in polynomial time. For the general case, where shared parameter blocks appear at arbitrary positions within AI models, we propose a greedy heuristic to obtain the sub-optimal solution efficiently. Simulation results demonstrate that the proposed framework significantly improves task throughput under deadline constraints compared with user scheduling without exploiting parameter sharing.


Mismatch-Robust Underwater Acoustic Localization Using A Differentiable Modular Forward Model

arXiv.org Artificial Intelligence

--In this paper, we study the underwater acoustic localization in the presence of environmental mismatch. Especially, we exploit a pre-trained neural network for the acoustic wave propagation in a gradient-based optimization framework to estimate the source location. T o alleviate the effect of mismatch between the training data and the test data, we simultaneously optimize over the network weights at the inference time, and provide conditions under which this method is effective. Moreover, we introduce a physics-inspired modularity in the forward model that enables us to learn the path lengths of the multipath structure in an end-to-end training manner without access to the specific path labels. We investigate the validity of the assumptions in a simple yet illustrative environment model.


Energy-Aware Lane Planning for Connected Electric Vehicles in Urban Traffic: Design and Vehicle-in-the-Loop Validation

arXiv.org Artificial Intelligence

-- Urban driving with connected and automated vehicles (CA Vs) offers potential for energy savings, yet most eco-driving strategies focus solely on longitudinal speed control within a single lane. T o address this gap, we propose a novel energy-aware motion planning framework that jointly optimizes longitudinal speed and lateral lane-change decisions using vehicle-to-infrastructure (V2I) communication. Our approach estimates long-term energy costs using a graph-based approximation and solves short-horizon optimal control problems under traffic constraints. Using a data-driven energy model calibrated to an actual battery electric vehicle, we demonstrate with vehicle-in-the-loop experiments that our method reduces motion energy consumption by up to 24% compared to a human driver, highlighting the potential of connectivity-enabled planning for sustainable urban autonomy. Connected and Automated V ehicles (CA Vs) provide benefits in road safety, traffic efficiency, and energy efficiency [1]. Using vehicle-to-infrastructure (V2I) and vehicle-to-vehicle (V2V) communications, CA Vs can coordinate with traffic signals and neighboring vehicles to optimize their motion in ways human drivers are incapable of [2]. Prior studies have shown that by optimizing longitudinal behavior using Signal Phase and Timing (SPaT) data from connected traffic lights, a single CA V can adjust its cruising speed to avoid unnecessary stops, yielding substantial energy savings (11.35 % to 16.4%) [3], [4].