Goto

Collaborating Authors

 Optimization


A proximal-proximal majorization-minimization algorithm for nonconvex tuning-free robust regression problems

arXiv.org Machine Learning

In this paper, we introduce a proximal-proximal majorization-minimization (PPMM) algorithm for nonconvex tuning-free robust regression problems. The basic idea is to apply the proximal majorization-minimization algorithm to solve the nonconvex problem with the inner subproblems solved by a sparse semismooth Newton (SSN) method based proximal point algorithm (PPA). We must emphasize that the main difficulty in the design of the algorithm lies in how to overcome the singular difficulty of the inner subproblem. Furthermore, we also prove that the PPMM algorithm converges to a d-stationary point. Due to the Kurdyka-Lojasiewicz (KL) property of the problem, we present the convergence rate of the PPMM algorithm. Numerical experiments demonstrate that our proposed algorithm outperforms the existing state-of-the-art algorithms.


Quantum Computing for Artificial Intelligence Based Mobile Network Optimization

arXiv.org Artificial Intelligence

In this paper, we discuss how certain radio access network optimization problems can be modelled using the concept of constraint satisfaction problems in artificial intelligence, and solved at scale using a quantum computer. As a case study, we discuss root sequence index (RSI) assignment problem - an important LTE/NR physical random access channel configuration related automation use-case. We formulate RSI assignment as quadratic unconstrained binary optimization (QUBO) problem constructed using data ingested from a commercial mobile network, and solve it using a cloud-based commercially available quantum computing platform. Results show that quantum annealing solver can successfully assign conflict-free RSIs. Comparison with well-known heuristics reveals that some classic algorithms are even more effective in terms of solution quality and computation time. The non-quantum advantage is due to the fact that current implementation is a semi-quantum proof-of-concept algorithm. Also, the results depend on the type of quantum computer used. Nevertheless, the proposed framework is highly flexible and holds tremendous potential for harnessing the power of quantum computing in mobile network automation.


A Stochastic Sequential Quadratic Optimization Algorithm for Nonlinear Equality Constrained Optimization with Rank-Deficient Jacobians

arXiv.org Machine Learning

We propose an algorithm for solving equality constrained optimization problems in which the objective function is defined by an expectation of a stochastic function. Formulations of this type arise throughout science and engineering in important applications such as data-fitting problems, where one aims to determine a model that minimizes the discrepancy between values yielded by the model and corresponding known outputs. Our algorithm is designed for solving such problems when the decision variables are restricted to the solution set of a (potentially nonlinear) set of equations. We are particularly interested in such problems when the constraint Jacobian--i.e., the matrix of first-order derivatives of the constraint function--may be rank deficient in some or even all iterations during the run of an algorithm, since this can be an unavoidable occurrence in practice that would ruin the convergence properties of any algorithm that is not specifically designed for this setting. The structure of our algorithm follows a step decomposition strategy that is common in the constrained optimization literature; in particular, our algorithm has roots in the Byrd-Omojokun approach [17].


Label Disentanglement in Partition-based Extreme Multilabel Classification

arXiv.org Machine Learning

Partition-based methods are increasingly-used in extreme multi-label classification (XMC) problems due to their scalability to large output spaces (e.g., millions or more). However, existing methods partition the large label space into mutually exclusive clusters, which is sub-optimal when labels have multi-modality and rich semantics. For instance, the label "Apple" can be the fruit or the brand name, which leads to the following research question: can we disentangle these multi-modal labels with non-exclusive clustering tailored for downstream XMC tasks? In this paper, we show that the label assignment problem in partition-based XMC can be formulated as an optimization problem, with the objective of maximizing precision rates. This leads to an efficient algorithm to form flexible and overlapped label clusters, and a method that can alternatively optimizes the cluster assignments and the model parameters for partition-based XMC. Experimental results on synthetic and real datasets show that our method can successfully disentangle multi-modal labels, leading to state-of-the-art (SOTA) results on four XMC benchmarks.


Multi-objective Asynchronous Successive Halving

arXiv.org Machine Learning

Hyperparameter optimization (HPO) is increasingly used to automatically tune the predictive performance (e.g., accuracy) of machine learning models. However, in a plethora of real-world applications, accuracy is only one of the multiple -- often conflicting -- performance criteria, necessitating the adoption of a multi-objective (MO) perspective. While the literature on MO optimization is rich, few prior studies have focused on HPO. In this paper, we propose algorithms that extend asynchronous successive halving (ASHA) to the MO setting. Considering multiple evaluation metrics, we assess the performance of these methods on three real world tasks: (i) Neural architecture search, (ii) algorithmic fairness and (iii) language model optimization. Our empirical analysis shows that MO ASHA enables to perform MO HPO at scale. Further, we observe that that taking the entire Pareto front into account for candidate selection consistently outperforms multi-fidelity HPO based on MO scalarization in terms of wall-clock time. Our algorithms (to be open-sourced) establish new baselines for future research in the area.


Faster Randomized Methods for Orthogonality Constrained Problems

arXiv.org Artificial Intelligence

Recent literature has advocated the use of randomized methods for accelerating the solution of various matrix problems arising throughout data science and computational science. One popular strategy for leveraging randomization is to use it as a way to reduce problem size. However, methods based on this strategy lack sufficient accuracy for some applications. Randomized preconditioning is another approach for leveraging randomization, which provides higher accuracy. The main challenge in using randomized preconditioning is the need for an underlying iterative method, thus randomized preconditioning so far have been applied almost exclusively to solving regression problems and linear systems. In this article, we show how to expand the application of randomized preconditioning to another important set of problems prevalent across data science: optimization problems with (generalized) orthogonality constraints. We demonstrate our approach, which is based on the framework of Riemannian optimization and Riemannian preconditioning, on the problem of computing the dominant canonical correlations and on the Fisher linear discriminant analysis problem. For both problems, we evaluate the effect of preconditioning on the computational costs and asymptotic convergence, and demonstrate empirically the utility of our approach.


Local policy search with Bayesian optimization

arXiv.org Machine Learning

Reinforcement learning (RL) aims to find an optimal policy by interaction with an environment. Consequently, learning complex behavior requires a vast number of samples, which can be prohibitive in practice. Nevertheless, instead of systematically reasoning and actively choosing informative samples, policy gradients for local search are often obtained from random perturbations. These random samples yield high variance estimates and hence are sub-optimal in terms of sample complexity. Actively selecting informative samples is at the core of Bayesian optimization, which constructs a probabilistic surrogate of the objective from past samples to reason about informative subsequent ones. In this paper, we propose to join both worlds. We develop an algorithm utilizing a probabilistic model of the objective function and its gradient. Based on the model, the algorithm decides where to query a noisy zeroth-order oracle to improve the gradient estimates. The resulting algorithm is a novel type of policy search method, which we compare to existing black-box algorithms. The comparison reveals improved sample complexity and reduced variance in extensive empirical evaluations on synthetic objectives. Further, we highlight the benefits of active sampling on popular RL benchmarks.


3D Shape Registration Using Spectral Graph Embedding and Probabilistic Matching

arXiv.org Machine Learning

We address the problem of 3D shape registration and we propose a novel technique based on spectral graph theory and probabilistic matching. The task of 3D shape analysis involves tracking, recognition, registration, etc. Analyzing 3D data in a single framework is still a challenging task considering the large variability of the data gathered with different acquisition devices. 3D shape registration is one such challenging shape analysis task. The main contribution of this chapter is to extend the spectral graph matching methods to very large graphs by combining spectral graph matching with Laplacian embedding. Since the embedded representation of a graph is obtained by dimensionality reduction we claim that the existing spectral-based methods are not easily applicable. We discuss solutions for the exact and inexact graph isomorphism problems and recall the main spectral properties of the combinatorial graph Laplacian; We provide a novel analysis of the commute-time embedding that allows us to interpret the latter in terms of the PCA of a graph, and to select the appropriate dimension of the associated embedded metric space; We derive a unit hyper-sphere normalization for the commute-time embedding that allows us to register two shapes with different samplings; We propose a novel method to find the eigenvalue-eigenvector ordering and the eigenvector signs using the eigensignature (histogram) which is invariant to the isometric shape deformations and fits well in the spectral graph matching framework, and we present a probabilistic shape matching formulation using an expectation maximization point registration algorithm which alternates between aligning the eigenbases and finding a vertex-to-vertex assignment.


Robust M-estimation-based Tensor Ring Completion: a Half-quadratic Minimization Approach

arXiv.org Machine Learning

Tensor completion is the problem of estimating the missing values of high-order data from partially observed entries. Among several definitions of tensor rank, tensor ring rank affords the flexibility and accuracy needed to model tensors of different orders, which motivated recent efforts on tensor-ring completion. However, data corruption due to prevailing outliers poses major challenges to existing algorithms. In this paper, we develop a robust approach to tensor ring completion that uses an M-estimator as its error statistic, which can significantly alleviate the effect of outliers. Leveraging a half-quadratic (HQ) method, we reformulate the problem as one of weighted tensor completion. We present two HQ-based algorithms based on truncated singular value decomposition and matrix factorization along with their convergence and complexity analysis. Extendibility of the proposed approach to alternative definitions of tensor rank is also discussed. The experimental results demonstrate the superior performance of the proposed approach over state-of-the-art robust algorithms for tensor completion.


QUBO transformation using Eigenvalue Decomposition

arXiv.org Artificial Intelligence

Quadratic Unconstrained Binary Optimization (QUBO) is a general-purpose modeling framework for combinatorial optimization problems and is a requirement for quantum annealers. This paper utilizes the eigenvalue decomposition of the underlying Q matrix to alter and improve the search process by extracting the information from dominant eigenvalues and eigenvectors to implicitly guide the search towards promising areas of the solution landscape. Computational results on benchmark datasets illustrate the efficacy of our routine demonstrating significant performance improvements on problems with dominant eigenvalues.