Goto

Collaborating Authors

Optimization


Code Adam Gradient Descent Optimization From Scratch

#artificialintelligence

Gradient descent is an optimization algorithm that follows the negative gradient of an objective function in order to locate the minimum of the function. A limitation of gradient descent is that a single step size (learning rate) is used for all input variables. Extensions to gradient descent like AdaGrad and RMSProp update the algorithm to use a separate step size for each input variable but may result in a step size that rapidly decreases to very small values. The Adaptive Movement Estimation algorithm, or Adam for short, is an extension to gradient descent and a natural successor to techniques like AdaGrad and RMSProp that automatically adapts a learning rate for each input variable for the objective function and further smooths the search process by using an exponentially decreasing moving average of the gradient to make updates to variables. In this tutorial, you will discover how to develop gradient descent with Adam optimization algorithm from scratch.


Optimization: A notorious road to Structured Inefficiency and transition to Combinatorial…

#artificialintelligence

Title of the article is very oxymoronic: having an optimization and inefficiencies in the same context. But it is very true looking at the trend and current practices in the logistics industries. In this article, we are going to discuss how current practice of optimization is contributing significant inefficiencies for the organizations. And, how big firms (Like Amazon, Shopify, Uber) are taking an advantage of advancements in the field of combinatorial/mathematical optimization to identify the new opportunities, and winning the competition over thin margin, by creating a little wiggle room for the profit. Few months ago, I was discussing with my friend about an idea of mathematical formulation for a kitchen which can be efficient enough to make 1500 entirely different recipes (not just a vegetables, sauces and cheese on bread or bun), with less than 100 ingredients in inventory, with the use of minimal kitchen appliances.


Rank-One Measurements of Low-Rank PSD Matrices Have Small Feasible Sets

arXiv.org Machine Learning

We study the role of the constraint set in determining the solution to low-rank, positive semidefinite (PSD) matrix sensing problems. The setting we consider involves rank-one sensing matrices: In particular, given a set of rank-one projections of an approximately low-rank PSD matrix, we characterize the radius of the set of PSD matrices that satisfy the measurements. This result yields a sampling rate to guarantee singleton solution sets when the true matrix is exactly low-rank, such that the choice of the objective function or the algorithm to be used is inconsequential in its recovery. We discuss applications of this contribution and compare it to recent literature regarding implicit regularization for similar problems. We demonstrate practical implications of this result by applying conic projection methods for PSD matrix recovery without incorporating low-rank regularization.


Are we Forgetting about Compositional Optimisers in Bayesian Optimisation?

arXiv.org Machine Learning

Bayesian optimisation presents a sample-efficient methodology for global optimisation. Within this framework, a crucial performance-determining subroutine is the maximisation of the acquisition function, a task complicated by the fact that acquisition functions tend to be non-convex and thus nontrivial to optimise. In this paper, we undertake a comprehensive empirical study of approaches to maximise the acquisition function. Additionally, by deriving novel, yet mathematically equivalent, compositional forms for popular acquisition functions, we recast the maximisation task as a compositional optimisation problem, allowing us to benefit from the extensive literature in this field. We highlight the empirical advantages of the compositional approach to acquisition function maximisation across 3958 individual experiments comprising synthetic optimisation tasks as well as tasks from Bayesmark. Given the generality of the acquisition function maximisation subroutine, we posit that the adoption of compositional optimisers has the potential to yield performance improvements across all domains in which Bayesian optimisation is currently being applied.


On the Convergence of Continuous Constrained Optimization for Structure Learning

arXiv.org Machine Learning

Structure learning of directed acyclic graphs (DAGs) is a fundamental problem in many scientific endeavors. A new line of work, based on NOTEARS (Zheng et al., 2018), reformulates the structure learning problem as a continuous optimization one by leveraging an algebraic characterization of DAG constraint. The constrained problem is typically solved using the augmented Lagrangian method (ALM) which is often preferred to the quadratic penalty method (QPM) by virtue of its convergence result that does not require the penalty coefficient to go to infinity, hence avoiding ill-conditioning. In this work, we review the standard convergence result of the ALM and show that the required conditions are not satisfied in the recent continuous constrained formulation for learning DAGs. We demonstrate empirically that its behavior is akin to that of the QPM which is prone to ill-conditioning, thus motivating the use of second-order method in this setting. We also establish the convergence guarantee of QPM to a DAG solution, under mild conditions, based on a property of the DAG constraint term.


Quality-Diversity Optimization: a novel branch of stochastic optimization

arXiv.org Machine Learning

Traditional optimization algorithms search for a single global optimum that maximizes (or minimizes) the objective function. Multimodal optimization algorithms search for the highest peaks in the search space that can be more than one. Quality-Diversity algorithms are a recent addition to the evolutionary computation toolbox that do not only search for a single set of local optima, but instead try to illuminate the search space. In effect, they provide a holistic view of how high-performing solutions are distributed throughout a search space. The main differences with multimodal optimization algorithms are that (1) Quality-Diversity typically works in the behavioral space (or feature space), and not in the genotypic (or parameter) space, and (2) Quality-Diversity attempts to fill the whole behavior space, even if the niche is not a peak in the fitness landscape. In this chapter, we provide a gentle introduction to Quality-Diversity optimization, discuss the main representative algorithms, and the main current topics under consideration in the community. Throughout the chapter, we also discuss several successful applications of Quality-Diversity algorithms, including deep learning, robotics, and reinforcement learning.


Variational Quantum Algorithms

arXiv.org Machine Learning

Applications such as simulating large quantum systems or solving large-scale linear algebra problems are immensely challenging for classical computers due their extremely high computational cost. Quantum computers promise to unlock these applications, although fault-tolerant quantum computers will likely not be available for several years. Currently available quantum devices have serious constraints, including limited qubit numbers and noise processes that limit circuit depth. Variational Quantum Algorithms (VQAs), which employ a classical optimizer to train a parametrized quantum circuit, have emerged as a leading strategy to address these constraints. VQAs have now been proposed for essentially all applications that researchers have envisioned for quantum computers, and they appear to the best hope for obtaining quantum advantage. Nevertheless, challenges remain including the trainability, accuracy, and efficiency of VQAs. In this review article we present an overview of the field of VQAs. Furthermore, we discuss strategies to overcome their challenges as well as the exciting prospects for using them as a means to obtain quantum advantage.


Clustering Ensemble Meets Low-rank Tensor Approximation

arXiv.org Machine Learning

This paper explores the problem of clustering ensemble, which aims to combine multiple base clusterings to produce better performance than that of the individual one. The existing clustering ensemble methods generally construct a co-association matrix, which indicates the pairwise similarity between samples, as the weighted linear combination of the connective matrices from different base clusterings, and the resulting co-association matrix is then adopted as the input of an off-the-shelf clustering algorithm, e.g., spectral clustering. However, the co-association matrix may be dominated by poor base clusterings, resulting in inferior performance. In this paper, we propose a novel low-rank tensor approximation-based method to solve the problem from a global perspective. Specifically, by inspecting whether two samples are clustered to an identical cluster under different base clusterings, we derive a coherent-link matrix, which contains limited but highly reliable relationships between samples. We then stack the coherent-link matrix and the co-association matrix to form a three-dimensional tensor, the low-rankness property of which is further explored to propagate the information of the coherent-link matrix to the co-association matrix, producing a refined co-association matrix. We formulate the proposed method as a convex constrained optimization problem and solve it efficiently. Experimental results over 7 benchmark data sets show that the proposed model achieves a breakthrough in clustering performance, compared with 12 state-of-the-art methods. To the best of our knowledge, this is the first work to explore the potential of low-rank tensor on clustering ensemble, which is fundamentally different from previous approaches.


Squirrel: A Switching Hyperparameter Optimizer

arXiv.org Machine Learning

In this short note, we describe our submission to the NeurIPS 2020 BBO challenge. Motivated by the fact that different optimizers work well on different problems, our approach switches between different optimizers. Since the team names on the competition's leaderboard were randomly generated "alliteration nicknames", consisting of an adjective and an animal with the same initial letter, we called our approach the Switching Squirrel, or here, short, Squirrel. The challenge mandated to suggest 16 successive batches of 8 hyperparameter configurations at a time. We chose to only use one optimizer for a given batch, warmstarted with all previous observations.


Solving the Travelling Thief Problem based on Item Selection Weight and Reverse Order Allocation

arXiv.org Artificial Intelligence

The Travelling Thief Problem (TTP) is a challenging combinatorial optimization problem that attracts many scholars. The TTP interconnects two well-known NP-hard problems: the Travelling Salesman Problem (TSP) and the 0-1 Knapsack Problem (KP). Increasingly algorithms have been proposed for solving this novel problem that combines two interdependent sub-problems. In this paper, TTP is investigated theoretically and empirically. An algorithm based on the score value calculated by our proposed formulation in picking items and sorting items in the reverse order in the light of the scoring value is proposed to solve the problem. Different approaches for solving the TTP are compared and analyzed; the experimental investigations suggest that our proposed approach is very efficient in meeting or beating current state-of-the-art heuristic solutions on a comprehensive set of benchmark TTP instances.