 Optimization


On Theory of Model-Agnostic Meta-Learning Algorithms

#artificialintelligence

Based on joint work with Aryan Mokhtari, UT Austin, and Asu Ozdaglar, MIT. Imagine sitting in your autonomous car, going on vacation. Your vehicle should follow the directions provided by the navigation app and also use multiple sensors to monitor other vehicles, road signs, street lights, etc. As a result, during the course of your journey, your car might need to take actions within a few seconds, such as turning or stopping. The question is how your vehicle should be programmed so that it can adapt to new tasks within a short amount of time and with limited data.


PODDP: Partially Observable Differential Dynamic Programming for Latent Belief Space Planning

arXiv.org Artificial Intelligence

Autonomous agents are limited in their ability to observe the world state. Partially observable Markov decision processes (POMDPs) formally model the problem of planning under world state uncertainty, but POMDPs with continuous actions and nonlinear dynamics suitable for robotics applications are challenging to solve. In this paper, we present an efficient differential dynamic programming (DDP) algorithm for belief space planning in POMDPs with uncertainty over a discrete latent state, and continuous states, actions, observations, and nonlinear dynamics. This representation allows planning of dynamic trajectories which are sensitive to structured uncertainty over discrete latent world states. We develop dynamic programming techniques to optimize a contingency plan over a tree of possible observations and belief space trajectories, and also derive a hierarchical version of the algorithm. Our method is applicable to problems with uncertainty over the cost or reward function (e.g., the configuration of goals or obstacles), uncertainty over the dynamics (e.g., the dynamical mode of a hybrid system), and uncertainty about interactions, where other agents' behavior is conditioned on latent intentions. Benchmarks show that our algorithm outperforms popular heuristic approaches to planning under uncertainty, and results from an autonomous lane changing task demonstrate that our algorithm can synthesize robust interactive trajectories.
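The belief over a discrete latent state mentioned in the abstract can be illustrated with a plain Bayes update. The sketch below is not the PODDP algorithm; it only shows how a belief vector could be revised after one continuous observation, assuming a Gaussian observation model. The names `update_belief`, `obs_means`, and `obs_std` are hypothetical and chosen for illustration.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian (assumed observation model)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def update_belief(belief, observation, obs_means, obs_std):
    """Bayes rule over a discrete latent state.

    belief[k]    prior probability of latent state k
    obs_means[k] expected observation if latent state k is true (assumption)
    """
    unnormalized = [
        b * gaussian_pdf(observation, m, obs_std)
        for b, m in zip(belief, obs_means)
    ]
    total = sum(unnormalized)
    if total == 0.0:
        # Observation is numerically impossible under every state: keep the prior.
        return list(belief)
    return [u / total for u in unnormalized]


# Two latent intentions with a uniform prior; the observation lies closer
# to the mean associated with the second intention.
prior = [0.5, 0.5]
posterior = update_belief(prior, observation=1.8, obs_means=[0.0, 2.0], obs_std=1.0)
```

A planner in belief space would branch on possible observations and carry a posterior like this one along each branch of the tree.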


Active emulation of computer codes with Gaussian processes -- Application to remote sensing

arXiv.org Machine Learning

Signal Processing, Universidad Rey Juan Carlos (URJC), Camino del Molino 5, 28943 Fuenlabrada, Spain

Abstract: Many fields of science and engineering rely on running simulations with complex and computationally expensive models to understand the processes involved in the system of interest. Nevertheless, the high cost involved hampers reliable and exhaustive simulations. Very often such codes incorporate heuristics that ironically make them less tractable and transparent. This paper introduces an active learning methodology for adaptively constructing surrogate models, i.e. emulators, of such costly computer codes in a multi-output setting. The proposed technique is sequential and adaptive, and is based on the optimization of a suitable acquisition function. It aims to achieve accurate approximations, model tractability, as well as compact and expressive simulated datasets. To achieve this, the proposed Active Multi-Output Gaussian Process Emulator (AMOGAPE) combines the predictive capacity of Gaussian Processes (GPs) with the design of an acquisition function that favors sampling in low-density and fluctuating regions of the approximation functions. Comparing different acquisition functions, we illustrate the promising performance of the method for the construction of emulators with toy examples, as well as for a widely used remote sensing transfer code.

Keywords: Active learning, Gaussian process, emulation, design of experiments, computer code, remote sensing, radiative transfer model

1 Introduction

In many areas of science and engineering, systems are analyzed by running computer code simulations which act as convenient approximations of reality. They allow us to simulate many different systems of interest and characterize the processes involved, such as turbulence or energy transfer, and their interactions and relevance. Depending on the body of literature, they are known as physics-based or mechanistic models, or simply simulators [30, 39]. Two important limitations are associated with simulators. The first, and perhaps the most important problem of these computer codes, is their often high computational cost, which hampers reliable and exhaustive simulations.


Noise-Assisted Variational Hybrid Quantum-Classical Optimization

arXiv.org Machine Learning

Variational hybrid quantum-classical optimization represents one of the most promising avenues for showing the advantage of today's noisy intermediate-scale quantum computers in solving hard problems, such as finding the minimum-energy state of a Hamiltonian or solving some machine-learning tasks. In these devices noise is unavoidable and impossible to error-correct, yet its role in the optimization process is not well understood, especially from the theoretical viewpoint. Here we consider a minimization problem with respect to a variational state, iteratively obtained via a parametric quantum circuit, taking into account both the role of noise and the stochastic nature of quantum measurement outcomes. We show that the accuracy of the result obtained for a fixed number of iterations is bounded by a quantity related to the Quantum Fisher Information of the variational state. Using this bound, we find the unexpected result that, in some regimes, noise can be beneficial, allowing a faster solution to the optimization problem.


A Distributed Quasi-Newton Algorithm for Primal and Dual Regularized Empirical Risk Minimization

arXiv.org Machine Learning

We propose a communication- and computation-efficient distributed optimization algorithm using second-order information for solving empirical risk minimization (ERM) problems with a nonsmooth regularization term. Our algorithm is applicable to both the primal and the dual ERM problem. Current second-order and quasi-Newton methods for this problem either do not work well in the distributed setting or work only for specific regularizers. Our algorithm uses successive quadratic approximations of the smooth part, and we describe how to maintain an approximation of the (generalized) Hessian and solve subproblems efficiently in a distributed manner. When applied to the distributed dual ERM problem, unlike the state of the art, which takes only the block-diagonal part of the Hessian, our approach is able to utilize global curvature information and is thus orders of magnitude faster. The proposed method enjoys global linear convergence for a broad range of non-strongly convex problems that includes the most commonly used ERMs, thus requiring lower communication complexity. It also converges on non-convex problems, so it has the potential to be used in applications such as deep learning. Computational results demonstrate that our method significantly improves on communication cost and running time over the current state-of-the-art methods.


Learning Improvement Heuristics for Solving the Travelling Salesman Problem

arXiv.org Artificial Intelligence

Recent studies on using deep learning to solve the Travelling Salesman Problem (TSP) focus on construction heuristics, the solutions of which may still be far from optimality. To improve solution quality, additional procedures such as sampling or beam search are required. However, they are still based on the same construction policy, which is less effective in refining a solution. In this paper, we propose to directly learn the improvement heuristics for solving TSP based on deep reinforcement learning. We first present a reinforcement learning formulation for the improvement heuristic, where the policy guides selection of the next solution. Then, we propose a deep architecture as the policy network based on self-attention. Extensive experiments show that improvement policies learned by our approach yield better results than state-of-the-art methods, even from random initial solutions. Moreover, the learned policies are more effective than the traditional handcrafted ones, and robust to different initial solutions of either high or poor quality.

1 Introduction

The Travelling Salesman Problem (TSP) is a typical combinatorial optimization problem that has extensive applications in the real world. The problem statement is straightforward: given a set of locations, find the shortest tour for the salesman that visits each location exactly once and returns to the original one. Although it has been widely studied for decades, achieving satisfactory performance is still challenging due to its NP-hard complexity.
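For readers unfamiliar with improvement heuristics, the sketch below shows 2-opt, a classic handcrafted one: start from any tour and keep reversing a segment whenever that shortens the tour. It is not the learned policy from the paper, only an example of the kind of traditional heuristic such policies are compared against. The function names are illustrative.

```python
import math


def tour_length(points, tour):
    """Length of the closed tour visiting points in the given order."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def two_opt(points, tour):
    """Repeatedly reverse tour segments while doing so shortens the tour."""
    best = list(tour)
    best_len = tour_length(points, best)
    improved = True
    while improved:
        improved = False
        # Position 0 stays fixed; every other segment [i, j] is a candidate.
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                cand_len = tour_length(points, candidate)
                if cand_len < best_len - 1e-12:
                    best, best_len = candidate, cand_len
                    improved = True
    return best, best_len


# Unit square visited in a self-crossing order; 2-opt removes the crossing.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
best_tour, best_len = two_opt(square, [0, 2, 1, 3])
```

A learned improvement policy replaces the exhaustive scan over `(i, j)` pairs with a network that picks which move to apply next.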


Tensor Completion for Weakly-dependent Data on Graph for Metro Passenger Flow Prediction

arXiv.org Machine Learning

Low-rank tensor decomposition and completion have attracted significant interest from academia given the ubiquity of tensor data. However, the low-rank structure is a global property, which will not be fulfilled when the data presents complex and weak dependencies given specific graph structures. One particular application that motivates this study is spatiotemporal data analysis. As shown in the preliminary study, weak dependencies can worsen the low-rank tensor completion performance. In this paper, we propose a novel low-rank CANDECOMP / PARAFAC (CP) tensor decomposition and completion framework by introducing the $L_{1}$-norm penalty and Graph Laplacian penalty to model the weak dependency on the graph. We further propose an efficient optimization algorithm based on Block Coordinate Descent for estimation. A case study based on the metro passenger flow data in Hong Kong is conducted to demonstrate improved performance over the regular tensor completion methods.
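As a rough illustration of what a graph Laplacian penalty measures, the sketch below builds the Laplacian $L = D - W$ of a small weighted graph and evaluates the quadratic form $x^\top L x$, which equals $\tfrac{1}{2}\sum_{i,j} w_{ij}(x_i - x_j)^2$ for a symmetric weight matrix. How the paper applies the penalty inside the CP model is not stated in the abstract, so this is only the generic building block; the function names are illustrative.

```python
def graph_laplacian(weights):
    """Return L = D - W for a symmetric weight matrix given as nested lists."""
    n = len(weights)
    return [
        [(sum(weights[i]) if i == j else 0.0) - weights[i][j] for j in range(n)]
        for i in range(n)
    ]


def laplacian_penalty(weights, x):
    """Quadratic form x^T L x; small when connected nodes have similar values."""
    lap = graph_laplacian(weights)
    n = len(x)
    return sum(x[i] * lap[i][j] * x[j] for i in range(n) for j in range(n))


# Path graph 0 - 1 - 2 with unit edge weights.
path = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
smooth = laplacian_penalty(path, [1.0, 1.0, 1.0])  # constant signal
rough = laplacian_penalty(path, [0.0, 1.0, 3.0])   # (0-1)^2 + (1-3)^2
```

The constant signal incurs no penalty, while the varying signal is penalized by the squared differences across the two edges.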


Integrative Generalized Convex Clustering Optimization and Feature Selection for Mixed Multi-View Data

arXiv.org Machine Learning

In mixed multi-view data, multiple sets of diverse features are measured on the same set of samples. By integrating all available data sources, we seek to discover common group structure among the samples that may be hidden in individualistic cluster analyses of a single data-view. While several techniques for such integrative clustering have been explored, we propose and develop a convex formalization that will inherit the strong statistical, mathematical and empirical properties of increasingly popular convex clustering methods. Specifically, our Integrative Generalized Convex Clustering Optimization (iGecco) method employs different convex distances, losses, or divergences for each of the different data views with a joint convex fusion penalty that leads to common groups. Additionally, integrating mixed multi-view data is often challenging when each data source is high-dimensional. To perform feature selection in such scenarios, we develop an adaptive shifted group-lasso penalty that selects features by shrinking them towards their loss-specific centers. Our so-called iGecco+ approach selects features from each data-view that are best for determining the groups, often leading to improved integrative clustering. To fit our model, we develop a new type of generalized multi-block ADMM algorithm using sub-problem approximations that more efficiently fits our model for big data sets. Through a series of numerical experiments and real data examples on text mining and genomics, we show that iGecco+ achieves superior empirical performance for high-dimensional mixed multi-view data.


Tensor Completion via Gaussian Process Based Initialization

arXiv.org Machine Learning

In this paper, we consider the tensor completion problem, representing the solution in the tensor train (TT) format. It is assumed that the tensor is high-dimensional and that its values are generated by an unknown smooth function. This assumption allows us to develop an efficient initialization scheme based on Gaussian Process Regression and the TT-cross approximation technique. The proposed approach can be used in conjunction with any optimization algorithm that is commonly used in tensor completion problems. We show empirically that in this case the reconstruction error improves compared to tensor completion with random initialization. As an additional benefit, our technique automatically selects the rank thanks to the TT-cross approximation technique.


Graph-based Multi-view Binary Learning for Image Clustering

arXiv.org Machine Learning

Guangqi Jiang (a), Huibing Wang (a), Jinjia Peng (a), Dongyan Chen (a), Xianping Fu (a, b). (a) College of Information and Science Technology, Dalian Maritime University, Danlian, Liaoning, 116021, China; (b) Pengcheng Laboratory, Shenzhen, Guangdong, 518055, China. Corresponding author: Xianping Fu. Preprint, December 12, 2019, arXiv:1912.05159v1.

Abstract: Hashing techniques, also known as binary code learning, have recently gained increasing attention in large-scale data analysis and storage. Generally, most existing hash clustering methods are single-view ones, which lack complete structure or complementary information from multiple views. For clustering tasks, most prior research focuses on learning discrete hash codes, while few works take the original data structure into consideration. To address these problems, we propose a novel binary code algorithm for clustering, called Graph-based Multi-view Binary Learning (GMBL), which adopts graph embedding to preserve the original data structure. GMBL mainly focuses on encoding the information of multiple views into a compact binary code, which explores complementary information from multiple views. In particular, in order to maintain the graph-based structure of the original data, we adopt a Laplacian matrix to preserve the local linear relationship of the data and map it to the Hamming space. Considering that different views make distinctive contributions to the final clustering results, GMBL adopts a strategy of automatically assigning weights to each view to better guide the clustering. Finally, an alternating iterative optimization method is adopted to optimize the discrete binary codes directly instead of relaxing the binary constraint in two steps. Experiments on five public datasets demonstrate the superiority of our proposed method compared with previous methods.

Introduction

With the development of computer vision applications, we have witnessed that hash technology has become an indispensable step in the processing of large data [1] [2]. In data analysis, organization, storage, etc., there is an urgent need to use effective hash codes for clustering data from big databases. Besides, most existing digital devices are mainly based on binary code, which can effectively save computing time and storage space.