Goto

Collaborating Authors

 Optimization


GOT: An Optimal Transport framework for Graph comparison

arXiv.org Machine Learning

We present a novel framework based on optimal transport for the challenging problem of comparing graphs. Specifically, we exploit the probabilistic distribution of smooth graph signals defined with respect to the graph topology. This allows us to derive an explicit expression of the Wasserstein distance between graph signal distributions in terms of the graph Laplacian matrices. This leads to a structurally meaningful measure for comparing graphs, which is able to take into account the global structure of graphs, while most other measures merely observe local changes independently. Our measure is then used for formulating a new graph alignment problem, whose objective is to estimate the permutation that minimizes the distance between two graphs. We further propose an efficient stochastic algorithm based on Bayesian exploration to accommodate for the non-convexity of the graph alignment problem. We finally demonstrate the performance of our novel framework on different tasks like graph alignment, graph classification and graph signal prediction, and we show that our method leads to significant improvement with respect to the-state-of-art algorithms.


Scenario approach for minmax optimization with emphasis on the nonconvex case: positive results and caveats

arXiv.org Machine Learning

We treat the so-called scenario approach, a popular probabilistic approximation method for robust minmax optimization problems via independent and indentically distributed (i.i.d) sampling from the uncertainty set, from various perspectives. The scenario approach is well-studied in the important case of convex robust optimization problems, and here we examine how the phenomenon of concentration of measures affects the i.i.d sampling aspect of the scenario approach in high dimensions and its relation with the optimal values. Moreover, we perform a detailed study of both the asymptotic behaviour (consistency) and finite time behaviour of the scenario approach in the more general setting of nonconvex minmax optimization problems. In the direction of the asymptotic behaviour of the scenario approach, we present an obstruction to consistency that arises when the decision set is noncompact. In the direction of finite sample guarantees, we establish a general methodology for extracting "probably approximately correct" type estimates for the finite sample behaviour of the scenario approach for a large class of nonconvex problems.


Data Sketching for Faster Training of Machine Learning Models

arXiv.org Artificial Intelligence

Many machine learning problems reduce to the problem of minimizing an expected risk, defined as the sum of a large number of, often convex, component functions. Iterative gradient methods are popular techniques for the above problems. However, they are in general slow to converge, in particular for large data sets. In this work, we develop analysis for selecting a subset (or sketch) of training data points with their corresponding learning rates in order to provide faster convergence to a close neighbordhood of the optimal solution. We show that subsets that minimize the upper-bound on the estimation error of the full gradient, maximize a submodular facility location function. As a result, by greedily maximizing the facility location function we obtain subsets that yield faster convergence to a close neighborhood of the optimum solution. We demonstrate the real-world effectiveness of our algorithm, SIG, confirming our analysis, through an extensive set of experiments on several applications, including logistic regression and training neural networks. We also include a method that provides a deliberate deterministic ordering of the data subset that is quite effective in practice. We observe that our method, while achieving practically the same loss, speeds up gradient methods by up to 10x for convex and 3x for non-convex (deep) functions.


Probabilistic Structure Learning for EEG/MEG Source Imaging with Hierarchical Graph Prior

arXiv.org Machine Learning

Brain source imaging is an important method for noninvasively characterizing brain activity using Electroencephalogram (EEG) or Magnetoencephalography (MEG) recordings. Traditional EEG/MEG Source Imaging (ESI) methods usually assume that either source activity at different time points is unrelated, or that similar spatiotemporal patterns exist across an entire study period. The former assumption makes ESI analyses sensitive to noise, while the latter renders ESI analyses unable to account for time-varying patterns of activity. To effectively deal with noise while maintaining flexibility and continuity among brain activation patterns, we propose a novel probabilistic ESI model based on a hierarchical graph prior. Under our method, a spanning tree constraint ensures that activity patterns have spatiotemporal continuity. An efficient algorithm based on alternating convex search is presented to solve the proposed model and is provably convergent. Comprehensive numerical studies using synthetic data on a real brain model are conducted under different levels of signal-to-noise ratio (SNR) from both sensor and source spaces. We also examine the EEG/MEG data in a real application, in which our ESI reconstructions are neurologically plausible. All the results demonstrate significant improvements of the proposed algorithm over the benchmark methods in terms of source localization performance, especially at high noise levels.


General Purpose Incremental Covariance Update and Efficient Belief Space Planning via Factor-Graph Propagation Action Tree

arXiv.org Machine Learning

Fast covariance calculation is required both for SLAM (e.g.~in order to solve data association) and for evaluating the information-theoretic term for different candidate actions in belief space planning (BSP). In this paper we make two primary contributions. First, we develop a novel general-purpose incremental covariance update technique, which efficiently recovers specific covariance entries after any change in the inference problem, such as introduction of new observations/variables or re-linearization of the state vector. Our approach is shown to recover them faster than other state-of-the-art methods. Second, we present a computationally efficient approach for BSP in high-dimensional state spaces, leveraging our incremental covariance update method. State of the art BSP approaches perform belief propagation for each candidate action and then evaluate an objective function that typically includes an information-theoretic term, such as entropy or information gain. Yet, candidate actions often have similar parts (e.g. common trajectory parts), which are however evaluated separately for each candidate. Moreover, calculating the information-theoretic term involves a costly determinant computation of the entire information (covariance) matrix which is O(n^3) with n being dimension of the state or costly Schur complement operations if only marginal posterior covariance of certain variables is of interest. Our approach, rAMDL-Tree, extends our previous BSP method rAMDL, by exploiting incremental covariance calculation and performing calculation re-use between common parts of non-myopic candidate actions, such that these parts are evaluated only once, in contrast to existing approaches.


Gradient-Based Neural DAG Learning

arXiv.org Machine Learning

We propose a novel score-based approach to learning a directed acyclic graph (DAG) from observational data. We adapt a recently proposed continuous constrained optimization formulation to allow for nonlinear relationships between variables using neural networks. This extension allows to model complex interactions while being more global in its search compared to other greedy approaches. In addition to comparing our method to existing continuous optimization methods, we provide missing empirical comparisons to nonlinear greedy search methods. On both synthetic and real-world data sets, this new method outperforms current continuous methods on most tasks while being competitive with existing greedy search methods on important metrics for causal inference.


Unsupervised Deep Learning for Ultra-reliable and Low-latency Communications

arXiv.org Machine Learning

In this paper, we study how to solve resource allocation problems in ultra-reliable and low-latency communications by unsupervised deep learning, which often yield functional optimization problems with quality-of-service (QoS) constraints. We take a joint power and bandwidth allocation problem as an example, which minimizes the total bandwidth required to guarantee the QoS of each user in terms of the delay bound and overall packet loss probability. The global optimal solution is found in a symmetric scenario. A neural network was introduced to find an approximated optimal solution in general scenarios, where the QoS is ensured by using the property that the optimal solution should satisfy as the "supervision signal". Simulation results show that the learning-based solution performs the same as the optimal solution in the symmetric scenario, and can save around 40% bandwidth with respect to the state-of-the-art policy.


A view of Estimation of Distribution Algorithms through the lens of Expectation-Maximization

arXiv.org Machine Learning

We show that under mild conditions, Estimation of Distribution Algorithms (EDAs) can be written as variational Expectation-Maximization (EM) that uses a mixture of weighted particles as the approximate posterior. In the infinite particle limit, EDAs can be viewed as exact EM. Because EM sits on a rigorous statistical foundation and has been thoroughly analyzed, this connection provides a coherent framework with which to reason about EDAs. Importantly, the connection also suggests avenues for possible improvements to EDAs owing to our ability to leverage general statistical tools and generalizations of EM. For example, we make use of results about known EM convergence properties to propose an adaptive, hybrid EDA-gradient descent algorithm; this hybrid demonstrates better performance than either component of the hybrid on several canonical, non-convex test functions. We also demonstrate empirically that although one might hypothesize that reducing the variational gap could prove useful, it actually degrades performance of EDAs. Finally, we show that the connection between EM and EDAs provides us with a new perspective on why EDAs are performing approximate natural gradient descent.


Revenue Optimization Engine's New Machine Learning Model Improves Prediction Accuracy

#artificialintelligence

In a previous blog post, we talked about how Recurly uses machine learning to optimize subscription billing for our customers and prevent involuntary churn. As part of our goal to help our customers maximize their subscription revenue, we introduced the Revenue Optimization Engine in 2018. When a recurring transaction fails, this technology creates a customized retry schedule, so subsequent retries of that transaction have a higher chance of succeeding. This technology is driven by machine learning which relies on models based on Recurly's incredible breadth of historical subscription data which identifies factors that are highly correlated with successful transaction processing. Our machine learning model is the backbone of the Revenue Optimization Engine.


Bayesian Optimization of Composite Functions

arXiv.org Machine Learning

We consider optimization of composite objective functions, i.e., of the form $f(x)=g(h(x))$, where $h$ is a black-box derivative-free expensive-to-evaluate function with vector-valued outputs, and $g$ is a cheap-to-evaluate real-valued function. While these problems can be solved with standard Bayesian optimization, we propose a novel approach that exploits the composite structure of the objective function to substantially improve sampling efficiency. Our approach models $h$ using a multi-output Gaussian process and chooses where to sample using the expected improvement evaluated on the implied non-Gaussian posterior on $f$, which we call expected improvement for composite functions (\ei). Although \ei\ cannot be computed in closed form, we provide a novel stochastic gradient estimator that allows its efficient maximization. We also show that our approach is asymptotically consistent, i.e., that it recovers a globally optimal solution as sampling effort grows to infinity, generalizing previous convergence results for classical expected improvement. Numerical experiments show that our approach dramatically outperforms standard Bayesian optimization benchmarks, reducing simple regret by several orders of magnitude.