 Optimization


Generalized Majorization-Minimization

arXiv.org Machine Learning

School of Engineering, Brown University, Providence, RI 02912, USA (arXiv:1506.07613v2)

Abstract: Non-convex optimization is ubiquitous in machine learning. In Majorization-Minimization (MM), the bound at each iteration is required to touch the objective function at the optimizer of the previous bound. We show that this touching constraint is unnecessary and overly restrictive. We generalize MM by relaxing this constraint, and propose a new optimization framework, named Generalized Majorization-Minimization (G-MM), that is more flexible than MM. For instance, it can incorporate application-specific biases into the optimization procedure without changing the objective function. We derive G-MM algorithms for several latent variable models and show empirically that they consistently outperform their MM counterparts in optimizing non-convex objectives. In particular, G-MM algorithms appear to be less sensitive to initialization.

Keywords: majorization-minimization, non-convex optimization, latent variable models, expectation maximization

1. Introduction

Non-convex optimization is ubiquitous in machine learning. Majorization-Minimization (MM) (Hunter et al., 2000) is an optimization framework for designing well-behaved optimization algorithms for non-convex functions. MM algorithms work by iteratively optimizing a sequence of easy-to-optimize surrogate functions that bound the objective. Two of the most successful instances of MM algorithms are Expectation-Maximization (EM) (Dempster et al., 1977) and the Concave-Convex Procedure. However, both have a number of drawbacks in practice, such as sensitivity to initialization and lack of uncertainty modeling for latent variables. This has been noted in works such as (Neal and Hinton, 1998; Felzenszwalb et al., 2010; Parizi et al., 2012; Kumar et al., 2012; Ping et al., 2014). We propose a new procedure, Generalized Majorization-Minimization (G-MM), for optimizing non-convex objective functions. Our approach is inspired by MM, but we generalize the bound construction process.
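The touching constraint of classical MM can be illustrated on a toy problem. The sketch below is not taken from the paper and does not implement G-MM; it assumes a textbook example, minimizing f(x) = sum_i |x - a_i| (whose minimizer is a median of the a_i), and the names mm_median and eps are hypothetical. Each |x - a_i| is bounded from above by a quadratic that touches it at the current iterate, and the sum of these bounds is minimized in closed form.

```python
def mm_median(points, x0, iters=100, eps=1e-12):
    """Classical MM for f(x) = sum_i |x - a_i| (illustrative sketch only)."""
    def objective(x):
        return sum(abs(x - a) for a in points)

    x = x0
    history = [objective(x)]
    for _ in range(iters):
        # Majorize |x - a_i| at the current iterate x_k by
        #   (x - a_i)^2 / (2 * c_i) + c_i / 2,  with c_i = |x_k - a_i|.
        # This bound lies above |x - a_i| everywhere and touches it at x_k,
        # which is exactly the MM touching constraint.
        # eps is an assumed numerical guard against division by zero.
        weights = [1.0 / max(abs(x - a), eps) for a in points]
        # The sum of the quadratic bounds is minimized by a weighted mean.
        x = sum(w * a for w, a in zip(weights, points)) / sum(weights)
        history.append(objective(x))
    return x, history


x_star, values = mm_median([1.0, 2.0, 3.0, 4.0, 100.0], x0=22.0)
```

Because every bound touches the objective at the previous iterate, the recorded objective values never increase (up to floating-point rounding), and on this data the iterate approaches the median 3.0. G-MM, as described in the abstract, relaxes precisely this touching requirement.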


Channel Vector Subspace Estimation from Low-Dimensional Projections

arXiv.org Machine Learning

Massive MIMO is a variant of multiuser MIMO where the number of base-station antennas $M$ is very large (typically 100), and generally much larger than the number of spatially multiplexed data streams (typically 10). Unfortunately, the front-end A/D conversion necessary to drive hundreds of antennas, with a signal bandwidth of the order of 10 to 100 MHz, requires a very large sampling bit-rate and power consumption. In order to reduce such implementation requirements, Hybrid Digital-Analog architectures have been proposed. In particular, our work in this paper is motivated by one such scheme, named Joint Spatial Division and Multiplexing (JSDM), where the downlink precoder (resp., uplink linear receiver) is split into the product of a baseband linear projection (digital) and an RF reconfigurable beamforming network (analog), such that only a reduced number $m \ll M$ of A/D converters and RF modulation/demodulation chains is needed. In JSDM, users are grouped according to the similarity of their channel dominant subspaces, and these groups are separated by the analog beamforming stage, while the multiplexing gain in each group is achieved using the digital precoder. Therefore, extracting the channel subspace information of the $M$-dim channel vectors from snapshots of $m$-dim projections, with $m \ll M$, plays a fundamental role in JSDM implementation. In this paper, we develop novel efficient algorithms that require sampling only $m = O(2\sqrt{M})$ specific array elements according to a coprime sampling scheme, and for a given $p \ll M$, return a $p$-dim beamformer whose performance is comparable to that of the best $p$-dim beamformer that can be designed from full knowledge of the exact channel covariance matrix. We assess the performance of our proposed estimators both analytically and empirically via numerical simulations.
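To make the $m = O(2\sqrt{M})$ count concrete, here is a minimal sketch of one standard coprime sampling construction. The abstract does not specify the exact scheme, so this layout is an assumption: for coprime integers q1 and q2 with M = q1 * q2, keep the array elements whose index is a multiple of q1 or of q2. The function name coprime_indices is hypothetical.

```python
from math import gcd


def coprime_indices(q1, q2):
    """Indices in range(q1 * q2) that are multiples of q1 or q2 (assumed layout)."""
    if gcd(q1, q2) != 1:
        raise ValueError("q1 and q2 must be coprime")
    first = {a * q1 for a in range(q2)}   # q2 elements spaced q1 apart
    second = {b * q2 for b in range(q1)}  # q1 elements spaced q2 apart
    # The two subarrays share only index 0, so the union has q1 + q2 - 1 elements.
    return sorted(first | second)


indices = coprime_indices(10, 11)  # M = 110 antennas
# Pairwise index differences between the two subarrays; coprimality makes
# all q1 * q2 of them distinct.
lags = {a * 10 - b * 11 for a in range(11) for b in range(10)}
```

With q1 = 10 and q2 = 11, only 20 of the M = 110 elements are sampled, close to $2\sqrt{M} \approx 21$, while the cross differences between the two subarrays take 110 distinct values. That richness of lags is what makes covariance information recoverable from so few elements in coprime schemes in general; how the paper exploits it is not stated in the abstract.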


Symmetry-free SDP Relaxations for Affine Subspace Clustering

arXiv.org Machine Learning

We consider clustering problems where the goal is to determine an optimal partition of a given point set in Euclidean space in terms of a collection of affine subspaces. While there is a vast literature on heuristics for this kind of problem, such approaches are known to be susceptible to poor initializations and to getting trapped in bad local optima. We alleviate these issues by introducing a semidefinite relaxation based on Lasserre's method of moments. While a similar approach is known for classical Euclidean clustering problems, extending it to our more general subspace scenario is not straightforward, due to the high symmetry of the objective function, which weakens any convex relaxation. We therefore introduce a new mechanism for symmetry breaking based on covering the feasible region with polytopes. Additionally, we introduce and analyze a deterministic rounding heuristic.


Scalable Link Prediction in Dynamic Networks via Non-Negative Matrix Factorization

arXiv.org Artificial Intelligence

We propose a scalable temporal latent space model for link prediction in dynamic social networks, where the goal is to predict links over time based on a sequence of previous graph snapshots. The model assumes that each user lies in an unobserved latent space and interactions are more likely to form between similar users in the latent space representation. In addition, the model allows each user to gradually move its position in the latent space as the network structure evolves over time. We present a global optimization algorithm to effectively infer the temporal latent space, with a quadratic convergence rate. Two alternative optimization algorithms with local and incremental updates are also proposed, allowing the model to scale to larger networks without compromising prediction accuracy. Empirically, we demonstrate that our model, when evaluated on a number of real-world dynamic networks, significantly outperforms existing approaches for temporal link prediction in terms of both scalability and predictive power.


Open Sourcing SparkADMM: a Massively-parallel Framework for Solving Big Data Problems

#artificialintelligence

Training machine learning models over massive amounts of data is a cornerstone of many data analytics tasks. This usually means solving large optimization problems with millions of variables and constraints. Doing so on a parallel platform, like Spark or Hadoop, is crucial to making such computations scalable. It is not always obvious how to solve large optimization problems in parallel. ADMM, which stands for the Alternating Direction Method of Multipliers, is a popular parallel optimization technique that provides a methodology for doing so.
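The parallel structure of ADMM can be seen in a toy consensus problem. The sketch below is plain Python, not SparkADMM code and not its API; the function name consensus_admm and the quadratic local losses are assumptions chosen for illustration. Each "worker" i holds a number a_i and minimizes 0.5 * (x_i - a_i)^2, subject to all x_i agreeing with a shared variable z.

```python
def consensus_admm(local_data, rho=1.0, iters=100):
    """Toy consensus ADMM: minimize sum_i 0.5*(x_i - a_i)^2 subject to x_i = z."""
    n = len(local_data)
    u = [0.0] * n  # scaled dual variables, one per worker
    z = 0.0        # shared consensus variable
    for _ in range(iters):
        # Local step: each worker solves its own small problem independently.
        # On a cluster this is the part that would run in parallel.
        x = [(a + rho * (z - u_i)) / (1.0 + rho)
             for a, u_i in zip(local_data, u)]
        # Global step: average the workers' proposals.
        z = sum(x_i + u_i for x_i, u_i in zip(x, u)) / n
        # Dual step: each worker accumulates its disagreement with z.
        u = [u_i + x_i - z for x_i, u_i in zip(x, u)]
    return z


z_star = consensus_admm([1.0, 2.0, 3.0, 10.0])
```

For this objective the consensus value converges to the mean of the data (4.0 here). The point of the sketch is the division of labor: the local and dual steps touch only one worker's data, and only the averaging step needs communication.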


Causality on Cross-Sectional Data: Stable Specification Search in Constrained Structural Equation Modeling

arXiv.org Machine Learning

Causal modeling has long been an attractive topic for many researchers, and recent decades have seen a surge in theoretical development and discovery algorithms. Generally, discovery algorithms can be divided into two approaches: constraint-based and score-based. The constraint-based approach is able to detect common causes of the observed variables, but the use of independence tests makes it less reliable. The score-based approach produces a result that is easier to interpret, as it also measures the reliability of the inferred causal relationships, but it is unable to detect common confounders of the observed variables. A drawback of both score-based and constraint-based approaches is the inherent instability in structure estimation. With finite samples, small changes in the data can lead to completely different optimal structures. The present work introduces a new hypothesis-free score-based causal discovery algorithm, called stable specification search, that is robust for finite samples based on recent advances in stability selection using subsampling and selection algorithms. Structure search is performed over Structural Equation Models. Our approach uses exploratory search but allows incorporation of prior background knowledge. We validated our approach on one simulated data set, which we compare to the known ground truth, and two real-world data sets for Chronic Fatigue Syndrome and Attention Deficit Hyperactivity Disorder, which we compare to earlier medical studies. The results on the simulated data set show significant improvement over alternative approaches, and the results on the real-world data sets show consistency with the hypothesis-driven models constructed by medical experts.


Safe Policy Improvement by Minimizing Robust Baseline Regret

arXiv.org Machine Learning

Many problems in science and engineering can be formulated as a sequential decision-making problem under uncertainty. A common scenario in such problems that occurs in many different fields, such as online marketing, inventory control, health informatics, and computational finance, is to find a good or an optimal strategy/policy, given a batch of data generated by the current strategy of the company (hospital, investor). Although there are many techniques to find a good policy given a batch of data, only a few of them guarantee that the obtained policy will perform well, when it is deployed. Since deploying an untested policy can be risky for the business, the product (hospital, investment) manager does not usually allow it to happen, unless we provide her/him with some performance guarantees of the obtained strategy, in comparison to the baseline policy (e.g., the policy that is currently in use). In this paper, we focus on the model-based approach to this fundamental problem in the context of infinite-horizon discounted Markov decision processes (MDPs). In this approach, we use the batch of data and build a model or a simulator that approximates the true behavior of the dynamical system, together with an error function that captures the accuracy of the model at each state of the system. Our goal is to compute a safe policy, i.e., a policy that is guaranteed to perform at least as well as the baseline strategy, using the simulator and error function. Most of the work on this topic has been in the model-free setting, where safe policies are computed directly from the batch of data, without building an explicit model of the system [12, 13]. Another class of model-free algorithms are those that use a batch of data generated by the current policy and return a policy that is guaranteed to perform better.


Demand Prediction and Placement Optimization for Electric Vehicle Charging Stations

arXiv.org Artificial Intelligence

Effective placement of charging stations plays a key role in Electric Vehicle (EV) adoption. In the placement problem, given a set of candidate sites, an optimal subset needs to be selected with respect to the concerns of both (a) the charging station service provider, such as the demand at the candidate sites and the budget for deployment, and (b) the EV user, such as charging station reachability and short waiting times at the station. This work addresses these concerns, making the following three novel contributions: (i) a supervised multi-view learning framework using Canonical Correlation Analysis (CCA) for demand prediction at candidate sites, using multiple datasets such as points of interest information, traffic density, and the historical usage at existing charging stations; (ii) a mixed-packing-and-covering optimization framework that models competing concerns of the service provider and EV users; (iii) an iterative heuristic to solve these problems by alternately invoking knapsack and set cover algorithms. The performance of the demand prediction model and the placement optimization heuristic is evaluated using real-world data.
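The covering side of such a placement problem can be sketched with the standard greedy set cover approximation. This is not the paper's iterative heuristic, which the abstract does not detail, and it ignores the knapsack (budget) side entirely; the function name greedy_set_cover and the toy sites are hypothetical.

```python
def greedy_set_cover(demand_points, coverage):
    """Standard greedy set cover: repeatedly pick the site covering the most
    still-uncovered demand points. `coverage` maps site name -> set of points."""
    remaining = set(demand_points)
    chosen = []
    while remaining:
        # Ties are broken by insertion order of the `coverage` dict.
        best = max(coverage, key=lambda site: len(coverage[site] & remaining))
        gain = coverage[best] & remaining
        if not gain:
            raise ValueError("some demand points cannot be covered by any site")
        chosen.append(best)
        remaining -= gain
    return chosen


sites = {"A": {1, 2, 3}, "B": {2, 4}, "C": {3, 4}, "D": {4, 5}}
selected = greedy_set_cover({1, 2, 3, 4, 5}, sites)  # ["A", "D"]
```

Here site A is picked first because it covers three demand points, after which D covers the remaining two. A heuristic of the kind the abstract describes would alternate a step like this with a budget-constrained (knapsack) selection step.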


On Deterministic Conditions for Subspace Clustering under Missing Data

arXiv.org Machine Learning

In this paper we present deterministic conditions for the success of sparse subspace clustering (SSC) under missing data, when data is assumed to come from a Union of Subspaces (UoS) model. We consider two algorithms, which are variants of SSC with entry-wise zero-filling that differ in terms of the optimization problems used to find the affinity matrix for spectral clustering. For both algorithms, we provide deterministic conditions for any pattern of missing data such that perfect clustering can be achieved. We provide extensive sets of simulation results for clustering as well as completion of data at missing entries, under the UoS model. Our experimental results indicate that, in contrast to the full data case, accurate clustering does not imply accurate subspace identification and completion, indicating the natural order of relative hardness of these problems.


Beating level-set methods for 3D seismic data interpolation: a primal-dual alternating approach

arXiv.org Machine Learning

Acquisition cost is a crucial bottleneck for seismic workflows, and low-rank formulations for data interpolation allow practitioners to 'fill in' data volumes from critically subsampled data acquired in the field. The tremendous size of the seismic data volumes required for seismic processing remains a major challenge for these techniques. We propose a new approach to solve residual-constrained formulations for interpolation. We represent the data volume using matrix factors, and build a block-coordinate algorithm with constrained convex subproblems that are solved with a primal-dual splitting scheme. The new approach is competitive with state-of-the-art level-set algorithms that interchange the role of objectives with constraints. We use the new algorithm to successfully interpolate a large-scale 5D seismic data volume, generated from the geologically complex synthetic 3D Compass velocity model, where 80% of the data has been removed.