Statistical Learning
A Primal-Dual Algorithmic Framework for Constrained Convex Minimization
Tran-Dinh, Quoc, Cevher, Volkan
We present a primal-dual algorithmic framework to obtain approximate solutions to a prototypical constrained convex optimization problem, and rigorously characterize how common structural assumptions affect the numerical efficiency. Our main analysis technique provides a fresh perspective on Nesterov's excessive gap technique in a structured fashion and unifies it with smoothing and primal-dual methods. For instance, through the choices of a dual smoothing strategy and a center point, our framework subsumes decomposition algorithms, augmented Lagrangian as well as the alternating direction method-of-multipliers methods as its special cases, and provides optimal convergence rates on the primal objective residual as well as the primal feasibility gap of the iterates for all.
Simple, Efficient, and Neural Algorithms for Sparse Coding
Arora, Sanjeev, Ge, Rong, Ma, Tengyu, Moitra, Ankur
Sparse coding is a basic task in many fields including signal processing, neuroscience and machine learning where the goal is to learn a basis that enables a sparse representation of a given set of data, if one exists. Its standard formulation is as a non-convex optimization problem which is solved in practice by heuristics based on alternating minimization. Re- cent work has resulted in several algorithms for sparse coding with provable guarantees, but somewhat surprisingly these are outperformed by the simple alternating minimization heuristics. Here we give a general framework for understanding alternating minimization which we leverage to analyze existing heuristics and to design new ones also with provable guarantees. Some of these algorithms seem implementable on simple neural architectures, which was the original motivation of Olshausen and Field (1997a) in introducing sparse coding. We also give the first efficient algorithm for sparse coding that works almost up to the information theoretic limit for sparse recovery on incoherent dictionaries. All previous algorithms that approached or surpassed this limit run in time exponential in some natural parameter. Finally, our algorithms improve upon the sample complexity of existing approaches. We believe that our analysis framework will have applications in other settings where simple iterative algorithms are used.
A Hebbian/Anti-Hebbian Network Derived from Online Non-Negative Matrix Factorization Can Cluster and Discover Sparse Features
Pehlevan, Cengiz, Chklovskii, Dmitri B.
Despite our extensive knowledge of biophysical properties of neurons, there is no commonly accepted algorithmic theory of neuronal function. Here we explore the hypothesis that single-layer neuronal networks perform online symmetric nonnegative matrix factorization (SNMF) of the similarity matrix of the streamed data. By starting with the SNMF cost function we derive an online algorithm, which can be implemented by a biologically plausible network with local learning rules. We demonstrate that such network performs soft clustering of the data as well as sparse feature discovery. The derived algorithm replicates many known aspects of sensory anatomy and biophysical properties of neurons including unipolar nature of neuronal activity and synaptic weights, local synaptic plasticity rules and the dependence of learning rate on cumulative neuronal activity. Thus, we make a step towards an algorithmic theory of neuronal function, which should facilitate large-scale neural circuit simulations and biologically inspired artificial intelligence.
Bayesian Optimization of Text Representations
Yogatama, Dani, Smith, Noah A.
When applying machine learning to problems in NLP, there are many choices to make about how to represent input texts. These choices can have a big effect on performance, but they are often uninteresting to researchers or practitioners who simply need a module that performs well. We propose an approach to optimizing over this space of choices, formulating the problem as global optimization. We apply a sequential model-based optimization technique and show that our method makes standard linear models competitive with more sophisticated, expensive state-of-the-art methods based on latent variable models or neural networks on various topic classification and sentiment analysis problems. Our approach is a first step towards black-box NLP systems that work with raw text and do not require manual tuning.
A review of mean-shift algorithms for clustering
A natural way to characterize the cluster structure of a dataset is by finding regions containing a high density of data. This can be done in a nonparametric way with a kernel density estimate, whose modes and hence clusters can be found using mean-shift algorithms. We describe the theory and practice behind clustering based on kernel density estimates and mean-shift algorithms. We discuss the blurring and non-blurring versions of mean-shift; theoretical results about mean-shift algorithms and Gaussian mixtures; relations with scale-space theory, spectral clustering and other algorithms; extensions to tracking, to manifold and graph data, and to manifold denoising; K-modes and Laplacian K-modes algorithms; acceleration strategies for large datasets; and applications to image segmentation, manifold denoising and multivalued regression.
Signal inference with unknown response: Calibration-uncertainty renormalized estimator
Dorn, Sebastian, Enßlin, Torsten A., Greiner, Maksim, Selig, Marco, Boehm, Vanessa
The calibration of a measurement device is crucial for every scientific experiment, where a signal has to be inferred from data. We present CURE, the calibration uncertainty renormalized estimator, to reconstruct a signal and simultaneously the instrument's calibration from the same data without knowing the exact calibration, but its covariance structure. The idea of CURE, developed in the framework of information field theory, is starting with an assumed calibration to successively include more and more portions of calibration uncertainty into the signal inference equations and to absorb the resulting corrections into renormalized signal (and calibration) solutions. Thereby, the signal inference and calibration problem turns into solving a single system of ordinary differential equations and can be identified with common resummation techniques used in field theories. We verify CURE by applying it to a simplistic toy example and compare it against existent self-calibration schemes, Wiener filter solutions, and Markov Chain Monte Carlo sampling. We conclude that the method is able to keep up in accuracy with the best self-calibration methods and serves as a non-iterative alternative to it.
Discovering Hotspots and Coldspots of Species Richness in eBird Data
Moore, Travis (Oregon State University) | Wong, Weng-Keen (Oregon State University)
Quantifying biodiversity is an important task related to ecological research. One way to measure biodiversity is through species richness, which measures the number of unique species found in an area. Recently, citizen science biodiversity datasets such as eBird allow the calculation of species richness over an unprecedented spatial and temporal extent. However, several confounding factors associated with the unstructured observation process, such as observer effort, affect the number of species reported by citizen scientists. In this work, we develop an algorithm for discovering hotspots and coldspots of species richness using eBird data while accounting for these confounding factors.
Sustainable Building Design: A Challenge at the Intersection of Machine Learning and Design Optimization
Gilan, Siamak Safarzadegan (Georgia Institute of Technology) | Dilkina, Bistra (Georgia Institute of Technology)
Residential and commercial buildings are responsible for about 40% of primary energy consumption in the United States, hence improving their energy efficiency could have important sustainability benefits. The design of a building has tremendous effect on its energy profile, and recently there has been an increased interest in developing optimization methods that support the design of high performance buildings. Previous approaches are either based on simulation optimization or on training an accurate predictive model that is queried during the optimization. We propose a method that more tightly integrates the machine learning and optimization components, by employing active learning during optimization. In particular, we use a Gaussian Process (GP) model for the prediction and active learning and multi-objective genetic algorithm NSGA-II for the optimization. We develop a comprehensive and publicly available benchmark for building design optimization. We evaluate 5 machine learning approaches on our dataset, and show that the GP model is competitive, in addition to being well-suited for the active learning setting. We compare our optimization approach against the 2-stage approach and simulation optimization. Our results show that our approach produces solutions at the Pareto frontier compared to the other two approaches, while using only a fraction of the simulations and time.
Social Information Improves Location Prediction in the Wild
Li, Jai (University of Illinois at Chicago) | Brugere, Ivan (University of Illinois at Chicago) | Ziebart, Brian (University of Illinois at Chicago) | Berger-Wolf, Tanya (University of Illinois at Chicago) | Crofoot, Margaret (University of California-Davis) | Farine, Damien (University of California-Davis)
How can knowing the location of my friends be used to more accurately predict my location? This paper explores socially-aware location prediction under a particularly challenging setting where the underlying interactions and social network are unknown and must be inferred over continuous spatiotemporal data. Our method samples inferred network topology using a linear regression model to predict future individual locations. We present an in-depth empirical study comparing different network models and network sampling regimes under a bootstrapped sampling baseline. Furthermore, our qualitative analysis demonstrates the value of social information in population mobility modeling under our application’s challenges.
A Proposal for Behavior Prediction via Estimating Agents’ Evaluation Functions Using Prior Observations of Behavior
Loftin, Robert Tyler (North Carolina State University) | Roberts, David L. (North Carolina State University)
In this work we present a theoretical approach (not currently implemented), to the problem of predicting agent behavior. The ultimate goal of this work is to learn models that can be used to predict the future actions of intelligent agents, based on previously recorded data on those agents’ behavior. We believe that we can improve the predictive accuracy of our models by assuming that an agent reasons about the actions it takes, and trying to explicitly model that reasoning process. Here, we model an agent’s reasoning process as a form of Monte-Carlo search, and attempt to learn a state evaluation function that, when used with this planning algorithm, yields a similar distribution of actions given the current state of the world as we observe in the data. While it is simple to simulate Monte-Carlo search given an evaluation function, it is much more difficult to determine an evaluation function that will generate a certain behavior. Here we will use Expectation-Maximization to find a maximum likelihood estimate of the parameters of the evaluation function, treating the actual steps taken in planning each action as unobserved data.