Energy
Survival of the flexible: explaining the recent dominance of nature-inspired optimization within a rapidly evolving world
Although researchers often comment on the rising popularity of nature-inspired meta-heuristics (NIM), there has been a paucity of data to directly support the claim that NIM are growing in prominence compared to other optimization techniques. This study presents evidence that the use of NIM is not only growing, but indeed appears to have surpassed mathematical optimization techniques (MOT) in several important metrics related to academic research activity (publication frequency) and commercial activity (patenting frequency). Motivated by these findings, this article discusses some of the possible origins of this growing popularity. I review different explanations for NIM popularity and discuss why some of these arguments remain unsatisfying. I argue that a compelling and comprehensive explanation should directly account for the manner in which most NIM success has actually been achieved, e.g. through hybridization and customization to different problem environments. By taking a problem lifecycle perspective, this paper offers a fresh look at the hypothesis that nature-inspired meta-heuristics derive much of their utility from being flexible. I discuss global trends within the business environments where optimization algorithms are applied and I speculate that highly flexible algorithm frameworks could become increasingly popular within our diverse and rapidly changing world.
Dynamic Incentive Mechanisms
Parkes, David C. (Harvard University) | Cavallo, Ruggiero (University of Pennsylvania) | Constantin, Florin (Georgia Institute of Technology) | Singh, Satinder (University of Michigan)
Much of AI is concerned with the design of intelligent agents. A complementary challenge is to understand how to design โrules of encounterโ by which to promote simple, robust and beneficial interactions between multiple intelligent agents. This is a natural development, as AI is increasingly used for automated decision making in real-world settings. As we extend the ideas of mechanism design from economic theory, the mechanisms (or rules) become algorithmic and many new challenges surface. Starting with a short background on mechanism design theory, the aim of this paper is to provide a nontechnical exposition of recent results on dynamic incentive mechanisms, which provide rules for the coordination of agents in sequential decision problems. The framework of dynamic mechanism design embraces coordinated decision-making both in the context of uncertainty about the world external to an agent and also in regard to the dynamics of agent preferences. In addition to tracing some recent developments, we point to ongoing research challenges.
Global seismic monitoring as probabilistic inference
Arora, Nimar, Russell, Stuart J., Kidwell, Paul, Sudderth, Erik B.
The International Monitoring System (IMS) is a global network of sensors whose purpose is to identify potential violations of the Comprehensive Nuclear-Test-Ban Treaty (CTBT), primarily through detection and localization of seismic events. We report on the first stage of a project to improve on the current automated software system with a Bayesian inference system that computes the most likely global event history given the record of local sensor data. The new system, VISA (Vertically Integrated Seismological Analysis), is based on empirically calibrated, generative models of event occurrence, signal propagation, and signal detection. VISA exhibits significantly improved precision and recall compared to the current operational system and is able to detect events that are missed even by the human analysts who post-process the IMS output.
Predictive State Temporal Difference Learning
Boots, Byron, Gordon, Geoffrey J.
We propose a new approach to value function approximation which combines linear temporal difference reinforcement learning with subspace identification. In practical applications, reinforcement learning (RL) is complicated by the fact that state is either high-dimensional or partially observable. Therefore, RL methods are designed to work with features of state rather than state itself, and the success or failure of learning is often determined by the suitability of the selected features. By comparison, subspace identification (SSID) methods are designed to select a feature set which preserves as much information as possible about state. In this paper we connect the two approaches, looking at the problem of reinforcement learning with a large set of features, each of which may only be marginally useful for value function approximation. We introduce a new algorithm for this situation, called Predictive State Temporal Difference (PSTD) learning. As in SSID for predictive state representations, PSTD finds a linear compression operator that projects a large set of features down to a small set that preserves the maximum amount of predictive information. As in RL, PSTD then uses a Bellman recursion to estimate a value function. We discuss the connection between PSTD and prior approaches in RL and SSID. We prove that PSTD is statistically consistent, perform several experiments that illustrate its properties, and demonstrate its potential on a difficult optimal stopping problem.
Group Sparse Coding with a Laplacian Scale Mixture Prior
Garrigues, Pierre, Olshausen, Bruno A.
We propose a class of sparse coding models that utilizes a Laplacian Scale Mixture (LSM) prior to model dependencies among coefficients. Each coefficient is modeled as a Laplacian distribution with a variable scale parameter, with a Gamma distribution prior over the scale parameter. We show that, due to the conjugacy of the Gamma prior, it is possible to derive efficient inference procedures for both the coefficients and the scale parameter. When the scale parameters of a group of coefficients are combined into a single variable, it is possible to describe the dependencies that occur due to common amplitude fluctuations among coefficients, which have been shown to constitute a large fraction of the redundancy in natural images. We show that, as a consequence of this group sparse coding, the resulting inference of the coefficients follows a divisive normalization rule, and that this may be efficiently implemented a network architecture similar to that which has been proposed to occur in primary visual cortex. We also demonstrate improvements in image coding and compressive sensing recovery using the LSM model.
Sample Complexity of Testing the Manifold Hypothesis
Narayanan, Hariharan, Mitter, Sanjoy
The hypothesis that high dimensional data tends to lie in the vicinity of a low dimensional manifold is the basis of a collection of methodologies termed Manifold Learning. In this paper, we study statistical aspects of the question of fitting a manifold with a nearly optimal least squared error. Given upper bounds on the dimension, volume, and curvature, we show that Empirical Risk Minimization can produce a nearly optimal manifold using a number of random samples that is {\it independent} of the ambient dimension of the space in which data lie. We obtain an upper bound on the required number of samples that depends polynomially on the curvature, exponentially on the intrinsic dimension, and linearly on the intrinsic volume. For constant error, we prove a matching minimax lower bound on the sample complexity that shows that this dependence on intrinsic dimension, volume and curvature is unavoidable. Whether the known lower bound of $O(\frac{k}{\eps^2} + \frac{\log \frac{1}{\de}}{\eps^2})$ for the sample complexity of Empirical Risk minimization on $k-$means applied to data in a unit ball of arbitrary dimension is tight, has been an open question since 1997 \cite{bart2}. Here $\eps$ is the desired bound on the error and $\de$ is a bound on the probability of failure. We improve the best currently known upper bound \cite{pontil} of $O(\frac{k^2}{\eps^2} + \frac{\log \frac{1}{\de}}{\eps^2})$ to $O\left(\frac{k}{\eps^2}\left(\min\left(k, \frac{\log^4 \frac{k}{\eps}}{\eps^2}\right)\right) + \frac{\log \frac{1}{\de}}{\eps^2}\right)$. Based on these results, we devise a simple algorithm for $k-$means and another that uses a family of convex programs to fit a piecewise linear curve of a specified length to high dimensional data, where the sample complexity is independent of the ambient dimension.
Energy Disaggregation via Discriminative Sparse Coding
Kolter, J. Z., Batra, Siddharth, Ng, Andrew Y.
Energy disaggregation is the task of taking a whole-home energy signal and separating it into its component appliances. Studies have shown that having device-level energy information can cause users to conserve significant amounts of energy, but current electricity meters only report whole-home data. Thus, developing algorithmic methods for disaggregation presents a key technical challenge in the effort to maximize energy conservation. In this paper, we examine a large scale energy disaggregation task, and apply a novel extension of sparse coding to this problem. In particular, we develop a method, based upon structured prediction, for discriminatively training sparse coding algorithms specifically to maximize disaggregation performance. We show that this significantly improves the performance of sparse coding algorithms on the energy task and illustrate how these disaggregation results can provide useful information about energy usage.
Dynamic Infinite Relational Model for Time-varying Relational Data Analysis
Ishiguro, Katsuhiko, Iwata, Tomoharu, Ueda, Naonori, Tenenbaum, Joshua B.
We propose a new probabilistic model for analyzing dynamic evolutions of relational data, such as additions, deletions and split & merge, of relation clusters like communities in social networks. Our proposed model abstracts observed time-varying object-object relationships into relationships between object clusters. We extend the infinite Hidden Markov model to follow dynamic and time-sensitive changes in the structure of the relational data and to estimate a number of clusters simultaneously. We show the usefulness of the model through experiments with synthetic and real-world data sets.
Auto-Regressive HMM Inference with Incomplete Data for Short-Horizon Wind Forecasting
Barber, Chris, Bockhorst, Joseph, Roebber, Paul
Accurate short-term wind forecasts (STWFs), with time horizons from 0.5 to 6 hours, are essential for efficient integration of wind power to the electrical power grid. Physical models based on numerical weather predictions are currently not competitive, and research on machine learning approaches is ongoing. Two major challenges confronting these efforts are missing observations and weather-regime induced dependency shifts among wind variables at geographically distributed sites. In this paper we introduce approaches that address both of these challenges. We describe a new regime-aware approach to STWF that use auto-regressive hidden Markov models (AR-HMM), a subclass of conditional linear Gaussian (CLG) models. Although AR-HMMs are a natural representation for weather regimes, as with CLG models in general, exact inference is NP-hard when observations are missing (Lerner and Parr, 2001). Because of this high cost, we introduce a simple approximate inference method for AR-HMMs, which we believe has applications to other sequential and temporal problem domains that involve continuous variables. In an empirical evaluation on publicly available wind data from two geographically distinct regions, our approach makes significantly more accurate predictions than baseline models, and uncovers meteorologically relevant regimes.
Batch Bayesian Optimization via Simulation Matching
Azimi, Javad, Fern, Alan, Fern, Xiaoli Z.
Bayesian optimization methods are often used to optimize unknown functions that are costly to evaluate. Typically, these methods sequentially select inputs to be evaluated one at a time based on a posterior over the unknown function that is updated after each evaluation. There are a number of effective sequential policies for selecting the individual inputs. In many applications, however, it is desirable to perform multiple evaluations in parallel, which requires selecting batches of multiple inputs to evaluate at once. In this paper, we propose a novel approach to batch Bayesian optimization, providing a policy for selecting batches of inputs with the goal of optimizing the function as efficiently as possible. The key idea is to exploit the availability of high-quality and efficient sequential policies, by using Monte-Carlo simulation to select input batches that closely match their expected behavior. To the best of our knowledge, this is the first batch selection policy for Bayesian optimization. Our experimental results on six benchmarks show that the proposed approach significantly outperforms two baselines and can lead to large advantages over a top sequential approach in terms of performance per unit time.