Energy
Direct Learning of Sparse Changes in Markov Networks by Density Ratio Estimation
Liu, Song, Quinn, John A., Gutmann, Michael U., Suzuki, Taiji, Sugiyama, Masashi
We propose a new method for detecting changes in Markov network structure between two sets of samples. Instead of naively fitting two Markov network models separately to the two data sets and figuring out their difference, we \emph{directly} learn the network structure change by estimating the ratio of Markov network models. This density-ratio formulation naturally allows us to introduce sparsity in the network structure change, which highly contributes to enhancing interpretability. Furthermore, computation of the normalization term, which is a critical bottleneck of the naive approach, can be remarkably mitigated. We also give the dual formulation of the optimization problem, which further reduces the computation cost for large-scale Markov networks. Through experiments, we demonstrate the usefulness of our method.
High-Dimensional Gaussian Process Bandits
Djolonga, Josip, Krause, Andreas, Cevher, Volkan
Many applications in machine learning require optimizing unknown functions defined over a high-dimensional space from noisy samples that are expensive to obtain. We address this notoriously hard challenge, under the assumptions that the function varies only along some low-dimensional subspace and is smooth (i.e., it has a low norm in a Reproducible Kernel Hilbert Space). In particular, we present the SI-BO algorithm, which leverages recent low-rank matrix recovery techniques to learn the underlying subspace of the unknown function and applies Gaussian Process Upper Confidence sampling for optimization of the function. We carefully calibrate the exploration-exploitation tradeoff by allocating the sampling budget to subspace estimation and function optimization, and obtain the first subexponential cumulative regret bounds and convergence rates for Bayesian optimization in high-dimensions under noisy observations. Numerical results demonstrate the effectiveness of our approach in difficult scenarios.
A Graphical Transformation for Belief Propagation: Maximum Weight Matchings and Odd-Sized Cycles
Shin, Jinwoo, Gelfand, Andrew E., Chertkov, Misha
Max-product ‘belief propagation’ (BP) is a popular distributed heuristic for finding the Maximum A Posteriori (MAP) assignment in a joint probability distribution represented by a Graphical Model (GM). It was recently shown that BP converges to the correct MAP assignment for a class of loopy GMs with the following common feature: the Linear Programming (LP) relaxation to the MAP problem is tight (has no integrality gap). Unfortunately, tightness of the LP relaxation does not, in general, guarantee convergence and correctness of the BP algorithm. The failure of BP in such cases motivates reverse engineering a solution – namely, given a tight LP, can we design a ‘good’ BP algorithm. In this paper, we design a BP algorithm for the Maximum Weight Matching (MWM) problem over general graphs. We prove that the algorithm converges to the correct optimum if the respective LP relaxation, which may include inequalities associated with non-intersecting odd-sized cycles, is tight. The most significant part of our approach is the introduction of a novel graph transformation designed to force convergence of BP. Our theoretical result suggests an efficient BP-based heuristic for the MWM problem, which consists of making sequential, “cutting plane”, modifications to the underlying GM. Our experiments show that this heuristic performs as well as traditional cutting-plane algorithms using LP solvers on MWM problems.
Small-Variance Asymptotics for Hidden Markov Models
Roychowdhury, Anirban, Jiang, Ke, Kulis, Brian
Small-variance asymptotics provide an emerging technique for obtaining scalable combinatorial algorithms from rich probabilistic models. We present a small-variance asymptotic analysis of the Hidden Markov Model and its infinite-state Bayesian nonparametric extension. Starting with the standard HMM, we first derive a “hard” inference algorithm analogous to k-means that arises when particular variances in the model tend to zero. This analysis is then extended to the Bayesian nonparametric case, yielding a simple, scalable, and flexible algorithm for discrete-state sequence data with a non-fixed number of states. We also derive the corresponding combinatorial objective functions arising from our analysis, which involve a k-means-like term along with penalties based on state transitions and the number of states. A key property of such algorithms is that — particularly in the nonparametric setting — standard probabilistic inference algorithms lack scalability and are heavily dependent on good initialization. A number of results on synthetic and real data sets demonstrate the advantages of the proposed framework.
Forgetful Bayes and myopic planning: Human learning and decision-making in a bandit setting
How humans achieve long-term goals in an uncertain environment, via repeated trials and noisy observations, is an important problem in cognitive science. We investigate this behavior in the context of a multi-armed bandit task. We compare human behavior to a variety of models that vary in their representational and computational complexity. Our result shows that subjects' choices, on a trial-to- trial basis, are best captured by a "forgetful" Bayesian iterative learning model [21] in combination with a partially myopic decision policy known as Knowledge Gradient [7]. This model accounts for subjects' trial-by-trial choice better than a number of other previously proposed models, including optimal Bayesian learning and risk minimization, ε-greedy and win-stay-lose-shift. It has the added benefit of being closest in performance to the optimal Bayesian model than all the other heuristic models that have the same computational complexity (all are significantly less complex than the optimal model). These results constitute an advancement in the theoretical understanding of how humans negotiate the tension between exploration and exploitation in a noisy, imperfectly known environment.
Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
Sharpnack, James L., Krishnamurthy, Akshay, Singh, Aarti
The detection of anomalous activity in graphs is a statistical problem that arises in many applications, such as network surveillance, disease outbreak detection, and activity monitoring in social networks. Beyond its wide applicability, graph structured anomaly detection serves as a case study in the difficulty of balancing computational complexity with statistical power. In this work, we develop from first principles the generalized likelihood ratio test for determining if there is a well connected region of activation over the vertices in the graph in Gaussian noise. Because this test is computationally infeasible, we provide a relaxation, called the Lov\'asz extended scan statistic (LESS) that uses submodularity to approximate the intractable generalized likelihood ratio. We demonstrate a connection between LESS and maximum a-posteriori inference in Markov random fields, which provides us with a poly-time algorithm for LESS. Using electrical network theory, we are able to control type 1 error for LESS and prove conditions under which LESS is risk consistent. Finally, we consider specific graph models, the torus, $k$-nearest neighbor graphs, and $\epsilon$-random graphs. We show that on these graphs our results provide near-optimal performance by matching our results to known lower bounds.
Robust Data-Driven Dynamic Programming
Hanasusanto, Grani Adiwena, Kuhn, Daniel
In stochastic optimal control the distribution of the exogenous noise is typically unknown and must be inferred from limited data before dynamic programming (DP)-based solution schemes can be applied. If the conditional expectations in the DP recursions are estimated via kernel regression, however, the historical sample paths enter the solution procedure directly as they determine the evaluation points of the cost-to-go functions. The resulting data-driven DP scheme is asymptotically consistent and admits efficient computational solution when combined with parametric value function approximations. If training data is sparse, however, the estimated cost-to-go functions display a high variability and an optimistic bias, while the corresponding control policies perform poorly in out-of-sample tests. To mitigate these small sample effects, we propose a robust data-driven DP scheme, which replaces the expectations in the DP recursions with worst-case expectations over a set of distributions close to the best estimate. We show that the arising min-max problems in the DP recursions reduce to tractable conic programs. We also demonstrate that this robust algorithm dominates state-of-the-art benchmark algorithms in out-of-sample tests across several application domains.
Shape-constrained Estimation of Value Functions
Mousavi, Mohammad, Glynn, Peter W.
We present a fully nonparametric method to estimate the value function, via simulation, in the context of expected infinite-horizon discounted rewards for Markov chains. Estimating such value functions plays an important role in approximate dynamic programming and applied probability in general. We incorporate "soft information" into the estimation algorithm, such as knowledge of convexity, monotonicity, or Lipchitz constants. In the presence of such information, a nonparametric estimator for the value function can be computed that is provably consistent as the simulated time horizon tends to infinity. As an application, we implement our method on price tolling agreement contracts in energy markets.
Supervised learning of a regression model based on latent process. Application to the estimation of fuel cell life time
Onanena, Raïssa, Chamroukhi, Faicel, Oukhellou, Latifa, Candusso, Denis, Aknin, Patrice, Hissel, Daniel
This paper describes a pattern recognition approach aiming to estimate fuel cell duration time from electrochemical impedance spectroscopy measurements. It consists in first extracting features from both real and imaginary parts of the impedance spectrum. A parametric model is considered in the case of the real part, whereas regression model with latent variables is used in the latter case. Then, a linear regression model using different subsets of extracted features is used fo r the estimation of fuel cell time duration. The performances of the proposed approach are evaluated on experimental data set to show its feasibility. This could lead to interesting perspectives for predictive maintenance policy of fuel cell.
Contextually Supervised Source Separation with Application to Energy Disaggregation
We propose a new framework for single-channel source separation that lies between the fully supervised and unsupervised setting. Instead of supervision, we provide input features for each source signal and use convex methods to estimate the correlations between these features and the unobserved signal decomposition. We analyze the case of $\ell_2$ loss theoretically and show that recovery of the signal components depends only on cross-correlation between features for different signals, not on correlations between features for the same signal. Contextually supervised source separation is a natural fit for domains with large amounts of data but no explicit supervision; our motivating application is energy disaggregation of hourly smart meter data (the separation of whole-home power signals into different energy uses). Here we apply contextual supervision to disaggregate the energy usage of thousands homes over four years, a significantly larger scale than previously published efforts, and demonstrate on synthetic data that our method outperforms the unsupervised approach.