Industry
LSBN: A Large-Scale Bayesian Structure Learning Framework for Model Averaging
Lu, Yang, Wang, Mengying, Li, Menglu, Zhu, Qili, Yuan, Bo
The motivation for this paper is to apply Bayesian structure learning using Model Averaging in large-scale networks. Currently, Bayesian model averaging algorithm is applicable to networks with only tens of variables, restrained by its super-exponential complexity. We present a novel framework, called LSBN(Large-Scale Bayesian Network), making it possible to handle networks with infinite size by following the principle of divide-and-conquer. The method of LSBN comprises three steps. In general, LSBN first performs the partition by using a second-order partition strategy, which achieves more robust results. LSBN conducts sampling and structure learning within each overlapping community after the community is isolated from other variables by Markov Blanket. Finally LSBN employs an efficient algorithm, to merge structures of overlapping communities into a whole. In comparison with other four state-of-art large-scale network structure learning algorithms such as ARACNE, PC, Greedy Search and MMHC, LSBN shows comparable results in five common benchmark datasets, evaluated by precision, recall and f-score. What's more, LSBN makes it possible to learn large-scale Bayesian structure by Model Averaging which used to be intractable. In summary, LSBN provides an scalable and parallel framework for the reconstruction of network structures. Besides, the complete information of overlapping communities serves as the byproduct, which could be used to mine meaningful clusters in biological networks, such as protein-protein-interaction network or gene regulatory network, as well as in social network.
Least Absolute Gradient Selector: Statistical Regression via Pseudo-Hard Thresholding
Variable selection in linear models plays a pivotal role in modern statistics. Hard-thresholding methods such as $l_0$ regularization are theoretically ideal but computationally infeasible. In this paper, we propose a new approach, called the LAGS, short for "least absulute gradient selector", to this challenging yet interesting problem by mimicking the discrete selection process of $l_0$ regularization. To estimate $\beta$ under the influence of noise, we consider, nevertheless, the following convex program [\hat{\beta} = \textrm{arg min}\frac{1}{n}\|X^{T}(y - X\beta)\|_1 + \lambda_n\sum_{i = 1}^pw_i(y;X;n)|\beta_i|] $\lambda_n > 0$ controls the sparsity and $w_i > 0$ dependent on $y, X$ and $n$ is the weights on different $\beta_i$; $n$ is the sample size. Surprisingly, we shall show in the paper, both geometrically and analytically, that LAGS enjoys two attractive properties: (1) LAGS demonstrates discrete selection behavior and hard thresholding property as $l_0$ regularization by strategically chosen $w_i$, we call this property "pseudo-hard thresholding"; (2) Asymptotically, LAGS is consistent and capable of discovering the true model; nonasymptotically, LAGS is capable of identifying the sparsity in the model and the prediction error of the coefficients is bounded at the noise level up to a logarithmic factor---$\log p$, where $p$ is the number of predictors. Computationally, LAGS can be solved efficiently by convex program routines for its convexity or by simplex algorithm after recasting it into a linear program. The numeric simulation shows that LAGS is superior compared to soft-thresholding methods in terms of mean squared error and parsimony of the model.
Mean-Field Learning: a Survey
Tembine, Hamidou, Tempone, Raul, Vilanova, Pedro
In this paper we study iterative procedures for stationary equilibria in games with large number of players. Most of learning algorithms for games with continuous action spaces are limited to strict contraction best reply maps in which the Banach-Picard iteration converges with geometrical convergence rate. When the best reply map is not a contraction, Ishikawa-based learning is proposed. The algorithm is shown to behave well for Lipschitz continuous and pseudo-contractive maps. However, the convergence rate is still unsatisfactory. Several acceleration techniques are presented. We explain how cognitive users can improve the convergence rate based only on few number of measurements. The methodology provides nice properties in mean field games where the payoff function depends only on own-action and the mean of the mean-field (first moment mean-field games). A learning framework that exploits the structure of such games, called, mean-field learning, is proposed. The proposed mean-field learning framework is suitable not only for games but also for non-convex global optimization problems. Then, we introduce mean-field learning without feedback and examine the convergence to equilibria in beauty contest games, which have interesting applications in financial markets. Finally, we provide a fully distributed mean-field learning and its speedup versions for satisfactory solution in wireless networks. We illustrate the convergence rate improvement with numerical examples.
Learning Mixtures of Submodular Shells with Application to Document Summarization
We introduce a method to learn a mixture of submodular "shells" in a large-margin setting. A submodular shell is an abstract submodular function that can be instantiated with a ground set and a set of parameters to produce a submodular function. A mixture of such shells can then also be so instantiated to produce a more complex submodular function. What our algorithm learns are the mixture weights over such shells. We provide a risk bound guarantee when learning in a large-margin structured-prediction setting using a projected subgradient method when only approximate submodular optimization is possible (such as with submodular function maximization). We apply this method to the problem of multi-document summarization and produce the best results reported so far on the widely used NIST DUC-05 through DUC-07 document summarization corpora.
Variational Dual-Tree Framework for Large-Scale Transition Matrix Approximation
Amizadeh, Saeed, Thiesson, Bo, Hauskrecht, Milos
In recent years, non-parametric methods utilizing random walks on graphs have been used to solve a wide range of machine learning problems, but in their simplest form they do not scale well due to the quadratic complexity. In this paper, a new dual-tree based variational approach for approximating the transition matrix and efficiently performing the random walk is proposed. The approach exploits a connection between kernel density estimation, mixture modeling, and random walk on graphs in an optimization of the transition matrix for the data graph that ties together edge transitions probabilities that are similar. Compared to the de facto standard approximation method based on k-nearestneighbors, we demonstrate order of magnitudes speedup without sacrificing accuracy for Label Propagation tasks on benchmark data sets in semi-supervised learning.
An Improved Admissible Heuristic for Learning Optimal Bayesian Networks
Yuan, Changhe, Malone, Brandon
Recently two search algorithms, A* and breadth-first branch and bound (BFBnB), were developed based on a simple admissible heuristic for learning Bayesian network structures that optimize a scoring function. The heuristic represents a relaxation of the learning problem such that each variable chooses optimal parents independently. As a result, the heuristic may contain many directed cycles and result in a loose bound. This paper introduces an improved admissible heuristic that tries to avoid directed cycles within small groups of variables. A sparse representation is also introduced to store only the unique optimal parent choices. Empirical results show that the new techniques significantly improved the efficiency and scalability of A* and BFBnB on most of datasets tested in this paper.
Factorized Multi-Modal Topic Model
Virtanen, Seppo, Jia, Yangqing, Klami, Arto, Darrell, Trevor
Multi-modal data collections, such as corpora of paired images and text snippets, require analysis methods beyond single-view component and topic models. For continuous observations the current dominant approach is based on extensions of canonical correlation analysis, factorizing the variation into components shared by the different modalities and those private to each of them. For count data, multiple variants of topic models attempting to tie the modalities together have been presented. All of these, however, lack the ability to learn components private to one modality, and consequently will try to force dependencies even between minimally correlating modalities. In this work we combine the two approaches by presenting a novel HDP-based topic model that automatically learns both shared and private topics. The model is shown to be especially useful for querying the contents of one domain given samples of the other.
Latent Dirichlet Allocation Uncovers Spectral Characteristics of Drought Stressed Plants
Wahabzada, Mirwaes, Kersting, Kristian, Bauckhage, Christian, Roemer, Christoph, Ballvora, Agim, Pinto, Francisco, Rascher, Uwe, Leon, Jens, Ploemer, Lutz
Understanding the adaptation process of plants to drought stress is essential in improving management practices, breeding strategies as well as engineering viable crops for a sustainable agriculture in the coming decades. Hyper-spectral imaging provides a particularly promising approach to gain such understanding since it allows to discover non-destructively spectral characteristics of plants governed primarily by scattering and absorption characteristics of the leaf internal structure and biochemical constituents. Several drought stress indices have been derived using hyper-spectral imaging. However, they are typically based on few hyper-spectral images only, rely on interpretations of experts, and consider few wavelengths only. In this study, we present the first data-driven approach to discovering spectral drought stress indices, treating it as an unsupervised labeling problem at massive scale. To make use of short range dependencies of spectral wavelengths, we develop an online variational Bayes algorithm for latent Dirichlet allocation with convolved Dirichlet regularizer. This approach scales to massive datasets and, hence, provides a more objective complement to plant physiological practices. The spectral topics found conform to plant physiological knowledge and can be computed in a fraction of the time compared to existing LDA approaches.
Dynamic Teaching in Sequential Decision Making Environments
Walsh, Thomas J., Goschin, Sergiu
We describe theoretical bounds and a practical algorithm for teaching a model by demonstration in a sequential decision making environment. Unlike previous efforts that have optimized learners that watch a teacher demonstrate a static policy, we focus on the teacher as a decision maker who can dynamically choose different policies to teach different parts of the environment. We develop several teaching frameworks based on previously defined supervised protocols, such as Teaching Dimension, extending them to handle noise and sequences of inputs encountered in an MDP. We provide theoretical bounds on the learnability of several important model classes in this setting and suggest a practical algorithm for dynamic teaching.
Fast Graph Construction Using Auction Algorithm
In practical machine learning systems, graph based data representation has been widely used in various learning paradigms, ranging from unsupervised clustering to supervised classification. Besides those applications with natural graph or network structure data, such as social network analysis and relational learning, many other applications often involve a critical step in converting data vectors to an adjacency graph. In particular, a sparse subgraph extracted from the original graph is often required due to both theoretic and practical needs. Previous study clearly shows that the performance of different learning algorithms, e.g., clustering and classification, benefits from such sparse subgraphs with balanced node connectivity. However, the existing graph construction methods are either computationally expensive or with unsatisfactory performance. In this paper, we utilize a scalable method called auction algorithm and its parallel extension to recover a sparse yet nearly balanced subgraph with significantly reduced computational cost. Empirical study and comparison with the stateof-art approaches clearly demonstrate the superiority of the proposed method in both efficiency and accuracy.