Genre
Bayesian inference for logistic models using Polya-Gamma latent variables
Polson, Nicholas G., Scott, James G., Windle, Jesse
We propose a new data-augmentation strategy for fully Bayesian inference in models with binomial likelihoods. The approach appeals to a new class of Polya-Gamma distributions, which are constructed in detail. A variety of examples are presented to show the versatility of the method, including logistic regression, negative binomial regression, nonlinear mixed-effects models, and spatial models for count data. In each case, our data-augmentation strategy leads to simple, effective methods for posterior inference that: (1) circumvent the need for analytic approximations, numerical integration, or Metropolis-Hastings; and (2) outperform other known data-augmentation strategies, both in ease of use and in computational efficiency. All methods, including an efficient sampler for the Polya-Gamma distribution, are implemented in the R package BayesLogit. In the technical supplement appended to the end of the paper, we provide further details regarding the generation of Polya-Gamma random variables; the empirical benchmarks reported in the main manuscript; and the extension of the basic data-augmentation framework to contingency tables and multinomial outcomes.
Performance comparison of State-of-the-art Missing Value Imputation Algorithms on Some Bench mark Datasets
The presence of missing values influences the selection of appropriate set of attributes that render degradation in classification accuracies of the classifiers. Missing values are a common problem in almost all real world data sets [1] used in knowledge discovery and data mining(KDD) applications. Specifically they are more frequent in clinical databases [2, 3, 4] and temporal climate databases [5, 6]. Their presence would greatly affect the performance of classifiers [7]. The missing values in the databases may arise due various reasons such as the value being lost (erased or deleted) or not recorded, incorrect measurements, equipment errors, or possibly due to an expert not attaching any importance to a particular procedure. The incomplete data can be identified by looking for null values in the data set. However, this is not always true, since missing values can appear in the form of outliers or even wrong data (i.e.
How to minimize the energy consumption in mobile ad-hoc networks
In this work we are interested in the problem of energy management in Mobile Ad-hoc Network (MANET). The solving and optimization of MANET allow assisting the users to efficiently use their devices in order to minimize the batteries power consumption. In this framework, we propose a modelling of the MANET in form of a Constraint Optimization Problem called COMANET. Then, in the objective to minimize the consumption of batteries power, we present an approach based on an adaptation of the Dijkstra's algorithm to the MANET problem called MANED. Finally, we expose some experimental results showing utility of this approach.
Understanding Humans' Strategies in Maze Solving
Navigating through a visual maze relies on the strategic use of eye movements to select and identify the route. When navigating the maze, there are trade-offs between exploring to the environment and relying on memory. This study examined strategies used to navigating through novel and familiar mazes that were viewed from above and traversed by a mouse cursor. Eye and mouse movements revealed two modes that almost never occurred concurrently: exploration and guidance. Analyses showed that people learned mazes and were able to devise and carry out complex, multi-faceted strategies that traded-off visual exploration against active motor performance. These strategies took into account available visual information, memory, confidence, the estimated cost in time for exploration, and idiosyncratic tolerance for error. Understanding the strategies humans used for maze solving is valuable for applications in cognitive neuroscience as well as in AI, robotics and human-robot interactions.
An Information Theoretic Measure of Judea Pearl's Identifiability and Causal Influence
In this paper, we define a new information theoretic measure that we call the "uprooted information". We show that a necessary and sufficient condition for a probability $P(s|do(t))$ to be "identifiable" (in the sense of Pearl) in a graph $G$ is that its uprooted information be non-negative for all models of the graph $G$. In this paper, we also give a new algorithm for deciding, for a Bayesian net that is semi-Markovian, whether a probability $P(s|do(t))$ is identifiable, and, if it is identifiable, for expressing it without allusions to confounding variables. Our algorithm is closely based on a previous algorithm by Tian and Pearl, but seems to correct a small flaw in theirs. In this paper, we also find a {\it necessary and sufficient graphical condition} for a probability $P(s|do(t))$ to be identifiable when $t$ is a singleton set. So far, in the prior literature, it appears that only a {\it sufficient graphical condition} has been given for this. By "graphical" we mean that it is directly based on Judea Pearl's 3 rules of do-calculus.
A Massively Parallel Associative Memory Based on Sparse Neural Networks
Yao, Zhe, Gripon, Vincent, Rabbat, Michael G.
Associative memories store content in such a way that the content can be later retrieved by presenting the memory with a small portion of the content, rather than presenting the memory with an address as in more traditional memories. Associative memories are used as building blocks for algorithms within database engines, anomaly detection systems, compression algorithms, and face recognition systems. A classical example of an associative memory is the Hopfield neural network. Recently, Gripon and Berrou have introduced an alternative construction which builds on ideas from the theory of error correcting codes and which greatly outperforms the Hopfield network in capacity, diversity, and efficiency. In this paper we implement a variation of the Gripon-Berrou associative memory on a general purpose graphical processing unit (GPU). The work of Gripon and Berrou proposes two retrieval rules, sum-of-sum and sum-of-max. The sum-of-sum rule uses only matrix-vector multiplication and is easily implemented on the GPU. The sum-of-max rule is much less straightforward to implement because it involves non-linear operations. However, the sum-of-max rule gives significantly better retrieval error rates. We propose a hybrid rule tailored for implementation on a GPU which achieves a 880-fold speedup without sacrificing any accuracy.
Dictionary LASSO: Guaranteed Sparse Recovery under Linear Transformation
Liu, Ji, Yuan, Lei, Ye, Jieping
We consider the following signal recovery problem: given a measurement matrix $\Phi\in \mathbb{R}^{n\times p}$ and a noisy observation vector $c\in \mathbb{R}^{n}$ constructed from $c = \Phi\theta^* + \epsilon$ where $\epsilon\in \mathbb{R}^{n}$ is the noise vector whose entries follow i.i.d. centered sub-Gaussian distribution, how to recover the signal $\theta^*$ if $D\theta^*$ is sparse {\rca under a linear transformation} $D\in\mathbb{R}^{m\times p}$? One natural method using convex optimization is to solve the following problem: $$\min_{\theta} {1\over 2}\|\Phi\theta - c\|^2 + \lambda\|D\theta\|_1.$$ This paper provides an upper bound of the estimate error and shows the consistency property of this method by assuming that the design matrix $\Phi$ is a Gaussian random matrix. Specifically, we show 1) in the noiseless case, if the condition number of $D$ is bounded and the measurement number $n\geq \Omega(s\log(p))$ where $s$ is the sparsity number, then the true solution can be recovered with high probability; and 2) in the noisy case, if the condition number of $D$ is bounded and the measurement increases faster than $s\log(p)$, that is, $s\log(p)=o(n)$, the estimate error converges to zero with probability 1 when $p$ and $s$ go to infinity. Our results are consistent with those for the special case $D=\bold{I}_{p\times p}$ (equivalently LASSO) and improve the existing analysis. The condition number of $D$ plays a critical role in our analysis. We consider the condition numbers in two cases including the fused LASSO and the random graph: the condition number in the fused LASSO case is bounded by a constant, while the condition number in the random graph case is bounded with high probability if $m\over p$ (i.e., $#text{edge}\over #text{vertex}$) is larger than a certain constant. Numerical simulations are consistent with our theoretical results.
On GROUSE and Incremental SVD
Balzano, Laura, Wright, Stephen J.
GROUSE (Grassmannian Rank-One Update Subspace Estimation) is an incremental algorithm for identifying a subspace of Rn from a sequence of vectors in this subspace, where only a subset of components of each vector is revealed at each iteration. Recent analysis has shown that GROUSE converges locally at an expected linear rate, under certain assumptions. GROUSE has a similar flavor to the incremental singular value decomposition algorithm, which updates the SVD of a matrix following addition of a single column. In this paper, we modify the incremental SVD approach to handle missing data, and demonstrate that this modified approach is equivalent to GROUSE, for a certain choice of an algorithmic parameter.
Sparse Factor Analysis for Learning and Content Analytics
Lan, Andrew S., Waters, Andrew E., Studer, Christoph, Baraniuk, Richard G.
We develop a new model and algorithms for machine learning-based learning analytics, which estimate a learner's knowledge of the concepts underlying a domain, and content analytics, which estimate the relationships among a collection of questions and those concepts. Our model represents the probability that a learner provides the correct response to a question in terms of three factors: their understanding of a set of underlying concepts, the concepts involved in each question, and each question's intrinsic difficulty. We estimate these factors given the graded responses to a collection of questions. The underlying estimation problem is ill-posed in general, especially when only a subset of the questions are answered. The key observation that enables a well-posed solution is the fact that typical educational domains of interest involve only a small number of key concepts. Leveraging this observation, we develop both a bi-convex maximum-likelihood and a Bayesian solution to the resulting SPARse Factor Analysis (SPARFA) problem. We also incorporate user-defined tags on questions to facilitate the interpretability of the estimated factors. Experiments with synthetic and real-world data demonstrate the efficacy of our approach. Finally, we make a connection between SPARFA and noisy, binary-valued (1-bit) dictionary learning that is of independent interest.
The Cluster Graphical Lasso for improved estimation of Gaussian graphical models
Tan, Kean Ming, Witten, Daniela, Shojaie, Ali
We consider the task of estimating a Gaussian graphical model in the high-dimensional setting. The graphical lasso, which involves maximizing the Gaussian log likelihood subject to an l1 penalty, is a well-studied approach for this task. We begin by introducing a surprising connection between the graphical lasso and hierarchical clustering: the graphical lasso in effect performs a two-step procedure, in which (1) single linkage hierarchical clustering is performed on the variables in order to identify connected components, and then (2) an l1-penalized log likelihood is maximized on the subset of variables within each connected component. In other words, the graphical lasso determines the connected components of the estimated network via single linkage clustering. Unfortunately, single linkage clustering is known to perform poorly in certain settings. Therefore, we propose the cluster graphical lasso, which involves clustering the features using an alternative to single linkage clustering, and then performing the graphical lasso on the subset of variables within each cluster. We establish model selection consistency for this technique, and demonstrate its improved performance relative to the graphical lasso in a simulation study, as well as in applications to an equities data set, a university webpage data set, and a gene expression data set.