Learning Restricted Boltzmann Machines with Sparse Latent Variables

Neural Information Processing Systems

Restricted Boltzmann Machines (RBMs) are a common family of undirected graphical models with latent variables. An RBM is described by a bipartite graph, with all observed variables in one layer and all latent variables in the other. We consider the task of learning an RBM given samples generated according to it. The best algorithms for this task currently have time complexity $\tilde{O}(n^2)$ for ferromagnetic RBMs (i.e., with attractive potentials) but $\tilde{O}(n^d)$ for general RBMs, where $n$ is the number of observed variables and $d$ is the maximum degree of a latent variable. Let the \textit{MRF neighborhood} of an observed variable be its neighborhood in the Markov Random Field of the marginal distribution of the observed variables. In this paper, we give an algorithm for learning general RBMs with time complexity $\tilde{O}(n^{2^s+1})$, where $s$ is the maximum number of latent variables connected to the MRF neighborhood of an observed variable. This is an improvement when $s < \log_2 (d-1)$, which corresponds to RBMs with sparse latent variables. Furthermore, we give a version of this learning algorithm that recovers a model with small prediction error and whose sample complexity is independent of the minimum potential in the Markov Random Field of the observed variables. This is of interest because the sample complexity of current algorithms scales with the inverse of the minimum potential, which cannot be controlled in terms of natural properties of the RBM.
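The bipartite structure described above can be made concrete with a small sketch. This is not the paper's learning algorithm, just an illustrative binary RBM over {-1, +1} units showing why the conditionals factorize layer-wise (all sizes, potentials, and the block Gibbs loop here are toy assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 6, 3                              # observed / latent counts (toy sizes)
W = rng.normal(scale=0.5, size=(n, m))   # pairwise potentials on bipartite edges
b = np.zeros(n)                          # observed-layer biases
c = np.zeros(m)                          # latent-layer biases

def energy(x, h):
    """E(x, h) = -x^T W h - b^T x - c^T h for x in {-1,+1}^n, h in {-1,+1}^m."""
    return -(x @ W @ h + b @ x + c @ h)

def sample_latent(x):
    """Bipartiteness: given x, the latent units are independent, so
    P(h_j = +1 | x) = sigmoid(2 (W^T x + c)_j) and the layer samples jointly."""
    p = 1.0 / (1.0 + np.exp(-2.0 * (W.T @ x + c)))
    return np.where(rng.random(m) < p, 1.0, -1.0)

def sample_observed(h):
    """Symmetric conditional for the observed layer given the latent layer."""
    p = 1.0 / (1.0 + np.exp(-2.0 * (W @ h + b)))
    return np.where(rng.random(n) < p, 1.0, -1.0)

# Block Gibbs sampling: alternate the two layers to draw approximate samples.
x = np.ones(n)
for _ in range(100):
    h = sample_latent(x)
    x = sample_observed(h)
```

Marginalizing out h couples the observed variables, which is what produces the Markov Random Field over the observed layer whose neighborhoods the abstract refers to.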


Review for NeurIPS paper: Learning Restricted Boltzmann Machines with Sparse Latent Variables

Neural Information Processing Systems

Additional Feedback: I found the phrase "few latent variables" to be confusing, since this is not really what the authors mean. In particular, their bounds use "s", which is bounded by the number of latent variables connected to any variable in the Markov blanket of Xi in the marginal distribution over the visible X. This was not clear, in my view, until the formal definition (150-155). Since this style of RBM is not well studied, the practical significance of learning rates is not clear. The type of model is intuitively appealing (many models use "local" latent variables or encourage sparseness), and perhaps the work would spur application of these styles of RBM, but it's difficult to say that this is improving the theory for a well-established problem of importance.


Review for NeurIPS paper: Learning Restricted Boltzmann Machines with Sparse Latent Variables

Neural Information Processing Systems

This paper presents an algorithm for provably learning RBMs when each visible node is connected to a small number of hidden units, presenting bounds that improve over previous results in a specific regime. While reviewers agree the results appear sound, the paper has done little to convince the reviewers of the significance of the regime, and reviewer requests for additional intuition were not satisfied effectively in the author response. In total, though, the work appears novel and sound, and consensus is in favor of acceptance. I would strongly encourage the authors to try to address R2's questions. I will have to look at the paper again, but my intuition was that s should bound the size of the Markov blanket, which should lead to better than O(n^d) scaling, and they don't seem to have addressed this except to say it doesn't.
