Discrete Restricted Boltzmann Machines

arXiv.org Machine Learning

We describe discrete restricted Boltzmann machines: probabilistic graphical models with bipartite interactions between visible and hidden discrete variables. Examples are binary restricted Boltzmann machines and discrete naive Bayes models. We detail the inference functions and distributed representations arising in these models in terms of configurations of projected products of simplices and normal fans of products of simplices. We bound the number of hidden variables, depending on the cardinalities of their state spaces, for which these models can approximate any probability distribution on their visible states to any given accuracy. In addition, we use algebraic methods and coding theory to compute their dimension.
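To make the model concrete, here is a minimal brute-force sketch (ours, not from the paper) of the visible marginal of a small discrete RBM, assuming a generic one-hot interaction parametrization; all names and the toy sizes are illustrative:

    import itertools
    import numpy as np

    # Toy discrete RBM: n visible units with q_v states each and
    # m hidden units with q_h states each, coupled bipartitely.
    rng = np.random.default_rng(0)
    n, q_v, m, q_h = 3, 3, 2, 2
    W = rng.normal(size=(n, q_v, m, q_h))  # W[i, a, j, b]: visible i in state a, hidden j in state b
    b_v = rng.normal(size=(n, q_v))        # per-state visible biases
    b_h = rng.normal(size=(m, q_h))        # per-state hidden biases

    def energy(v, h):
        # Negative sum of the active biases and pairwise interactions.
        e = sum(b_v[i, v[i]] for i in range(n))
        e += sum(b_h[j, h[j]] for j in range(m))
        e += sum(W[i, v[i], j, h[j]] for i in range(n) for j in range(m))
        return -e

    # Visible marginal p(v) = sum_h exp(-E(v, h)) / Z, by enumeration.
    visibles = list(itertools.product(range(q_v), repeat=n))
    hiddens = list(itertools.product(range(q_h), repeat=m))
    unnorm = np.array([sum(np.exp(-energy(v, h)) for h in hiddens) for v in visibles])
    p = unnorm / unnorm.sum()
    print(p.sum())  # 1.0: a point of the model in the simplex on q_v**n visible states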


Expressive Power and Approximation Errors of Restricted Boltzmann Machines

arXiv.org Machine Learning

We present explicit classes of probability distributions that can be learned by Restricted Boltzmann Machines (RBMs) depending on the number of units that they contain, and which are representative of the expressive power of the model. We use this to show that the maximal Kullback-Leibler divergence to the RBM model with $n$ visible and $m$ hidden units is bounded from above by $n - \left\lfloor \log(m+1) \right\rfloor - \frac{m+1}{2^{\left\lfloor\log(m+1)\right\rfloor}} \approx (n-1) - \log(m+1)$. In this way we can specify the number of hidden units that guarantees a sufficiently rich model containing different classes of distributions and respecting a given error tolerance.
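The bound is easy to evaluate numerically; a short sketch (the function name is ours, with the logarithm taken base 2, as fits the binary setting):

    import math

    def rbm_kl_bound(n, m):
        # Upper bound on the maximal KL divergence from any distribution
        # on {0,1}^n to the RBM model with n visible and m hidden units.
        k = math.floor(math.log2(m + 1))
        return n - k - (m + 1) / 2**k

    print(rbm_kl_bound(10, 5))        # 6.5, close to (n-1) - log2(m+1) = 6.415...
    print(rbm_kl_bound(4, 2**3 - 1))  # 0.0: m = 2^(n-1) - 1 hidden units make the bound vanish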


Dimension of Marginals of Kronecker Product Models

arXiv.org Machine Learning

A Kronecker product model is the set of visible marginal probability distributions of an exponential family whose sufficient statistics matrix factorizes as a Kronecker product of two matrices, one for the visible variables and one for the hidden variables. We estimate the dimension of these models by the maximum rank of the Jacobian in the limit of large parameters. The limit is described by the tropical morphism, a piecewise-linear map whose pieces correspond to slicings of the visible matrix by the normal fan of the hidden matrix. We obtain combinatorial conditions under which the model has the expected dimension, equal to the minimum of the number of natural parameters and the dimension of the ambient probability simplex. Additionally, we prove that the binary restricted Boltzmann machine always has the expected dimension.
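As a numerical companion (our sketch, not the paper's method), the generic rank of the Jacobian of the parametrization can be checked directly for a small binary RBM, the special case treated in the last result:

    import itertools
    import numpy as np

    def rbm_visible_dist(W, b, c):
        # Visible marginal of a binary RBM with interactions W and biases b, c.
        n, m = W.shape
        vs = np.array(list(itertools.product([0, 1], repeat=n)))
        hs = np.array(list(itertools.product([0, 1], repeat=m)))
        unnorm = np.exp(vs @ W @ hs.T + (vs @ b)[:, None] + (hs @ c)[None, :]).sum(axis=1)
        return unnorm / unnorm.sum()

    def model_dimension(n, m, eps=1e-5, seed=0):
        # Rank of the finite-difference Jacobian of theta -> p_theta
        # at a randomly chosen (generic) parameter point.
        rng = np.random.default_rng(seed)
        theta = rng.normal(size=n * m + n + m)
        def p(t):
            return rbm_visible_dist(t[:n * m].reshape(n, m), t[n * m:n * m + n], t[n * m + n:])
        J = np.stack([(p(theta + eps * e) - p(theta - eps * e)) / (2 * eps)
                      for e in np.eye(theta.size)], axis=1)
        return np.linalg.matrix_rank(J, tol=1e-7)

    n, m = 3, 2
    print(model_dimension(n, m), min(n * m + n + m, 2**n - 1))  # both 7: the expected dimension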


Universal Approximation Depth and Errors of Narrow Belief Networks with Discrete Units

arXiv.org Machine Learning

We generalize recent theoretical work on the minimal number of layers of narrow deep belief networks that can approximate any probability distribution on the states of their visible units arbitrarily well. We relax the setting of binary units (Sutskever and Hinton, 2008; Le Roux and Bengio, 2008, 2010; Montúfar and Ay, 2011) to units with arbitrary finite state spaces, and the vanishing approximation error to an arbitrary approximation error tolerance. For example, we show that a $q$-ary deep belief network with $L\geq 2+\frac{q^{\lceil m-\delta \rceil}-1}{q-1}$ layers of width $n \leq m + \log_q(m) + 1$ for some $m\in \mathbb{N}$ can approximate any probability distribution on $\{0,1,\ldots,q-1\}^n$ without exceeding a Kullback-Leibler divergence of $\delta$. Our analysis covers discrete restricted Boltzmann machines and naïve Bayes models as special cases.
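Plugging numbers into the depth bound (our illustration; since $q$ and $\lceil m-\delta\rceil$ are integers, the geometric-series quotient is exact):

    import math

    def min_layers(q, m, delta):
        # L >= 2 + (q^ceil(m - delta) - 1)/(q - 1) layers of width
        # n <= m + log_q(m) + 1 suffice for KL error at most delta.
        return 2 + (q ** math.ceil(m - delta) - 1) // (q - 1)

    print(min_layers(2, 4, 0.0))  # 17: a binary DBN of width at most 4 + log2(4) + 1 = 7
    print(min_layers(2, 4, 1.5))  # 9: a larger error tolerance reduces the required depth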


Expressive Power and Approximation Errors of Restricted Boltzmann Machines

Neural Information Processing Systems

We present explicit classes of probability distributions that can be learned by Restricted Boltzmann Machines (RBMs) depending on the number of units that they contain, and which are representative of the expressive power of the model. We use this to show that the maximal Kullback-Leibler divergence to the RBM model with $n$ visible and $m$ hidden units is bounded from above by $(n-1)-\log(m+1)$. In this way we can specify the number of hidden units that guarantees a sufficiently rich model containing different classes of distributions and respecting a given error tolerance.