### Dimension of Marginals of Kronecker Product Models

A Kronecker product model is the set of visible marginal probability distributions of an exponential family whose sufficient statistics matrix factorizes as a Kronecker product of two matrices, one for the visible variables and one for the hidden variables. We estimate the dimension of these models by the maximum rank of the Jacobian in the limit of large parameters. The limit is described by the tropical morphism; a piecewise linear map with pieces corresponding to slicings of the visible matrix by the normal fan of the hidden matrix. We obtain combinatorial conditions under which the model has the expected dimension, equal to the minimum of the number of natural parameters and the dimension of the ambient probability simplex. Additionally, we prove that the binary restricted Boltzmann machine always has the expected dimension.

### Geometry and Expressive Power of Conditional Restricted Boltzmann Machines

Conditional restricted Boltzmann machines are undirected stochastic neural networks with a layer of input and output units connected bipartitely to a layer of hidden units. These networks define models of conditional probability distributions on the states of the output units given the states of the input units, parametrized by interaction weights and biases. We address the representational power of these models, proving results their ability to represent conditional Markov random fields and conditional distributions with restricted supports, the minimal size of universal approximators, the maximal model approximation errors, and on the dimension of the set of representable conditional distributions. We contribute new tools for investigating conditional probability models, which allow us to improve the results that can be derived from existing work on restricted Boltzmann machine probability models.

### Restricted Boltzmann Machines: Introduction and Review

The restricted Boltzmann machine is a network of stochastic units with undirected interactions between pairs of visible and hidden units. This model was popularized as a building block of deep learning architectures and has continued to play an important role in applied and theoretical machine learning. Restricted Boltzmann machines carry a rich structure, with connections to geometry, applied algebra, probability, statistics, machine learning, and other areas. The analysis of these models is attractive in its own right and also as a platform to combine and generalize mathematical tools for graphical models with hidden variables. This article gives an introduction to the mathematical analysis of restricted Boltzmann machines, reviews recent results on the geometry of the sets of probability distributions representable by these models, and suggests a few directions for further investigation.

### Expressive Power and Approximation Errors of Restricted Boltzmann Machines

We present explicit classes of probability distributions that can be learned by Restricted Boltzmann Machines (RBMs) depending on the number of units that they contain, and which are representative for the expressive power of the model. We use this to show that the maximal Kullback-Leibler divergence to the RBM model with $n$ visible and $m$ hidden units is bounded from above by $n - \left\lfloor \log(m+1) \right\rfloor - \frac{m+1}{2^{\left\lfloor\log(m+1)\right\rfloor}} \approx (n -1) - \log(m+1)$. In this way we can specify the number of hidden units that guarantees a sufficiently rich model containing different classes of distributions and respecting a given error tolerance.

### Expressive Power and Approximation Errors of Restricted Boltzmann Machines

We present explicit classes of probability distributions that can be learned by Restricted Boltzmann Machines (RBMs) depending on the number of units that they contain, and which are representative for the expressive power of the model. We use this to show that the maximal Kullback-Leibler divergence to the RBM model with n visible and m hidden units is bounded from above by (n-1)-log(m+1). In this way we can specify the number of hidden units that guarantees a sufficiently rich model containing different classes of distributions and respecting a given error tolerance.