Desjardins, Guillaume, Bengio, Yoshua, Courville, Aaron C.

Markov Random Fields (MRFs) have proven very powerful both as density estimators and feature extractors for classification. However, their use is often limited by an inability to estimate the partition function $Z$. In this paper, we exploit the gradient descent training procedure of restricted Boltzmann machines (a type of MRF) to {\bf track} the log partition function during learning. Our method relies on two distinct sources of information: (1) estimating the change $\Delta Z$ incurred by each gradient update, (2) estimating the difference in $Z$ over a small set of tempered distributions using bridge sampling. The two sources of information are then combined using an inference procedure similar to Kalman filtering. Learning MRFs through Tempered Stochastic Maximum Likelihood, we can estimate $Z$ using no more temperatures than are required for learning. Comparing to both exact values and estimates using annealed importance sampling (AIS), we show on several datasets that our method is able to accurately track the log partition function. In contrast to AIS, our method provides this estimate at each time-step, at a computational cost similar to that required for training alone.

Tran, Truyen, Phung, Dinh, Venkatesh, Svetha, Bui, Hung H.

Deep architecture such as hierarchical semi-Markov models is an important class of models for nested sequential data. Current exact inference schemes either cost cubic time in sequence length, or exponential time in model depth. These costs are prohibitive for large-scale problems with arbitrary length and depth. In this contribution, we propose a new approximation technique that may have the potential to achieve sub-cubic time complexity in length and linear time depth, at the cost of some loss of quality. The idea is based on two well-known methods: Gibbs sampling and Rao-Blackwellisation. We provide some simulation-based evaluation of the quality of the RGBS with respect to run time and sequence length.

Montufar, Guido, Morton, Jason

We describe discrete restricted Boltzmann machines: probabilistic graphical models with bipartite interactions between visible and hidden discrete variables. Examples are binary restricted Boltzmann machines and discrete naive Bayes models. We detail the inference functions and distributed representations arising in these models in terms of configurations of projected products of simplices and normal fans of products of simplices. We bound the number of hidden variables, depending on the cardinalities of their state spaces, for which these models can approximate any probability distribution on their visible states to any given accuracy. In addition, we use algebraic methods and coding theory to compute their dimension.

Artificial intelligence is enabling many scientific breakthroughs, especially in fields of study that generate high volumes of complex data such as neuroscience. As impossible as it may seem, neuroscientists are making strides in decoding neural activity into speech using artificial neural networks. Yesterday, the neuroscience team of Gopala K. Anumanchipalli, Josh Chartier, and Edward F. Chang of University of California San Francisco (UCSF) published in Nature their study using artificial intelligence and a state-of-the-art brain-machine interface to produce synthetic speech from brain recordings. The concept is relatively straightforward--record the brain activity and audio of participants while they are reading aloud in order to create a system that decodes brain signals for vocal tract movements, then synthesize speech from the decoded movements. The execution of the concept required sophisticated finessing of cutting-edge AI techniques and tools.

Kartik, Dhruva, Sabir, Ekraam, Mitra, Urbashi, Natarajan, Prem

Information theory has been very successful in obtaining performance limits for various problems such as communication, compression and hypothesis testing. Likewise, stochastic control theory provides a characterization of optimal policies for Partially Observable Markov Decision Processes (POMDPs) using dynamic programming. However, finding optimal policies for these problems is computationally hard in general and thus, heuristic solutions are employed in practice. Deep learning can be used as a tool for designing better heuristics in such problems. In this paper, the problem of active sequential hypothesis testing is considered. The goal is to design a policy that can reliably infer the true hypothesis using as few samples as possible by adaptively selecting appropriate queries. This problem can be modeled as a POMDP and bounds on its value function exist in literature. However, optimal policies have not been identified and various heuristics are used. In this paper, two new heuristics are proposed: one based on deep reinforcement learning and another based on a KL-divergence zero-sum game. These heuristics are compared with state-of-the-art solutions and it is demonstrated using numerical experiments that the proposed heuristics can achieve significantly better performance than existing methods in some scenarios.