CRBM


Learning to Align from Scratch

Huang, Gary B., Lee, Honglak

Neural Information Processing Systems

Unsupervised joint alignment of images has been demonstrated to improve performance on recognition tasks such as face verification. Such alignment reduces undesired variability due to factors such as pose, while only requiring weak supervision in the form of poorly aligned examples. However, prior work on unsupervised alignment of complex, real-world images has required the careful selection of feature representation based on hand-crafted image descriptors, in order to achieve an appropriate, smooth optimization landscape. In this paper, we instead propose a novel combination of unsupervised joint alignment with unsupervised feature learning. Specifically, we incorporate deep learning into the congealing alignment framework.
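
To make the congealing objective concrete, below is a minimal sketch in Python: joint alignment by minimizing the summed per-pixel entropy of a stack of binary images over integer translations. It uses raw pixels in place of the learned (or hand-crafted) features discussed in the paper, and the greedy search and all sizes are illustrative, not the paper's implementation.

    import numpy as np

    rng = np.random.default_rng(0)

    def stack_entropy(images):
        # Sum of per-pixel binary entropies across the stack; congealing
        # drives this down, since aligned stacks have near-constant pixels.
        p = np.clip(images.mean(axis=0), 1e-6, 1 - 1e-6)
        return -(p * np.log(p) + (1 - p) * np.log(1 - p)).sum()

    def congeal_shifts(images, max_shift=3, n_iters=5):
        # Greedy coordinate descent over integer translations, one image
        # at a time, holding the rest of the stack fixed.
        images = images.copy()
        shifts = [(dy, dx) for dy in range(-max_shift, max_shift + 1)
                           for dx in range(-max_shift, max_shift + 1)]
        for _ in range(n_iters):
            for k in range(len(images)):
                candidates = [np.roll(images[k], s, axis=(0, 1)) for s in shifts]
                scores = []
                for cand in candidates:
                    images[k] = cand
                    scores.append(stack_entropy(images))
                images[k] = candidates[int(np.argmin(scores))]
        return images

    # Usage: misaligned copies of a square, then congeal.
    base = np.zeros((16, 16)); base[5:11, 5:11] = 1.0
    imgs = np.stack([np.roll(base, (rng.integers(-3, 4), rng.integers(-3, 4)),
                             axis=(0, 1)) for _ in range(20)])
    print(stack_entropy(imgs), stack_entropy(congeal_shifts(imgs)))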


Benchmarking and Enhancing Disentanglement in Concept-Residual Models

Zabounidis, Renos, Oguntola, Ini, Zhao, Konghao, Campbell, Joseph, Stepputtis, Simon, Sycara, Katia

arXiv.org Artificial Intelligence

Concept bottleneck models (CBMs) are interpretable models that first predict a set of semantically meaningful features, i.e., concepts, from observations, which are subsequently used to condition a downstream task. However, the model's performance strongly depends on the engineered features and can severely suffer from incomplete sets of concepts. Prior works have proposed a side channel -- a residual -- that allows for unconstrained information flow to the downstream task, thus improving model performance but simultaneously introducing information leakage, which is undesirable for interpretability. This work proposes three novel approaches to mitigate information leakage by disentangling concepts and residuals, investigating the critical balance between model performance and interpretability. Through extensive empirical analysis on the CUB, OAI, and CIFAR-100 datasets, we assess the performance of each disentanglement method and provide insights into when they work best. Further, we show how each method impacts the ability to intervene on the concepts and their subsequent impact on task performance.
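
Below is a minimal sketch of the concept-residual architecture the paper builds on, with one generic decorrelation penalty standing in for the three disentanglement methods actually studied. The dimensions, weights, and penalty are illustrative assumptions, not the paper's.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Illustrative dimensions (not from the paper).
    d_x, d_c, d_r, d_y = 32, 10, 4, 5
    W_c = rng.normal(scale=0.1, size=(d_x, d_c))        # concept head
    W_r = rng.normal(scale=0.1, size=(d_x, d_r))        # residual side channel
    W_y = rng.normal(scale=0.1, size=(d_c + d_r, d_y))  # downstream task head

    def forward(x):
        c_hat = sigmoid(x @ W_c)   # predicted concepts (interpretable)
        r = x @ W_r                # unconstrained residual
        logits = np.concatenate([c_hat, r], axis=1) @ W_y
        return c_hat, r, logits

    def decorrelation_penalty(c_hat, r):
        # One generic disentanglement term: penalize the cross-covariance
        # between concepts and residual, so the residual cannot silently
        # re-encode concept information (leakage).
        c0 = c_hat - c_hat.mean(axis=0)
        r0 = r - r.mean(axis=0)
        cross_cov = c0.T @ r0 / len(c_hat)
        return (cross_cov ** 2).sum()

    x = rng.normal(size=(16, d_x))
    c_hat, r, logits = forward(x)
    print(decorrelation_penalty(c_hat, r))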


Neural Boltzmann Machines

Lang, Alex H., Loukianov, Anton D., Fisher, Charles K.

arXiv.org Artificial Intelligence

Conditional generative models are capable of using contextual information as input to create new imaginative outputs. Conditional Restricted Boltzmann Machines (CRBMs) are one class of conditional generative models that have proven to be especially adept at modeling noisy discrete or continuous data, but the lack of expressivity in CRBMs has limited their widespread adoption. Here we introduce Neural Boltzmann Machines (NBMs), which generalize CRBMs by converting each of the CRBM parameters into its own neural network that is allowed to be a function of the conditional inputs. NBMs are highly flexible conditional generative models that can be trained via stochastic gradient descent to approximately maximize the log-likelihood of the data. We demonstrate the utility of NBMs especially with normally distributed data, which has historically caused problems for Gaussian-Bernoulli CRBMs.
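
Here is a minimal sketch of the NBM idea under simplifying assumptions (unit visible variance, tiny random networks standing in for trained ones): each CRBM parameter is produced by its own network of the conditioning input, and samples are drawn by block Gibbs sampling.

    import numpy as np

    rng = np.random.default_rng(0)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    # Illustrative sizes (not from the paper).
    d_x, d_v, d_h = 8, 16, 32

    def mlp(d_in, d_out, d_hidden=64):
        # A tiny random two-layer network; stands in for each trained
        # parameter network of an NBM.
        W1 = rng.normal(scale=0.1, size=(d_in, d_hidden))
        W2 = rng.normal(scale=0.1, size=(d_hidden, d_out))
        return lambda x: np.tanh(x @ W1) @ W2

    # Each CRBM parameter becomes its own function of the conditioning input x.
    bias_v_net = mlp(d_x, d_v)        # visible bias  b_v(x)
    bias_h_net = mlp(d_x, d_h)        # hidden bias   b_h(x)
    weight_net = mlp(d_x, d_v * d_h)  # coupling      W(x), reshaped below

    def gibbs_sample(x, n_steps=50):
        # Block Gibbs sampling in a Gaussian-Bernoulli RBM whose parameters
        # are functions of x (unit visible variance for simplicity).
        b_v, b_h = bias_v_net(x), bias_h_net(x)
        W = weight_net(x).reshape(d_v, d_h)
        v = rng.normal(size=d_v)
        for _ in range(n_steps):
            p_h = sigmoid(b_h + v @ W)
            h = (rng.random(d_h) < p_h).astype(float)
            v = b_v + W @ h + rng.normal(size=d_v)  # Gaussian visibles
        return v

    x = rng.normal(size=d_x)
    print(gibbs_sample(x)[:5])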


Minimising Contrastive Divergence in Noisy, Mixed-mode VLSI Neurons

Neural Information Processing Systems

This paper presents VLSI circuits with continuous-valued probabilistic behaviour realized by injecting noise into each computing unit (neuron). Interconnecting the noisy neurons forms a Continuous Restricted Boltzmann Machine (CRBM), which has shown promising performance in modelling and classifying noisy biomedical data.
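
A minimal software sketch of such a noisy continuous unit (all parameters illustrative): zero-mean noise injected before a bounded sigmoid yields the continuous-valued probabilistic behaviour that the VLSI circuits realize physically.

    import numpy as np

    rng = np.random.default_rng(0)

    def noisy_unit(inputs, weights, a, sigma=0.2, lo=-1.0, hi=1.0):
        # Continuous stochastic neuron of a continuous RBM: noise is
        # injected into the drive, then squashed by a sigmoid bounded
        # to [lo, hi] (the role played by circuit noise in hardware).
        drive = weights @ inputs + sigma * rng.normal()
        return lo + (hi - lo) / (1.0 + np.exp(-a * drive))

    s = rng.uniform(-1, 1, size=8)      # states of the other layer
    w = rng.normal(scale=0.3, size=8)   # connection weights
    print(noisy_unit(s, w, a=2.0))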


A probabilistic autoencoder for causal discovery

Feiler, Matthias

arXiv.org Artificial Intelligence

The paper addresses the problem of finding the causal direction between two associated variables. The proposed solution is to build an autoencoder of their joint distribution and to maximize its estimation capacity relative to both of the marginal distributions. It is shown that the resulting two capacities cannot, in general, be equal. This leads to a new criterion for causal discovery: the higher capacity is consistent with the unconstrained choice of a distribution representing the cause, while the lower capacity reflects the constraints imposed by the mechanism on the distribution of the effect. Estimation capacity is defined as the ability of the autoencoder to represent arbitrary datasets. A regularization term forces it to decide which one of the variables to model in a more generic way, i.e., while maintaining higher model capacity. The causal direction is revealed by the constraints encountered while encoding the data rather than being measured as a property of the data itself. The idea is implemented and tested using a restricted Boltzmann machine.
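
The toy sketch below keeps only the shape of the decision rule: score both orientations and prefer the direction with the higher score. The capacity function here is a hypothetical stand-in (a regularized RBF regression fit), not the paper's RBM-based estimation capacity.

    import numpy as np

    rng = np.random.default_rng(0)

    def capacity(a, b, n_centers=20, reg=1e-2):
        # Hypothetical stand-in for the paper's estimation capacity:
        # the negative, regularized error of an RBF regression of b on a.
        centers = np.quantile(a, np.linspace(0.05, 0.95, n_centers))
        Phi = np.exp(-(a[:, None] - centers[None, :]) ** 2 / 0.1)
        w = np.linalg.solve(Phi.T @ Phi + reg * np.eye(n_centers), Phi.T @ b)
        return -np.mean((Phi @ w - b) ** 2)

    # Toy pair: x causes y through a nonlinear mechanism plus noise.
    x = rng.uniform(-2, 2, size=500)
    y = np.tanh(2 * x) + 0.1 * rng.normal(size=500)

    # Decision rule: prefer the orientation with the higher score.
    print("x -> y" if capacity(x, y) > capacity(y, x) else "y -> x")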


Generating Digital Twins with Multiple Sclerosis Using Probabilistic Neural Networks

Walsh, Jonathan R., Smith, Aaron M., Pouliot, Yannick, Li-Bland, David, Loukianov, Anton, Fisher, Charles K.

arXiv.org Machine Learning

Multiple Sclerosis (MS) is a neurodegenerative disorder characterized by a complex set of clinical assessments. We use an unsupervised machine learning model called a Conditional Restricted Boltzmann Machine (CRBM) to learn the relationships between covariates commonly used to characterize subjects and their disease progression in MS clinical trials. A CRBM is capable of generating digital twins, which are simulated subjects having the same baseline data as actual subjects. Digital twins allow for subject-level statistical analyses of disease progression. The CRBM is trained using data from 2395 subjects enrolled in the placebo arms of clinical trials across the three primary subtypes of MS. We discuss how CRBMs are trained and show that digital twins generated by the model are statistically indistinguishable from their actual subject counterparts along a number of measures.
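
Below is a minimal sketch of how a trained CRBM generates a twin: clamp a subject's baseline covariates and run block Gibbs sampling over the outcome and hidden units. Everything is binarized here for brevity, and the random, untrained parameters and sizes are illustrative; real MS trial data mixes many variable types.

    import numpy as np

    rng = np.random.default_rng(0)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    d_x, d_v, d_h = 12, 20, 30  # baseline covariates, outcomes, hidden
    U = rng.normal(scale=0.1, size=(d_x, d_h))  # baseline -> hidden
    A = rng.normal(scale=0.1, size=(d_x, d_v))  # baseline -> outcomes
    W = rng.normal(scale=0.1, size=(d_v, d_h))  # outcomes <-> hidden
    b, c = np.zeros(d_v), np.zeros(d_h)

    def digital_twin(x, n_steps=200):
        # Clamp baseline covariates x and Gibbs-sample outcomes v and
        # hidden units h; the final v is one simulated progression
        # ("digital twin") for that subject.
        v = (rng.random(d_v) < 0.5).astype(float)
        for _ in range(n_steps):
            p_h = sigmoid(c + v @ W + x @ U)
            h = (rng.random(d_h) < p_h).astype(float)
            p_v = sigmoid(b + W @ h + x @ A)
            v = (rng.random(d_v) < p_v).astype(float)
        return v

    x = (rng.random(d_x) < 0.5).astype(float)  # one subject's baseline
    print(digital_twin(x))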


Detecting Zero-day Controller Hijacking Attacks on the Power-Grid with Enhanced Deep Learning

He, Zecheng, Raghavan, Aswin, Chai, Sek, Lee, Ruby

arXiv.org Artificial Intelligence

Attacks against the control processor of a power-grid system, especially zero-day attacks, can be catastrophic. Earlier detection of the attacks can prevent further damage. However, detecting zero-day attacks is challenging because they have no known code and exhibit unknown behavior. To address the zero-day attack problem, we propose a data-driven defense: we train a temporal deep learning model, using only normal data from legitimate processes that run daily in these power-grid systems, to model the normal behavior of the power-grid controller. We can then quickly detect malicious code running on the processor by estimating deviations from the normal behavior with a statistical test. Experimental results on a real power-grid controller show that we can detect anomalous behavior with over 99.9% accuracy and nearly zero false positives.
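
A minimal sketch of the detection logic under stated assumptions: a simple linear next-step predictor stands in for the paper's temporal deep learning model, trained on normal traces only, with a quantile threshold on prediction errors standing in for the statistical test.

    import numpy as np

    rng = np.random.default_rng(0)

    def fit_predictor(traces, order=5):
        # Least-squares next-step predictor fit on normal traces only.
        X = np.concatenate([np.lib.stride_tricks.sliding_window_view(t, order)[:-1]
                            for t in traces])
        y = np.concatenate([t[order:] for t in traces])
        return np.linalg.lstsq(X, y, rcond=None)[0]

    def errors(trace, w, order=5):
        X = np.lib.stride_tricks.sliding_window_view(trace, order)[:-1]
        return (X @ w - trace[order:]) ** 2

    normal = [np.sin(0.1 * np.arange(500)) + 0.05 * rng.normal(size=500)
              for _ in range(10)]
    w = fit_predictor(normal)

    # Threshold from normal-only validation errors; flag steps whose
    # prediction error deviates significantly (a simple quantile test).
    val_err = np.concatenate([errors(t, w) for t in normal])
    threshold = np.quantile(val_err, 0.999)

    test = np.sin(0.1 * np.arange(500)); test[300:320] += 1.0  # injected anomaly
    flagged = errors(test, w) > threshold
    print("flagged steps (offset by window length):", np.flatnonzero(flagged)[:10])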


Complex-Valued Restricted Boltzmann Machine for Direct Speech Parameterization from Complex Spectra

Nakashika, Toru, Takaki, Shinji, Yamagishi, Junichi

arXiv.org Machine Learning

This paper describes a novel energy-based probabilistic distribution that represents complex-valued data and explains how to apply it to direct feature extraction from complex-valued spectra. The proposed model, the complex-valued restricted Boltzmann machine (CRBM), is designed to deal with complex-valued visible units as an extension of the well-known restricted Boltzmann machine (RBM). Like the RBM, the CRBM learns the relationships between visible and hidden units without having connections between units in the same layer, which dramatically improves training efficiency via Gibbs sampling or contrastive divergence (CD). Another important characteristic is that the CRBM also has connections between the real and imaginary parts of each complex-valued visible unit, which help represent the data distribution in the complex domain. In speech signal processing, features for classification and generation are often based on amplitude spectra (e.g., MFCC, cepstra, and mel-cepstra); even when these are calculated from complex spectra, they ignore phase information. In contrast, the proposed feature extractor using the CRBM directly encodes the complex spectra (or another complex-valued representation of the complex spectra) into binary-valued latent features (hidden units). Since the visible-hidden connections are undirected, we can also recover (decode) the complex spectra from the latent features directly. Our speech coding experiments demonstrated that the CRBM outperformed other speech coding methods, such as methods using the conventional RBM, the mel-log spectrum approximate (MLSA) decoder, etc.
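
A minimal sketch of the encode/decode path (random, untrained parameters; sizes illustrative): the complex spectrum is split into stacked real and imaginary parts, encoded into binary hidden units, and decoded back through the same undirected weights. The paper's CRBM additionally couples the real and imaginary parts of each visible unit, which is omitted here.

    import numpy as np

    rng = np.random.default_rng(0)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    n_freq, d_h = 64, 128
    d_v = 2 * n_freq  # real and imaginary parts stacked
    W = rng.normal(scale=0.05, size=(d_v, d_h))
    b, c = np.zeros(d_v), np.zeros(d_h)

    def encode(spectrum):
        # Complex spectrum -> binary latent features (hidden units).
        v = np.concatenate([spectrum.real, spectrum.imag])
        p_h = sigmoid(c + v @ W)
        return (rng.random(d_h) < p_h).astype(float)

    def decode(h):
        # Binary latents -> complex spectrum, using the same undirected
        # weights in the opposite direction (mean of Gaussian visibles).
        v = b + W @ h
        return v[:n_freq] + 1j * v[n_freq:]

    spec = rng.normal(size=n_freq) + 1j * rng.normal(size=n_freq)
    print(decode(encode(spec))[:4])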


Belief Propagation in Conditional RBMs for Structured Prediction

Ping, Wei, Ihler, Alexander

arXiv.org Machine Learning

Restricted Boltzmann machines (RBMs) and conditional RBMs (CRBMs) are popular models for a wide range of applications. In previous work, learning on such models has been dominated by contrastive divergence (CD) and its variants. Belief propagation (BP) algorithms are believed to be slow for structured prediction on conditional RBMs (e.g., Mnih et al. [2011]), and not as good as CD when applied in learning (e.g., Larochelle et al. [2012]). In this work, we present a matrix-based implementation of belief propagation algorithms on CRBMs, which is easily scalable to tens of thousands of visible and hidden units. We demonstrate that, in both maximum likelihood and max-margin learning, training conditional RBMs with BP as the inference routine can provide significantly better results than current state-of-the-art CD methods on structured prediction problems. We also include practical guidelines on training CRBMs with BP, and some insights on the interaction of learning and inference algorithms for CRBMs.
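
A minimal sketch of matrix-form sum-product BP on a binary RBM (for a CRBM, the conditional inputs would simply shift the biases b and c). Messages are held as log-odds in two dense matrices, so each iteration is a few matrix operations; the damping and sizes are illustrative, not the paper's implementation.

    import numpy as np

    def softplus(z):
        return np.logaddexp(0.0, z)

    def rbm_bp(W, b, c, n_iters=100, damp=0.5):
        # Sum-product BP on a binary RBM with energy
        # E(v, h) = -b.v - c.h - v.W.h. Returns approximate marginals
        # P(v_i = 1) and P(h_j = 1).
        n_v, n_h = W.shape
        M_v = np.zeros((n_v, n_h))  # factor (i,j) -> visible i, log-odds
        M_h = np.zeros((n_v, n_h))  # factor (i,j) -> hidden j, log-odds
        for _ in range(n_iters):
            lam_v = (b + M_v.sum(axis=1))[:, None] - M_v   # visible -> factor
            M_h = damp * M_h + (1 - damp) * (softplus(lam_v + W) - softplus(lam_v))
            lam_h = (c + M_h.sum(axis=0))[None, :] - M_h   # hidden -> factor
            M_v = damp * M_v + (1 - damp) * (softplus(lam_h + W) - softplus(lam_h))
        p_v = 1 / (1 + np.exp(-(b + M_v.sum(axis=1))))
        p_h = 1 / (1 + np.exp(-(c + M_h.sum(axis=0))))
        return p_v, p_h

    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.2, size=(6, 4))
    p_v, p_h = rbm_bp(W, rng.normal(size=6), rng.normal(size=4))
    print(p_v.round(3), p_h.round(3))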


Geometry and Expressive Power of Conditional Restricted Boltzmann Machines

Montufar, Guido, Ay, Nihat, Ghazi-Zahedi, Keyan

arXiv.org Machine Learning

Conditional restricted Boltzmann machines are undirected stochastic neural networks with a layer of input and output units connected bipartitely to a layer of hidden units. These networks define models of conditional probability distributions on the states of the output units given the states of the input units, parametrized by interaction weights and biases. We address the representational power of these models, proving results on their ability to represent conditional Markov random fields and conditional distributions with restricted supports, on the minimal size of universal approximators, on the maximal model approximation errors, and on the dimension of the set of representable conditional distributions. We contribute new tools for investigating conditional probability models, which allow us to improve the results that can be derived from existing work on restricted Boltzmann machine probability models.
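
For concreteness, one standard parametrization consistent with this description, with binary input units x, output units y, and hidden units h (the notation is ours, not necessarily the paper's):

    p(y \mid x) = \frac{1}{Z(x)} \sum_{h} \exp\!\left( x^\top V h + y^\top W h + b^\top y + c^\top h \right),
    \qquad
    Z(x) = \sum_{y', h} \exp\!\left( x^\top V h + {y'}^\top W h + b^\top y' + c^\top h \right).

Hidden units couple inputs to outputs only indirectly; the paper's results concern which conditional distributions p(y | x) this family can represent or approximate as the number of hidden units grows.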