Directed Networks
HINT: Hierarchical Invertible Neural Transport for General and Sequential Bayesian inference
Detommaso, Gianluca, Kruse, Jakob, Ardizzone, Lynton, Rother, Carsten, Köthe, Ullrich, Scheichl, Robert
In this paper, we introduce Hierarchical Invertible Neural Transport (HINT), an algorithm that merges Invertible Neural Networks and optimal transport to sample from a posterior distribution in a Bayesian framework. This method exploits a hierarchical architecture to construct a Knothe-Rosenblatt transport map between an arbitrary density and the joint density of hidden variables and observations. After training the map, samples from the posterior can be immediately recovered for any contingent observation. Any underlying model evaluation can be performed fully offline from training without the need of a model-gradient. Furthermore, no analytical evaluation of the prior is necessary, which makes HINT an ideal candidate for sequential Bayesian inference. We demonstrate the efficacy of HINT on two numerical experiments.
Self-supervised audio representation learning for mobile devices
Tagliasacchi, Marco, Gfeller, Beat, Quitry, Félix de Chaumont, Roblek, Dominik
We explore self-supervised models that can be potentially deployed on mobile devices to learn general purpose audio representations. Specifically, we propose methods that exploit the temporal context in the spectrogram domain. One method estimates the temporal gap between two short audio segments extracted at random from the same audio clip. The other methods are inspired by Word2Vec, a popular technique used to learn word embeddings, and aim at reconstructing a temporal spectrogram slice from past and future slices or, alternatively, at reconstructing the context of surrounding slices from the current slice. We focus our evaluation on small encoder architectures, which can be potentially run on mobile devices during both inference (re-using a common learned representation across multiple downstream tasks) and training (capturing the true data distribution without compromising users' privacy when combined with federated learning). We evaluate the quality of the embeddings produced by the self-supervised learning models, and show that they can be re-used for a variety of downstream tasks, and for some tasks even approach the performance of fully supervised models of similar size.
Active embedding search via noisy paired comparisons
Canal, Gregory H., Massimino, Andrew K., Davenport, Mark A., Rozell, Christopher J.
Suppose that we wish to estimate a user's preference vector $w$ from paired comparisons of the form "does user $w$ prefer item $p$ or item $q$?," where both the user and items are embedded in a low-dimensional Euclidean space with distances that reflect user and item similarities. Such observations arise in numerous settings, including psychometrics and psychology experiments, search tasks, advertising, and recommender systems. In such tasks, queries can be extremely costly and subject to varying levels of response noise; thus, we aim to actively choose pairs that are most informative given the results of previous comparisons. We provide new theoretical insights into the benefits and challenges of greedy information maximization in this setting, and develop two novel strategies that maximize lower bounds on information gain and are simpler to analyze and compute respectively. We use simulated responses from a real-world dataset to validate our strategies through their similar performance to greedy information maximization, and their superior preference estimation over state-of-the-art selection methods as well as random queries.
Sparse Gaussian Process Modulated Hawkes Process
Zhang, Rui, Walder, Christian, Rizoiu, Marian-Andrei
The Hawkes process has been widely applied to modeling self-exciting events, including neuron spikes, earthquakes and tweets. To avoid designing parametric kernel functions and to be able to quantify the prediction confidence, non-parametric Bayesian Hawkes processes have been proposed. However the inference of such models suffers from unscalability or slow convergence. In this paper, we first propose a new non-parametric Bayesian Hawkes process whose triggering kernel is modeled as a squared sparse Gaussian process. Second, we present the variational inference scheme for the model optimization, which has the advantage of linear time complexity by leveraging the stationarity of the triggering kernel. Third, we contribute a tighter lower bound than the evidence lower bound of the marginal likelihood for the model selection. Finally, we exploit synthetic data and large-scale social media data to validate the efficiency of our method and the practical utility of our approximate marginal likelihood. We show that our approach outperforms state-of-the-art non-parametric Bayesian and non-Bayesian methods.
Decentralized Bayesian Learning over Graphs
Lalitha, Anusha, Wang, Xinghan, Kilinc, Osman, Lu, Yongxi, Javidi, Tara, Koushanfar, Farinaz
We propose a decentralized learning algorithm over a general social network. The algorithm leaves the training data distributed on the mobile devices while utilizing a peer to peer model aggregation method. The proposed algorithm allows agents with local data to learn a shared model explaining the global training data in a decentralized fashion. The proposed algorithm can be viewed as a Bayesian and peer-to-peer variant of federated learning in which each agent keeps a "posterior probability distribution" over a global model parameters. The agent update its "posterior" based on 1) the local training data and 2) the asynchronous communication and model aggregation with their 1-hop neighbors. This Bayesian formulation allows for a systematic treatment of model aggregation over any arbitrary connected graph. Furthermore, it provides strong analytic guarantees on converge in the realizable case as well as a closed form characterization of the rate of convergence. We also show that our methodology can be combined with efficient Bayesian inference techniques to train Bayesian neural networks in a decentralized manner. By empirical studies we show that our theoretical analysis can guide the design of network/social interactions and data partitioning to achieve convergence.
Bayesian Tensorized Neural Networks with Automatic Rank Selection
Tensor decomposition is an effective approach to compress over-parameterized neural networks and to enable their deployment on resource-constrained hardware platforms. However, directly applying tensor compression in the training process is a challenging task due to the difficulty of choosing a proper tensor rank. In order to achieve this goal, this paper proposes a Bayesian tensorized neural network. Our Bayesian method performs automatic model compression via an adaptive tensor rank determination. We also present approaches for posterior density calculation and maximum a posteriori (MAP) estimation for the end-to-end training of our tensorized neural network. We provide experimental validation on a fully connected neural network, a CNN and a residual neural network where our work produces $7.4\times$ to $137\times$ more compact neural networks directly from the training.
Triple-to-Text: Converting RDF Triples into High-Quality Natural Languages via Optimizing an Inverse KL Divergence
Zhu, Yaoming, Wan, Juncheng, Zhou, Zhiming, Chen, Liheng, Qiu, Lin, Zhang, Weinan, Jiang, Xin, Yu, Yong
Knowledge base is one of the main forms to represent information in a structured way. A knowledge base typically consists of Resource Description Frameworks (RDF) triples which describe the entities and their relations. Generating natural language description of the knowledge base is an important task in NLP, which has been formulated as a conditional language generation task and tackled using the sequence-to-sequence framework. Current works mostly train the language models by maximum likelihood estimation, which tends to generate lousy sentences. In this paper, we argue that such a problem of maximum likelihood estimation is intrinsic, which is generally irrevocable via changing network structures. Accordingly, we propose a novel Triple-to-Text (T2T) framework, which approximately optimizes the inverse Kullback-Leibler (KL) divergence between the distributions of the real and generated sentences. Due to the nature that inverse KL imposes large penalty on fake-looking samples, the proposed method can significantly reduce the probability of generating low-quality sentences. Our experiments on three real-world datasets demonstrate that T2T can generate higher-quality sentences and outperform baseline models in several evaluation metrics.
Online Learning Made Simple - Anytime, Anywhere Simpliv
Artificial Intelligence has come a long way from being the stuff of science fiction movies and books to becoming an integral part of our daily lives. Today, AI is one of the fastest growing global industries. Investments and experiments in AI have been taking place all around the world. Given its unimaginably wide range of uses; AI is a field of expertise that is set to grow in a very huge way over the coming years. AI professionals are among the highest paid in the field of IT. Ans: Artificial Intelligence is a part of computer science that aims to create machine that are intelligent and seek to work and react the way humans do. Q2)What to you understand by an artificial intelligence Neural Network?
Accelerating Langevin Sampling with Birth-death
Lu, Yulong, Lu, Jianfeng, Nolen, James
A fundamental problem in Bayesian inference and statistical machine learning is to efficiently sample from multimodal distributions. Due to metastability, multimodal distributions are difficult to sample using standard Markov chain Monte Carlo methods. We propose a new sampling algorithm based on a birth-death mechanism to accelerate the mixing of Langevin diffusion. Our algorithm is motivated by its mean field partial differential equation (PDE), which is a Fokker-Planck equation supplemented by a nonlocal birth-death term. This PDE can be viewed as a gradient flow of the Kullback-Leibler divergence with respect to the Wasserstein-Fisher-Rao metric. We prove that under some assumptions the asymptotic convergence rate of the nonlocal PDE is independent of the potential barrier, in contrast to the exponential dependence in the case of the Langevin diffusion. We illustrate the efficiency of the birth-death accelerated Langevin method through several analytical examples and numerical experiments.
Tissue segmentation with deep 3D networks and spatial priors
Hirsch, Lukas, Huang, Yu, Parra, Lucas C
Conventional automated segmentation of the human head distinguishes different tissues based on image intensities in an MRI volume and prior tissue probability maps (TPM). This works well for normal head anatomies, but fails in the presence of unexpected lesions. Deep convolutional neural networks leverage instead volumetric spatial patterns and can be trained to segment lesions, but have thus far not integrated prior probabilities. Here we add to a three-dimensional convolutional network spatial priors with a TPM, morphological priors with conditional random fields, and context with a wider field-of-view at lower resolution. The new architecture, which we call MultiPrior, was designed to be a fully-trainable, three-dimensional convolutional network. Thus, the resulting architecture represents a neural network with learnable spatial memories. When trained on a set of stroke patients and healthy subjects, MultiPrior outperforms the state-of-the-art segmentation tools such as DeepMedic and SPM segmentation. The approach is further demonstrated on patients with disorders of consciousness, where we find that cognitive state correlates positively with gray-matter volumes and negatively with the extent of ventricles. We make the code and trained networks freely available to support future clinical research projects.