 Bayesian Learning


Optimized Realization of Bayesian Networks in Reduced Normal Form using Latent Variable Model

arXiv.org Machine Learning

Bayesian networks in their Factor Graph Reduced Normal Form (FGrn) are a powerful paradigm for implementing inference graphs. Unfortunately, the computational and memory costs of these networks may be considerable, even for relatively small networks, and this is one of the main reasons why these structures have often been underused in practice. In this work, through a detailed algorithmic and structural analysis, various solutions for cost reduction are proposed. An online version of the classic batch learning algorithm is also analyzed and shows very similar results in an unsupervised context, which is essential if multilevel structures are to be built. The proposed solutions, together with the online learning algorithm, are included in a C++ library that is quite efficient, especially when compared to the direct use of the well-known sum-product and Maximum Likelihood (ML) algorithms. The results are discussed with particular reference to a Latent Variable Model (LVM) structure.
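
For orientation, the direct sum-product computation that the abstract uses as its baseline can be sketched for the simplest latent variable model: one discrete hidden variable S with several discrete observed children. This is a minimal illustration in plain Python, not the paper's C++ library or its FGrn optimizations; the function names, the table layout `cond[n][k][x]`, and the toy probabilities are all assumptions.

```python
def normalize(v):
    """Scale a non-negative vector so that it sums to 1."""
    total = sum(v)
    return [x / total for x in v]

def latent_posterior(prior, cond, observed):
    """Posterior over the latent variable S given observed children.

    prior[k] = P(S=k); cond[n][k][x] = P(X_n=x | S=k);
    observed maps a child index n to its observed value x_n.
    """
    post = list(prior)
    for n, x in observed.items():
        # The backward message from child n to S is the column P(X_n=x | S=.)
        post = [p * cond[n][k][x] for k, p in enumerate(post)]
    return normalize(post)

def predict_child(post, cond_n):
    """Forward message to an unobserved child: sum_k P(S=k | obs) P(X=x | S=k)."""
    n_values = len(cond_n[0])
    return [sum(post[k] * cond_n[k][x] for k in range(len(post)))
            for x in range(n_values)]

# Toy model (hypothetical numbers): two latent states, two binary children.
prior = [0.5, 0.5]
cond = [
    [[0.9, 0.1], [0.2, 0.8]],  # P(X_0 | S)
    [[0.7, 0.3], [0.1, 0.9]],  # P(X_1 | S)
]
post = latent_posterior(prior, cond, {0: 0})   # observe X_0 = 0
pred = predict_child(post, cond[1])            # predict the unobserved X_1
```

Each observed child costs one pass over the latent states, and each prediction costs one matrix-vector product, so the cost grows with the sizes of the conditional tables. The abstract targets cost reduction for exactly this kind of structure.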


Deep Learning Finds Fake News with 97% Accuracy

#artificialintelligence

That means the pooling layer computes a feature vector of size 128, which is passed into the dense layers of the feedforward network as we mentioned above. The overall structure of the DNN can be understood as a preprocessor, defined in the first part, that is trained to map text sequences into feature vectors in such a way that the weights of the second part can be trained to obtain optimal classification results from the overall network. More details on the implementation and text preprocessing can be found in my GitHub repository for this project. I trained this network for 10 epochs with a batch size of 128 using an 80-20 training/hold-out split. A couple of notes on additional parameters: the vast majority of documents in this collection are 5000 words long or shorter, so I chose 5000 words as the maximum input sequence length for the DNN. There are roughly 100,000 unique words in this collection of documents. I arbitrarily limited the dictionary that the DNN can learn to 25% of that: 25,000 words. Finally, for the embedding dimension, I chose 300 simply because that is the default embedding dimension for both word2vec and GloVe.
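
The vocabulary cap and fixed sequence length described above can be sketched in plain Python. This is a minimal illustration, not the author's preprocessing code: the helper names `build_vocab` and `encode`, the reserved ids for padding and unknown words, and the toy documents are all assumptions.

```python
from collections import Counter

PAD, UNK = 0, 1  # reserved ids (an assumption of this sketch)

def build_vocab(tokenized_docs, max_words):
    """Map the max_words most frequent tokens to integer ids starting at 2."""
    counts = Counter(tok for doc in tokenized_docs for tok in doc)
    return {tok: i + 2 for i, (tok, _) in enumerate(counts.most_common(max_words))}

def encode(doc, vocab, max_len):
    """Convert tokens to ids, truncate to max_len, and right-pad with PAD."""
    ids = [vocab.get(tok, UNK) for tok in doc[:max_len]]
    return ids + [PAD] * (max_len - len(ids))

# Toy corpus; the article's settings would be max_words=25_000, max_len=5_000.
docs = [["the", "news", "is", "fake"], ["the", "news", "is", "real", "news"]]
vocab = build_vocab(docs, max_words=3)
encoded = [encode(d, vocab, max_len=6) for d in docs]
```

Words outside the capped vocabulary all share the `UNK` id, so the embedding layer needs `max_words + 2` rows under this numbering scheme.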


Probabilistic symmetry and invariant neural networks

arXiv.org Machine Learning

In an effort to improve the performance of deep neural networks in data-scarce, non-i.i.d., or unsupervised settings, much recent research has been devoted to encoding invariance under symmetry transformations into neural network architectures. We treat the neural network input and output as random variables, and consider group invariance from the perspective of probabilistic symmetry. Drawing on tools from probability and statistics, we establish a link between functional and probabilistic symmetry, and obtain generative functional representations of joint and conditional probability distributions that are invariant or equivariant under the action of a compact group. Those representations completely characterize the structure of neural networks that can be used to model such distributions and yield a general program for constructing invariant stochastic or deterministic neural networks. We develop the details of the general program for exchangeable sequences and arrays, recovering a number of recent examples as special cases.


Applying SVGD to Bayesian Neural Networks for Cyclical Time-Series Prediction and Inference

arXiv.org Machine Learning

A regression-based Bayesian neural network (BNN) model is proposed to predict spatiotemporal quantities, such as hourly rider demand, with calibrated uncertainties. The main contributions of this paper are (i) a feed-forward deterministic neural network (DetNN) architecture that predicts cyclical time series data with sensitivity to anomalous forecasting events; and (ii) a Bayesian framework applying Stein variational gradient descent (SVGD) to train large neural networks for such tasks, capable of producing time series predictions as well as measures of uncertainty surrounding the predictions. Experiments show that the proposed BNN reduces average estimation error by 10% across 8 U.S. cities compared to a fine-tuned multilayer perceptron (MLP), and by 4% compared to the same network architecture trained without SVGD.
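
For readers unfamiliar with SVGD, the standard particle update from the original SVGD formulation is reproduced below. The notation is an assumption of this note and may differ from the paper's: $\theta_1, \dots, \theta_n$ are particles (here, network weight vectors), $p(\theta \mid \mathcal{D})$ is the posterior, $k$ is a positive-definite kernel such as an RBF kernel, and $\epsilon$ is a step size.

```latex
\theta_i \leftarrow \theta_i + \epsilon \, \hat{\phi}(\theta_i),
\qquad
\hat{\phi}(\theta) = \frac{1}{n} \sum_{j=1}^{n}
\Big[ k(\theta_j, \theta) \, \nabla_{\theta_j} \log p(\theta_j \mid \mathcal{D})
      + \nabla_{\theta_j} k(\theta_j, \theta) \Big]
```

The first term pulls particles toward regions of high posterior density. The second term repels particles from one another, which is what lets the ensemble represent predictive uncertainty rather than collapsing to a single point estimate.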


Theory of Minds: Understanding Behavior in Groups Through Inverse Planning

arXiv.org Artificial Intelligence

Human social behavior is structured by relationships. We form teams, groups, tribes, and alliances at all scales of human life. These structures guide multi-agent cooperation and competition, but when we observe others these underlying relationships are typically unobservable and hence must be inferred. Humans make these inferences intuitively and flexibly, often making rapid generalizations about the latent relationships that underlie behavior from just sparse and noisy observations. Rapid and accurate inferences are important for determining who to cooperate with, who to compete with, and how to cooperate in order to compete. Towards the goal of building machine-learning algorithms with human-like social intelligence, we develop a generative model of multi-agent action understanding based on a novel representation for these latent relationships called Composable Team Hierarchies (CTH). This representation is grounded in the formalism of stochastic games and multi-agent reinforcement learning. We use CTH as a target for Bayesian inference, yielding a new algorithm for understanding behavior in groups that can both infer hidden relationships and predict future actions for multiple agents interacting together. Our algorithm rapidly recovers an underlying causal model of how agents relate in spatial stochastic games from just a few observations. The patterns of inference made by this algorithm closely correspond with human judgments, and the algorithm makes the same rapid generalizations that people do.


Implementing Naive Bayes for Sentiment Analysis in Python

#artificialintelligence

The Naive Bayes Classifier is a well-known machine learning classifier with applications in Natural Language Processing (NLP) and other areas. Despite its simplicity, it is able to achieve above-average performance in tasks such as sentiment analysis. Today we will elaborate on the core principles of this model and then implement it in Python. In the end, we will see how well we do on a dataset of 2000 movie reviews. The math behind this model isn't particularly difficult to understand if you are familiar with the notation.
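
As a taste of what such an implementation can look like, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing. It is not the article's code: the class name, the method names, and the four toy reviews are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # word frequencies per class
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def log_scores(self, tokens):
        """Return log P(class) + sum of log P(word | class) for each class."""
        n_docs = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(n / n_docs)  # log prior
            for w in tokens:
                if w in self.vocab:  # ignore words never seen in training
                    count = self.word_counts[label][w]
                    score += math.log((count + 1) / (total + len(self.vocab)))
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)

# Toy training data (hypothetical), already tokenized.
train_docs = [["great", "fun", "movie"], ["boring", "awful", "movie"],
              ["great", "acting"], ["awful", "plot"]]
train_labels = ["pos", "neg", "pos", "neg"]
model = NaiveBayes().fit(train_docs, train_labels)
prediction = model.predict(["great", "movie"])
```

Working in log space avoids numerical underflow when many small probabilities are multiplied. The smoothing keeps a word that is absent from one class from forcing that class's probability to zero.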


A Primer on PAC-Bayesian Learning

arXiv.org Machine Learning

Generalized Bayesian learning algorithms are increasingly popular in machine learning, due to their PAC generalization properties and flexibility. The present paper aims at providing a self-contained survey on the resulting PAC-Bayes framework and some of its main theoretical and algorithmic developments.


A review of single-source unsupervised domain adaptation

arXiv.org Machine Learning

Domain adaptation has become a prominent problem setting in machine learning and related fields. This review asks when and how a classifier can learn from a source domain and generalize to a target domain. As for when, we review conditions that allow for cross-domain generalization error bounds. As for how, we present a categorization of approaches, divided into what we refer to as sample-based, feature-based, and inference-based methods. Sample-based methods focus on weighting individual observations during training based on their importance to the target domain. Feature-based methods focus on mapping, projecting, and representing features such that a source classifier performs well on the target domain. Inference-based methods focus on alternative estimators, such as robust, minimax, or Bayesian estimators. Our categorization highlights recurring ideas and raises a number of questions important to further research.
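
The sample-based idea can be illustrated with importance weighting, where each source observation is weighted by the ratio of target to source density. The sketch below is not taken from the review. It assumes both densities are known one-dimensional Gaussians, which is rarely true in practice, where the ratio must be estimated. The function names and toy numbers are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def importance_weights(xs, source, target):
    """w(x) = p_target(x) / p_source(x) for each source sample."""
    return [gaussian_pdf(x, *target) / gaussian_pdf(x, *source) for x in xs]

def weighted_risk(losses, weights):
    """Self-normalized importance-weighted average of per-sample losses."""
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)

source = (0.0, 1.0)   # (mean, std) of the source domain
target = (1.0, 1.0)   # the target domain is shifted to the right
xs = [-1.0, 0.0, 1.0, 2.0]       # source samples
losses = [0.0, 0.0, 1.0, 1.0]    # per-sample losses of some fixed classifier
weights = importance_weights(xs, source, target)
risk = weighted_risk(losses, weights)
```

In this toy case the classifier errs only on the samples that are most typical of the target domain, so the weighted risk estimate exceeds the unweighted source average of 0.5.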


Cost Sensitive Learning in the Presence of Symmetric Label Noise

arXiv.org Machine Learning

In the binary classification framework, we are interested in making cost-sensitive label predictions in the presence of uniform/symmetric label noise. We first observe that $0$-$1$ Bayes classifiers are not (uniform) noise robust in the cost-sensitive setting. To circumvent this impossibility result, we present two schemes; unlike existing methods, our schemes do not require the noise rate. The first one uses an $\alpha$-weighted $\gamma$-uneven margin squared loss function, $l_{\alpha, usq}$, which can handle cost sensitivity arising from a domain requirement (using a user-given $\alpha$), from class imbalance (by tuning $\gamma$), or from both. However, we observe that $l_{\alpha, usq}$ Bayes classifiers are also not cost sensitive and noise robust. We show that regularized ERM of this loss function over the class of linear classifiers yields a cost-sensitive, uniform noise robust classifier as the solution of a system of linear equations. We also provide a performance bound for this classifier. The second scheme that we propose is a re-sampling based scheme that exploits the special structure of the uniform noise models and uses in-class probability estimates. Our computational experiments on some UCI datasets with class imbalance show that classifiers from our two schemes are on par with the existing methods, and in fact better in some cases, w.r.t. Accuracy and Arithmetic Mean, without using or tuning the noise rate. We also consider other cost-sensitive performance measures, viz. F measure and Weighted Cost, for evaluation.


Exploiting Synchronized Lyrics And Vocal Features For Music Emotion Detection

arXiv.org Artificial Intelligence

... engaging playlists according to sentiment and emotions. While previous works were mostly based on audio for music discovery and playlists generation, we take advantage of our synchronized lyrics dataset to combine text representations and music features in a novel way; we therefore introduce the Synchronized Lyrics Emotion Dataset. Unlike other approaches that randomly exploited the audio samples and the whole text, our data is split according to the temporal information provided by the synchronization between lyrics and audio. This work shows a comparison between text-based and audio-based deep learning classification models using different techniques from Natural Language Processing and Music Information Retrieval domains.

Support Vector Machines are employed with good results also for multilabel classification [30], more recently also Convolutional Neural Networks were used in this field [45]. Lyrics-based approaches, on the other hand, make use of Recurrent Neural Networks architectures (like LSTM [13]) for performing text classification [46, 47]. The idea of using lyrics combined with voice only audio signals is done in [29], where emotion recognition is performed by using textual and speech data, instead of visual ones. Measuring and assigning emotions to music is not a straightforward task: the sentiment/mood associated with a song can be derived by a combination of many features, moreover, emotions expressed by a musical excerpt and by its corresponding lyrics do not always ...