Directed Networks
Recovering a Hidden Community in a Preferential Attachment Graph
Hajek, Bruce, Sankagiri, Suryanarayana
A message passing algorithm is derived for recovering a dense subgraph within a graph generated by a variation of the Barabási-Albert preferential attachment model. The estimator is assumed to know the arrival times, or order of attachment, of the vertices. The derivation of the algorithm is based on belief propagation under an independence assumption. Two precursors to the message passing algorithm are analyzed: the first is a degree thresholding (DT) algorithm, and the second is an algorithm based on the arrival times of the children (C) of a given vertex, where the children of a given vertex are the vertices that attached to it. Algorithm C significantly outperforms DT, showing that it is beneficial to know the arrival times of the children, beyond simply knowing how many there are. For a fixed fraction of vertices in the community, a fixed number of new edges per arriving vertex, and a fixed affinity between vertices in the community, the probability of error for recovering the label of a vertex is found as a function of the time of attachment, for either algorithm DT or C, in the large graph limit. By averaging over the time of attachment, the limit in probability of the fraction of label errors made over all vertices is identified, for either of the algorithms DT or C.
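To make the setting concrete, here is a minimal sketch of a preferential attachment graph with a planted community and a degree-thresholding labeler. The growth rule (each arrival joins the community with probability `rho` and attaches `m` edges with probability proportional to degree, multiplied by `affinity` when both endpoints are in the community) is a hypothetical stand-in, not the paper's exact model. Likewise, `degree_threshold_labels` normalizes the child count by the `sqrt(n / t)` growth of plain preferential attachment, which is an illustrative assumption rather than the paper's DT rule. The `children` lists keep the arrival times of each vertex's children, which is the extra information algorithm C exploits.

```python
import math
import random

def generate_pa_graph(n, m, rho, affinity, seed=0):
    """Hypothetical planted-community preferential attachment model.

    Returns (in_comm, children): in_comm[v] is v's true label and
    children[v] lists the arrival times of vertices that attached to v.
    """
    rng = random.Random(seed)
    in_comm = [rng.random() < rho for _ in range(n)]
    degree = [0] * n
    degree[0] = 1  # seed vertex gets weight 1 so the first arrival can attach
    children = [[] for _ in range(n)]
    for t in range(1, n):
        # Attachment weight: degree, boosted when both endpoints are in the community.
        weights = [
            degree[u] * (affinity if in_comm[t] and in_comm[u] else 1.0)
            for u in range(t)
        ]
        # m targets drawn with replacement, so repeated edges are possible.
        for u in rng.choices(range(t), weights=weights, k=m):
            children[u].append(t)
            degree[u] += 1
            degree[t] += 1
    return in_comm, children

def degree_threshold_labels(children, threshold):
    """Label v as in-community if its normalized child count exceeds threshold."""
    n = len(children)
    return [len(ch) / math.sqrt(n / (t + 1)) > threshold
            for t, ch in enumerate(children)]

def error_fraction(labels, truth):
    """Fraction of vertices whose estimated label differs from the true one."""
    return sum(a != b for a, b in zip(labels, truth)) / len(truth)
```

For example, `in_comm, children = generate_pa_graph(300, 2, 0.3, 3.0)` followed by `error_fraction(degree_threshold_labels(children, 2.0), in_comm)` gives the empirical label error rate for one threshold; sweeping the threshold is a simple way to explore the DT trade-off.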
Top 10 Machine Learning Algorithms for Beginners
The study of ML algorithms has gained immense traction since the Harvard Business Review article calling 'Data Scientist' the 'Sexiest job of the 21st century'. So, for those starting out in the field of ML, we decided to do a reboot of our immensely popular Gold blog The 10 Algorithms Machine Learning Engineers need to know, albeit this post is targeted towards beginners. ML algorithms are those that can learn from data and improve from experience without human intervention. Learning tasks may include learning the function that maps the input to the output, learning the hidden structure in unlabeled data, or 'instance-based learning', where a class label is produced for a new instance by comparing the new instance (row) to instances from the training data, which were stored in memory. Instance-based learning does not create an abstraction from specific instances. Supervised learning can be explained as follows: use labeled training data to learn the mapping function from the input variables (X) to the output variable (Y).
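A k-nearest-neighbors classifier is a standard example of instance-based learning: it stores the training rows and labels a new row by comparing it with them, without building any abstraction. The sketch below is a plain-Python illustration; the function name `knn_predict` and the Euclidean distance are our choices for the example.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=1):
    """Label x by majority vote among the k stored rows closest to it."""
    # Compare the new instance against every stored training instance.
    dists = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```

Note that "training" here is just keeping `train_X` and `train_y` in memory; all the work happens at prediction time.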
Bayesian Methods for Hackers
Of course, as an introductory book, it can only be that: an introductory book. Mathematically trained readers may satisfy the curiosity this text generates with other texts designed with mathematical analysis in mind. For enthusiasts with less mathematical background, or those interested not in the mathematics but simply in the practice of Bayesian methods, this text should be sufficient and entertaining. The choice of PyMC as the probabilistic programming language was made for two reasons. As of this writing, there is no central resource for examples and explanations in the PyMC universe.
Automatic feature engineering using Generative Adversarial Networks
The purpose of deep learning is to learn a representation of high-dimensional and noisy data using a sequence of differentiable functions, i.e., geometric transformations, that can then be used for supervised learning tasks, among others. It has had great success with discriminative models, whereas generative models have perhaps not fared as well because of the limitations of explicit maximum likelihood estimation (MLE). Adversarial learning, as presented in the Generative Adversarial Network (GAN), aims to overcome these problems by using implicit MLE. For these GAN experiments, we use the MNIST computer vision dataset and a synthetic financial transactions dataset for an insurance task. GANs learn in a markedly different way from explicit MLE. Our purpose is to show that the representation learnt by a GAN can be used for supervised learning tasks such as image recognition and insurance loss risk prediction.
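The "implicit" part can be seen in the standard GAN objective of Goodfellow et al. (2014), $\min_G \max_D \mathbb{E}_x[\log D(x)] + \mathbb{E}_z[\log(1 - D(G(z)))]$: the losses are computed only from the discriminator's outputs, never from an explicit density of the data. The sketch below evaluates those losses from given discriminator probabilities; it assumes the common non-saturating generator loss $-\log D(G(z))$ and is not the article's own code.

```python
import math

def gan_losses(d_real, d_fake, eps=1e-12):
    """Binary cross-entropy GAN losses from discriminator probabilities.

    d_real: D(x) on real samples; d_fake: D(G(z)) on generated samples.
    """
    n_r, n_f = len(d_real), len(d_fake)
    # Discriminator maximizes log D(x) + log(1 - D(G(z))); return its negative.
    d_loss = -(sum(math.log(p + eps) for p in d_real) / n_r
               + sum(math.log(1.0 - p + eps) for p in d_fake) / n_f)
    # Non-saturating generator loss: maximize log D(G(z)).
    g_loss = -sum(math.log(p + eps) for p in d_fake) / n_f
    return d_loss, g_loss
```

When the discriminator cannot tell real from fake (all outputs 0.5), the discriminator loss is $2\log 2$; it drops as the discriminator separates the two.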
Online Machine Learning in Big Data Streams
Benczúr, András A., Kocsis, Levente, Pálovics, Róbert
The area of online machine learning in big data streams covers algorithms that (1) are distributed and (2) work from data streams with only a limited possibility to store past data. The first requirement mostly concerns software architectures and efficient algorithms. The second one also imposes nontrivial theoretical restrictions on the modeling methods: in the data stream model, older data is no longer available to revise earlier suboptimal modeling decisions as fresh data arrives. In this article, we provide an overview of distributed software architectures and libraries as well as machine learning models for online learning. We highlight the most important ideas for classification, regression, recommendation, and unsupervised modeling from streaming data, and we show how they are implemented in various distributed data stream processing systems. This article is reference material, not a survey. We do not attempt to be comprehensive in describing all existing methods and solutions; rather, we give pointers to the most important resources in the field. All of the related subfields (online algorithms, online learning, and distributed data processing) are highly active in current research and development, with conceptually new research results and software components emerging at the time of writing. In this article, we refer to several surveys, both for distributed data processing and for online machine learning. Compared to past surveys, our article differs in that we discuss recommender systems in extended detail.
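The restriction that older data is unavailable can be illustrated with single-pass stochastic gradient descent for linear regression: each example is used for one update and then discarded. This is a minimal sketch of one classic online learner, assuming squared-error loss and a fixed learning rate `lr`; it is not taken from any of the systems the article surveys.

```python
def online_sgd_regression(stream, n_features, lr=0.05):
    """Single-pass linear regression over a stream of (x, y) pairs.

    Each example triggers one SGD step on 0.5 * (pred - y)**2 and is then
    dropped; no past data is stored. Yields (w, b) after every example.
    """
    w = [0.0] * n_features
    b = 0.0
    for x, y in stream:
        pred = sum(wi * xi for wi, xi in zip(w, x)) + b
        err = pred - y
        # Gradient of 0.5 * err**2 w.r.t. w_i is err * x_i, and w.r.t. b is err.
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
        b -= lr * err
        yield w, b  # the current model is available at any point in the stream
```

Because the model is yielded after each example, predictions can be served while the stream is still arriving, which is the usual requirement in the streaming setting.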
How Wrong Am I? - Studying Adversarial Examples and their Impact on Uncertainty in Gaussian Process Machine Learning Models
Grosse, Kathrin, Pfaff, David, Smith, Michael Thomas, Backes, Michael
Machine learning models are vulnerable to adversarial examples: minor perturbations to input samples intended to deliberately cause misclassification. Current defenses against adversarial examples, especially for Deep Neural Networks (DNN), are primarily derived from empirical developments, and their security guarantees are often only justified retroactively. Many defenses therefore rely on hidden assumptions that are subsequently subverted by increasingly elaborate attacks. This is not surprising: deep learning notoriously lacks a comprehensive mathematical framework to provide meaningful guarantees. In this paper, we leverage Gaussian Processes to investigate adversarial examples in the framework of Bayesian inference. Across different models and datasets, we find that deviating levels of uncertainty reflect the perturbation introduced to benign samples by state-of-the-art attacks, including novel white-box attacks on Gaussian Processes. Our experiments demonstrate that even unoptimized uncertainty thresholds already reject adversarial examples in many scenarios.
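As a rough illustration of uncertainty thresholding, the sketch below computes the posterior predictive variance of a zero-mean Gaussian Process with an RBF kernel and flags inputs whose variance exceeds a threshold. It is a simplification: the paper studies GP classification, whereas this uses the GP regression variance as the uncertainty signal, and the `lengthscale`, `noise`, and `threshold` values are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel with unit signal variance."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * sq / lengthscale**2)

def gp_predictive_variance(X_train, X_test, lengthscale=1.0, noise=1e-2):
    """Posterior variance diag(K_ss - K_s^T (K + noise*I)^{-1} K_s)."""
    K = rbf_kernel(X_train, X_train, lengthscale) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test, lengthscale)
    L = np.linalg.cholesky(K)
    v = np.linalg.solve(L, K_s)  # v = L^{-1} K_s, so v^T v = K_s^T K^{-1} K_s
    return 1.0 - np.sum(v**2, axis=0)

def reject_uncertain(X_train, X_test, threshold, **kw):
    """Flag test inputs whose predictive variance exceeds the threshold."""
    return gp_predictive_variance(X_train, X_test, **kw) > threshold
```

Inputs close to the training data get small variance, while inputs far from it approach the prior variance of 1, so a fixed threshold separates the two without any attack-specific tuning.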
Variational Autoencoders for Collaborative Filtering
Liang, Dawen, Krishnan, Rahul G., Hoffman, Matthew D., Jebara, Tony
We extend variational autoencoders (VAEs) to collaborative filtering for implicit feedback. This non-linear probabilistic model enables us to go beyond the limited modeling capacity of linear factor models, which still largely dominate collaborative filtering research. We introduce a generative model with multinomial likelihood and use Bayesian inference for parameter estimation. Despite widespread use in language modeling and economics, the multinomial likelihood receives less attention in the recommender systems literature. We introduce a different regularization parameter for the learning objective, which proves to be crucial for achieving competitive performance. Remarkably, there is an efficient way to tune the parameter using annealing. The resulting model and learning algorithm have information-theoretic connections to maximum entropy discrimination and the information bottleneck principle. Empirically, we show that the proposed approach significantly outperforms several state-of-the-art baselines, including two recently proposed neural network approaches, on several real-world datasets. We also provide extended experiments comparing the multinomial likelihood with other commonly used likelihood functions in the latent factor collaborative filtering literature and show favorable results. Finally, we identify the pros and cons of employing a principled Bayesian inference approach and characterize settings where it provides the most significant improvements.
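The objective described above can be sketched as a multinomial log-likelihood over all items minus a weighted KL term, with the weight annealed upward during training. The linear schedule capped at `beta_cap`, and all function names, are illustrative assumptions rather than the paper's code; the decoder logits and encoder outputs (`mu`, `logvar`) are taken as given.

```python
import numpy as np

def log_softmax(logits):
    shifted = logits - logits.max()  # subtract max for numerical stability
    return shifted - np.log(np.exp(shifted).sum())

def multinomial_log_lik(x, logits):
    """sum_i x_i log pi_i with pi = softmax(logits); drops the multinomial coefficient."""
    return float(np.dot(x, log_softmax(logits)))

def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) in closed form."""
    return float(0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar))

def beta_schedule(step, total_anneal_steps, beta_cap):
    """Hypothetical linear annealing of the KL weight from 0 up to beta_cap."""
    return min(beta_cap, step / total_anneal_steps)

def neg_objective(x, logits, mu, logvar, beta):
    """Loss to minimize: -(log-likelihood - beta * KL) for one user's click vector x."""
    return -multinomial_log_lik(x, logits) + beta * gaussian_kl(mu, logvar)
```

Here `x` is a user's vector of interaction counts; with uniform logits over three items and `x = [1, 0, 2]`, the log-likelihood is $3\log(1/3)$.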
Tree Ensembles with Rule Structured Horseshoe Regularization
Nalenz, Malte, Villani, Mattias
We propose a new Bayesian model for flexible nonlinear regression and classification using tree ensembles. The model is based on the RuleFit approach in Friedman and Popescu (2008), where rules from decision trees and linear terms are used in an L1-regularized regression. We modify RuleFit by replacing the L1-regularization with a horseshoe prior, which is well known to give aggressive shrinkage of noise predictors while leaving the important signal essentially untouched. This is especially important when a large number of rules are used as predictors, since many of them contribute only noise. Our horseshoe prior has an additional hierarchical layer that applies more shrinkage a priori to rules with a large number of splits, and to rules that are only satisfied by a few observations. The aggressive noise shrinkage of our prior also makes it possible to complement the rules from boosting in Friedman and Popescu (2008) with an additional set of trees from random forest, which brings a desirable diversity to the ensemble. We sample from the posterior distribution using a very efficient and easily implemented Gibbs sampler. The new model is shown to outperform state-of-the-art methods like RuleFit, BART and random forest on 16 datasets. The model and its interpretation are demonstrated on the well-known Boston housing data, and on gene expression data for cancer classification. The posterior sampling, prediction and graphical tools for interpreting the model results are implemented in a publicly available R package.
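The "aggressive shrinkage of noise while leaving signal untouched" behavior of the plain horseshoe prior can be seen numerically. In the standard form, $\beta_j \mid \lambda_j, \tau \sim N(0, \lambda_j^2\tau^2)$ with $\lambda_j \sim C^+(0,1)$, and in the normal-means model the posterior mean shrinks an observation by the factor $\kappa_j = 1/(1 + \lambda_j^2\tau^2/\sigma^2)$, which for $\tau = \sigma = 1$ has a U-shaped Beta(1/2, 1/2) distribution. This sketch covers only that plain prior; the paper's extra rule-structured layer (more shrinkage for long or rarely satisfied rules) is not implemented here.

```python
import numpy as np

def sample_horseshoe(n_coef, tau, rng):
    """Draw coefficients from a horseshoe prior with global scale tau."""
    lam = np.abs(rng.standard_cauchy(n_coef))  # local scales ~ Half-Cauchy(0, 1)
    return rng.normal(0.0, 1.0, n_coef) * lam * tau

def shrinkage_factor(lam, tau, sigma=1.0):
    """kappa = 1 / (1 + lam^2 tau^2 / sigma^2): 0 = no shrinkage, 1 = shrunk to zero."""
    return 1.0 / (1.0 + (lam * tau / sigma) ** 2)
```

Drawing many local scales and computing `shrinkage_factor` puts most of the mass near 0 (signals left alone) and near 1 (noise shrunk away), with relatively little in between.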
Reliable Uncertain Evidence Modeling in Bayesian Networks by Credal Networks
Marchetti, Sabina, Antonucci, Alessandro
A reliable approach to modeling uncertain evidence in Bayesian networks, based on a set-valued quantification, is proposed. Both soft and virtual evidence are considered. We show that evidence propagation in this setup can be reduced to standard updating in an augmented credal network, equivalent to a set of consistent Bayesian networks. A characterization of the computational complexity for this task is derived together with an efficient exact procedure for a subclass of instances. In the case of multiple pieces of uncertain evidence about the same variable, the proposed procedure can provide a set-valued version of the geometric approach to opinion pooling.
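A toy case shows why set-valued evidence yields bounds rather than a single posterior. For one binary variable $X$ with prior $P(X=1)=p$ and virtual evidence likelihoods $\ell_0 = P(e \mid X=0)$, $\ell_1 = P(e \mid X=1)$, the posterior is $p\ell_1 / (p\ell_1 + (1-p)\ell_0)$. If each likelihood is only known to lie in an interval, the posterior increases in $\ell_1$ and decreases in $\ell_0$, so its extremes sit at interval endpoints. This is a hand-worked illustration, not the paper's augmented credal network procedure.

```python
def posterior_with_virtual_evidence(prior1, lik0, lik1):
    """P(X=1 | e) for binary X given lik0 = P(e|X=0) and lik1 = P(e|X=1)."""
    num = prior1 * lik1
    return num / (num + (1.0 - prior1) * lik0)

def posterior_bounds(prior1, lik0_interval, lik1_interval):
    """Lower and upper P(X=1 | e) when each likelihood lies in an interval."""
    (a0, b0), (a1, b1) = lik0_interval, lik1_interval
    # Monotonicity: smallest lik1 with largest lik0 gives the lower bound, and vice versa.
    lower = posterior_with_virtual_evidence(prior1, b0, a1)
    upper = posterior_with_virtual_evidence(prior1, a0, b1)
    return lower, upper
```

With a uniform prior and intervals $[0.1, 0.3]$ for $\ell_0$ and $[0.6, 0.9]$ for $\ell_1$, the posterior is only known to lie in $[2/3, 0.9]$.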
Designing Random Graph Models Using Variational Autoencoders With Applications to Chemical Design
Samanta, Bidisha, De, Abir, Ganguly, Niloy, Gomez-Rodriguez, Manuel
From left to right: given a graph $G$ with a set of node features $F$ and edge weights $Y$, the encoder aggregates, for each node $v \in G$, information from $j \le K$ hops away into an embedding vector $c_v(j)$. To do so, it uses a feedforward network, parametrized by a set of weight matrices $W_j$, to propagate information between different search depths. These embedding vectors are then fed into a differentiable function $\phi_{\mathrm{enc}}$, which sets the parameters $\mu_k$ and $\sigma_k$ of several multidimensional Gaussian distributions $q_\phi$, from which the latent representation of each node in the input graph is sampled.

Variational autoencoders are characterized by a probabilistic generative model $p_\theta(x \mid z)$ of the observed variables $x \in \mathbb{R}^N$ given the latent variables $z \in \mathbb{R}^M$, a prior distribution over the latent variables $p(z)$, and an approximate probabilistic inference model $q_\phi(z \mid x)$. In this characterization, $p_\theta$ and $q_\phi$ are arbitrary distributions parametrized by two (deep) neural networks $\theta$ and $\phi$, and one can think of the generative model as a probabilistic decoder, which decodes latent variables into observed variables, and of the inference model as a probabilistic encoder, which encodes observed variables into latent variables.

Ideally, if we used the maximum likelihood principle to train a variational autoencoder, we would optimize the marginal log-likelihood of the observed data, i.e., $\mathbb{E}_{x \sim p_D}[\log p_\theta(x)]$, where $p_D$ is the data distribution. Unfortunately, computing $\log p_\theta(x)$ requires marginalization with respect to the latent variable $z$, which is typically intractable. Therefore, one resorts to maximizing a variational lower bound, or evidence lower bound (ELBO), of the log-likelihood of the observed data, i.e.,

$$\max_{\theta}\,\max_{\phi}\; \mathbb{E}_{x \sim p_D}\Big[-\mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big) + \mathbb{E}_{q_\phi(z \mid x)} \log p_\theta(x \mid z)\Big].$$
Finally, note that the quality of this variational lower bound (and thus the quality of the resulting VAE) depends on the expressive ability of the approximate inference model $q_\phi(z \mid x)$, which is typically assumed to be a normal distribution whose mean and variance are parametrized by a (deep) neural network $\phi$ with the observed data $x$ as input.
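For the common case of a diagonal normal $q_\phi(z \mid x)$ and a standard normal prior, the KL term of the ELBO has a closed form and the expected log-likelihood can be estimated with reparameterized samples $z = \mu + \sigma \odot \epsilon$. The sketch below computes this estimate for a single observation; the Bernoulli decoder passed in as `decode` is an assumption chosen for binary data (for example, adjacency entries) and is not the decoder of the graph model described above.

```python
import numpy as np

def elbo_estimate(x, mu, logvar, decode, rng, n_samples=1):
    """Monte Carlo ELBO for one observation x.

    q(z|x) = N(mu, diag(exp(logvar))), p(z) = N(0, I), and decode(z) returns
    Bernoulli means in (0, 1), one per entry of x.
    """
    # Closed-form KL(q(z|x) || p(z)) for diagonal Gaussians vs. standard normal.
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)
    rec = 0.0
    for _ in range(n_samples):
        eps = rng.standard_normal(mu.shape)
        z = mu + np.exp(0.5 * logvar) * eps  # reparameterization trick
        p = decode(z)
        rec += np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
    # ELBO = E_q[log p(x|z)] - KL(q || p)
    return rec / n_samples - kl
```

In training, the gradient of this estimate with respect to both the decoder parameters and $(\mu, \log\sigma^2)$ flows through `z`, which is the point of the reparameterization.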