Directed Networks
Semi-Supervised Learning with Normalizing Flows
Izmailov, Pavel, Kirichenko, Polina, Finzi, Marc, Wilson, Andrew Gordon
Normalizing flows transform a latent distribution through an invertible neural network for a flexible and pleasingly simple approach to generative modelling, while preserving an exact likelihood. We propose FlowGMM, an end-to-end approach to generative semi-supervised learning with normalizing flows, using a latent Gaussian mixture model. FlowGMM is distinct in its simplicity, unified treatment of labelled and unlabelled data with an exact likelihood, interpretability, and broad applicability beyond image data. We show promising results on a wide range of applications, including AG-News and Yahoo Answers text data, tabular data, and semi-supervised image classification. We also show that FlowGMM can discover interpretable structure, provide real-time optimization-free feature visualizations, and specify well-calibrated predictive distributions.
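The exact likelihood mentioned above follows from the change-of-variables formula. The block below is a sketch of how a latent Gaussian mixture could give both a density and a classifier. The symbols $f$ (the invertible network), $\mu_k$, $\Sigma_k$ (component mean and covariance for class $k$) and $\pi_k$ (class prior) are notation assumed here for illustration; the abstract does not define them.

```latex
% Change of variables for an invertible map z = f(x):
\[
  p_X(x) = p_Z\bigl(f(x)\bigr)\,\left|\det \frac{\partial f}{\partial x}\right|
\]
% Labelled point with class k: one Gaussian component in latent space.
\[
  p_X(x \mid y = k) = \mathcal{N}\bigl(f(x) \mid \mu_k, \Sigma_k\bigr)\,
  \left|\det \frac{\partial f}{\partial x}\right|
\]
% Unlabelled point: marginalize over the class.
\[
  p_X(x) = \sum_{k} \pi_k\, \mathcal{N}\bigl(f(x) \mid \mu_k, \Sigma_k\bigr)\,
  \left|\det \frac{\partial f}{\partial x}\right|
\]
% Class posterior by Bayes' rule; the Jacobian term cancels.
\[
  p(y = k \mid x) =
  \frac{\pi_k\, \mathcal{N}\bigl(f(x) \mid \mu_k, \Sigma_k\bigr)}
       {\sum_{j} \pi_j\, \mathcal{N}\bigl(f(x) \mid \mu_j, \Sigma_j\bigr)}
\]
```

Under these assumptions, labelled and unlabelled points contribute to the same likelihood objective, and classification only requires evaluating the mixture components at $f(x)$.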
Incorporating physical constraints in a deep probabilistic machine learning framework for coarse-graining dynamical systems
Kaltenbach, Sebastian, Koutsourelakis, Phaedon-Stelios
Data-based discovery of effective, coarse-grained (CG) models of high-dimensional dynamical systems presents a unique challenge in computational physics and particularly in the context of multiscale problems. The present paper offers a data-based, probabilistic perspective that enables the quantification of predictive uncertainties. One of the outstanding problems has been the introduction of physical constraints in the probabilistic machine learning objectives. The primary utility of such constraints stems from the undisputed physical laws, such as conservation of mass, energy, etc., that they represent. Furthermore, apart from leading to physically realistic predictions, they can significantly reduce the requisite amount of training data, which for high-dimensional, multiscale systems are expensive to obtain (Small Data regime). We formulate the coarse-graining process by employing a probabilistic state-space model and account for the aforementioned equality constraints as virtual observables in the associated densities. We demonstrate how probabilistic inference tools can be employed to identify the coarse-grained variables in combination with deep neural nets and their evolution model without ever needing to define a fine-to-coarse (restriction) projection and without needing time-derivatives of state variables. The formulation adopted enables the quantification of a crucial, and often neglected, component in the CG process, i.e. the predictive uncertainty due to information loss. Furthermore, it is capable of reconstructing the evolution of the full, fine-scale system and therefore the observables of interest need not be selected a priori. We demonstrate the efficacy of the proposed framework by applying it to systems of interacting particles and an image series of a nonlinear pendulum.
In both cases we identify the underlying coarse dynamics and can generate extrapolative predictions, including the formation and propagation of a shock for the particle systems and a stable trajectory in the phase space for the pendulum.
Keywords: Bayesian machine learning, virtual observables, multiscale modeling, reduced order modeling, coarse graining
1. Introduction
High-dimensional, nonlinear dynamical systems are ubiquitous in applied physics and engineering. The computational resources needed for their solution can grow exponentially with the dimension of the state-space as well as with the smallest timescale that needs to be resolved, as this determines the discretization time-step.
Bayesian Tensor Network and Optimization Algorithm for Probabilistic Machine Learning
Describing or calculating the conditional probabilities of multiple events is exponentially expensive. In this work, a natural generalization of the Bayesian belief network is proposed by combining it with tensor networks. The resulting model, dubbed the Bayesian tensor network (BTN), efficiently describes the conditional probabilities among multiple sets of events. The complexity of a BTN that gives the conditional probabilities of $M$ sets of events scales only polynomially with $M$. To test its validity, BTN is used to capture the conditional probabilities between images and their classifications, where each feature is mapped to a probability distribution over a set of mutually exclusive events. A rotation optimization method is suggested to update BTN, which avoids the vanishing gradient problem and exhibits high efficiency. With simple tree network structures, BTN exhibits competitive performance on the Fashion-MNIST dataset. Analogous to the tensor network simulations of quantum systems, the validity of BTN implies an "area law" of fluctuations in image recognition problems.
Build Your First Chatbot in Python
Building a chatbot is a great way to ensure that your customers or visitors get a good experience any time they visit your page. We saw the theoretical components of a chatbot in this article. Let us now see how to write it in code. We will use Python for this, and the NLTK Python library for most of our tasks.
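As a first taste of what such code can look like, here is a minimal rule-based sketch. It deliberately uses only the standard library instead of NLTK so that it runs without any installation; the rule table, the `respond` function and the fallback text are all hypothetical names chosen for illustration, not part of the original tutorial.

```python
import re

# Hypothetical rule table: (compiled pattern, reply template).
# Rules are tried in order, so more specific patterns come first.
RULES = [
    (re.compile(r"\bmy name is (\w+)", re.I), "Nice to meet you, {0}."),
    (re.compile(r"\b(hi|hello|hey)\b", re.I), "Hello! How can I help you today?"),
    (re.compile(r"\b(bye|goodbye)\b", re.I), "Goodbye!"),
]
FALLBACK = "Sorry, I did not understand that."


def respond(message: str) -> str:
    """Return the reply of the first rule whose pattern matches the message."""
    for pattern, template in RULES:
        match = pattern.search(message)
        if match:
            # Captured groups fill the placeholders; unused groups are ignored.
            return template.format(*match.groups())
    return FALLBACK


if __name__ == "__main__":
    for text in ["Hello there", "my name is Ada", "what?"]:
        print(f"> {text}\n{respond(text)}")
```

NLTK ships a comparable pattern-and-response helper in `nltk.chat.util.Chat`, which takes a list of pattern/response pairs; the sketch above only mirrors that general idea and is not a substitute for it.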
CHAMELEON: A Deep Learning Meta-Architecture for News Recommender Systems [Phd. Thesis]
Moreira, Gabriel de Souza Pereira
Recommender Systems (RS) have become a popular research topic and, since 2016, Deep Learning methods and techniques have been increasingly explored in this area. News RS aim to personalize users' experiences and help them discover relevant articles from a large and dynamic search space. The main contribution of this research is CHAMELEON, a Deep Learning meta-architecture designed to tackle the specific challenges of news recommendation. It consists of a modular reference architecture which can be instantiated using different neural building blocks. As information about users' past interactions is scarce in the news domain, the user context can be leveraged to deal with the user cold-start problem. Articles' content is also important to tackle the item cold-start problem. Additionally, the relevance of items (articles) decays very quickly in the news domain. Furthermore, external breaking events may temporarily attract global readership attention, a phenomenon generally known as concept drift in machine learning. All those characteristics are explicitly modeled in this research by a contextual hybrid session-based recommendation approach using Recurrent Neural Networks. The task addressed by this research is session-based news recommendation, i.e., next-click prediction using only information available in the current user session. A method is proposed for a realistic temporal offline evaluation of this task, replaying the stream of user clicks and fresh articles being continuously published in a news portal. Experiments performed with two large datasets have shown the effectiveness of CHAMELEON for news recommendation on many quality factors such as accuracy, item coverage, novelty, and reduced item cold-start problem, when compared to other traditional and state-of-the-art session-based recommendation algorithms.
On the Validity of Bayesian Neural Networks for Uncertainty Estimation
Mitros, John, Mac Namee, Brian
Deep neural networks (DNN) are versatile parametric models utilised successfully in a diverse number of tasks and domains. However, they have limitations, particularly their lack of robustness and over-sensitivity to out-of-distribution samples. Bayesian Neural Networks, due to their formulation under the Bayesian framework, provide a principled approach to building neural networks that address these limitations. This paper describes a study that empirically evaluates and compares Bayesian Neural Networks to their equivalent point estimate Deep Neural Networks to quantify the predictive uncertainty induced by their parameters, as well as their performance in view of this uncertainty. In this study, we evaluated and compared three point estimate deep neural networks against comparable Bayesian neural network alternatives using two well-known benchmark image classification datasets (CIFAR-10 and SVHN).
Learning from i.i.d. data under model miss-specification
This paper introduces a new approach to learning from i.i.d. data under model misspecification. This approach casts the problem of learning as minimizing the expected code-length of a Bayesian mixture code. To solve this problem, we build on PAC-Bayes bounds, information theory and a new family of second-order Jensen bounds. The key insight of this paper is that the use of the standard (first-order) Jensen bounds in learning is suboptimal when our model class is misspecified (i.e. it does not contain the data generating distribution). As a consequence of this insight, this work provides strong theoretical arguments explaining why the Bayesian posterior is not optimal for making predictions that generalize under model misspecification: the Bayesian posterior is directly related to the use of first-order Jensen bounds. We then argue for the use of second-order Jensen bounds, which leads to new families of learning algorithms. In this work, we introduce novel variational and ensemble learning methods based on the minimization of a novel family of second-order PAC-Bayes bounds over the expected code-length of a Bayesian mixture code. Using this new framework, we also provide novel hypotheses for why parameters in a flat minimum generalize better than parameters in a sharp minimum.
Classifier Chains: A Review and Perspectives
Read, Jesse, Pfahringer, Bernhard, Holmes, Geoff, Frank, Eibe
The family of methods collectively known as classifier chains has become a popular approach to multi-label learning problems. This approach involves linking together off-the-shelf binary classifiers in a chain structure, such that class label predictions become features for other classifiers. Such methods have proved flexible and effective and have obtained state-of-the-art empirical performance across many datasets and multi-label evaluation metrics. This performance led to further studies of how exactly it works and how it could be improved. Over the past decade, numerous studies have explored classifier chain mechanisms on a theoretical level, and many improvements have been made to the training and inference procedures, such that this method remains among the state-of-the-art options for multi-label learning. Given this past and ongoing interest, which covers a broad range of applications and research themes, the goal of this work is to provide a review of classifier chains, a survey of the techniques and extensions provided in the literature, as well as perspectives for this approach in the domain of multi-label classification in the future. We conclude positively, with a number of recommendations for researchers and practitioners, as well as outlining a number of areas for future research.
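The chain structure described above can be sketched in a few lines. The example below is an illustration under stated assumptions, not code from the review: a tiny perceptron (a hypothetical `Perceptron` class) stands in for any off-the-shelf binary classifier, and the `ClassifierChain` class feeds each label's prediction to the classifiers further down the chain.

```python
from typing import List, Sequence


class Perceptron:
    """Tiny binary classifier standing in for any off-the-shelf base learner."""

    def __init__(self, epochs: int = 100) -> None:
        self.epochs = epochs
        self.weights: List[float] = []
        self.bias = 0.0

    def predict_one(self, x: Sequence[float]) -> int:
        score = sum(w * v for w, v in zip(self.weights, x)) + self.bias
        return 1 if score > 0 else 0

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "Perceptron":
        self.weights = [0.0] * len(X[0])
        self.bias = 0.0
        for _ in range(self.epochs):
            for x, target in zip(X, y):
                error = target - self.predict_one(x)
                if error != 0:  # update only on a misclassified example
                    self.weights = [w + error * v for w, v in zip(self.weights, x)]
                    self.bias += error
        return self


class ClassifierChain:
    """One binary classifier per label; earlier labels become extra features."""

    def __init__(self, n_labels: int) -> None:
        self.models = [Perceptron() for _ in range(n_labels)]

    def fit(self, X, Y) -> "ClassifierChain":
        for j, model in enumerate(self.models):
            # Training uses the true values of labels 0..j-1 as added features.
            augmented = [list(x) + list(y[:j]) for x, y in zip(X, Y)]
            model.fit(augmented, [y[j] for y in Y])
        return self

    def predict(self, X) -> List[List[int]]:
        predictions = []
        for x in X:
            labels: List[int] = []
            for model in self.models:
                # Inference uses the labels predicted so far along the chain.
                labels.append(model.predict_one(list(x) + labels))
            predictions.append(labels)
        return predictions


if __name__ == "__main__":
    X = [[0, 0], [0, 1], [1, 0], [1, 1]]
    Y = [[0, 0], [0, 0], [1, 0], [1, 1]]  # label 2 depends on label 1
    print(ClassifierChain(n_labels=2).fit(X, Y).predict(X))
```

Training on true labels while predicting with estimated ones is one common design choice; errors made early in the chain can then propagate to later labels, which is one reason the order of the chain matters.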
Text Classification for Azerbaijani Language Using Machine Learning and Embedding
Suleymanov, Umid, Kalejahi, Behnam Kiani, Amrahov, Elkhan, Badirkhanli, Rashid
Text classification systems will help to solve the text clustering problem in the Azerbaijani language. Text-classification applications exist for other languages, but we set out to build a new system that addresses this problem for Azerbaijani. First, we identified potential areas of application, of which there are many. The main one is news feed categorization: news websites can automatically sort articles into classes such as sports, business, education, science, etc. The system can also be used in sentiment analysis of product reviews. For example, when a company shares a photo of a new product on Facebook and receives a thousand comments, the system classifies the comments into categories such as positive or negative. The system can also be applied in recommender systems, spam filtering, etc. Various machine learning techniques such as Naive Bayes, SVM and Decision Trees have been applied to solve the text classification problem in the Azerbaijani language.
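To make the news categorization use case concrete, here is a minimal sketch of a multinomial Naive Bayes text classifier with add-one smoothing. It is an assumption-laden illustration, not the authors' system: the `NaiveBayesTextClassifier` class, the whitespace tokenizer and the English toy sentences are all hypothetical, and a real Azerbaijani pipeline would need its own tokenization and training corpus.

```python
import math
from collections import Counter, defaultdict
from typing import Iterable


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over whitespace tokens with additive smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha                       # smoothing strength
        self.class_counts = Counter()            # documents seen per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocabulary = set()

    def fit(self, documents: Iterable[str], labels: Iterable[str]):
        for text, label in zip(documents, labels):
            tokens = text.lower().split()
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text: str) -> str:
        tokens = text.lower().split()
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            # Log prior plus the sum of smoothed log likelihoods per token.
            score = math.log(doc_count / total_docs)
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log(
                    (count + self.alpha) / (total_words + self.alpha * vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    train_docs = [
        "the team won the match",
        "the striker scored a goal",
        "the bank raised interest rates",
        "the company reported strong profits",
    ]
    train_labels = ["sports", "sports", "business", "business"]
    model = NaiveBayesTextClassifier().fit(train_docs, train_labels)
    print(model.predict("the team scored a goal"))
```

Working in log space avoids numerical underflow on long documents, and the smoothing term keeps an unseen word from driving a class probability to zero.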
TRADI: Tracking deep neural network weight distributions
Franchi, Gianni, Bursuc, Andrei, Aldea, Emanuel, Dubuisson, Severine, Bloch, Isabelle
During training, the weights of a Deep Neural Network (DNN) are optimized from a random initialization towards a nearly optimum value minimizing a loss function. Only this final state of the weights is typically kept for testing, while the wealth of information on the geometry of the weight space, accumulated over the descent towards the minimum, is discarded. In this work we propose to make use of this knowledge and leverage it for computing the distributions of the weights of the DNN. This can be further used for estimating the epistemic uncertainty of the DNN by sampling an ensemble of networks from these distributions. To this end we introduce a method for tracking the trajectory of the weights during optimization that does not require any changes in the architecture or the training procedure. We evaluate our method on standard classification and regression benchmarks, and on out-of-distribution detection for classification and semantic segmentation. We achieve competitive results, while preserving computational efficiency in comparison to other popular approaches.
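The general idea of tracking weights during optimization and then sampling an ensemble can be illustrated with a small sketch. The abstract does not describe the tracking mechanism, so the code below substitutes plain running moments (Welford's online algorithm) on a two-parameter linear model; this is an assumption made for illustration and should not be read as the authors' method. All names (`WeightTracker`, `train_and_track`) are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


class WeightTracker:
    """Running per-weight mean and variance via Welford's online algorithm."""

    def __init__(self, n_weights: int) -> None:
        self.count = 0
        self.mean = [0.0] * n_weights
        self.m2 = [0.0] * n_weights  # running sum of squared deviations

    def update(self, weights: Sequence[float]) -> None:
        self.count += 1
        for i, w in enumerate(weights):
            delta = w - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (w - self.mean[i])

    def variance(self) -> List[float]:
        if self.count < 2:
            return [0.0] * len(self.mean)
        return [max(m / (self.count - 1), 0.0) for m in self.m2]

    def sample(self, rng: random.Random) -> List[float]:
        """Draw one weight vector from independent per-weight Gaussians."""
        return [rng.gauss(mu, var ** 0.5)
                for mu, var in zip(self.mean, self.variance())]


def train_and_track(data: Sequence[Tuple[float, float]], steps: int = 200,
                    lr: float = 0.05, seed: int = 0) -> WeightTracker:
    """Run SGD on y = slope * x + intercept and record every weight state."""
    rng = random.Random(seed)
    weights = [0.0, 0.0]  # [slope, intercept]
    tracker = WeightTracker(n_weights=2)
    for _ in range(steps):
        x, y = rng.choice(data)
        error = weights[0] * x + weights[1] - y  # gradient of 0.5 * error**2
        weights[0] -= lr * error * x
        weights[1] -= lr * error
        tracker.update(weights)  # training itself is left unchanged
    return tracker


if __name__ == "__main__":
    data = [(i / 10, 2 * (i / 10) + 1) for i in range(11)]
    tracker = train_and_track(data)
    rng = random.Random(1)
    ensemble = [tracker.sample(rng) for _ in range(5)]
    # Spread of the ensemble predictions at x = 0.5 as an uncertainty proxy.
    print([round(w[0] * 0.5 + w[1], 3) for w in ensemble])
```

The tracker only observes the weights after each step, so neither the model nor the optimizer needs to change, which matches the constraint stated in the abstract even though the statistics used here are much simpler.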