 Directed Networks


Bayesian Graph Convolutional Neural Networks Using Non-Parametric Graph Learning

arXiv.org Machine Learning

Graph convolutional neural networks (GCNNs) have been successfully applied to many graph-based learning tasks, including node and graph classification, matrix completion, and learning of node embeddings. Despite their impressive performance, these techniques have a limited capability to incorporate uncertainty in the underlying graph structure. To address this issue, a Bayesian GCNN (BGCN) framework was recently proposed. In this framework, the observed graph is considered to be a random realization from a parametric random graph model, and joint Bayesian inference of the graph and GCNN weights is performed. In this paper, we propose a non-parametric generative model for graphs and incorporate it within the BGCN framework. In addition to the observed graph, our approach effectively uses the node features and training labels in the posterior inference of graphs and attains superior or comparable performance in benchmark node classification tasks.


Variational Student: Learning Compact and Sparser Networks in Knowledge Distillation Framework

arXiv.org Machine Learning

The holy grail in deep neural network research is porting memory- and computation-intensive network models to embedded platforms with a minimal compromise in model accuracy. To this end, we propose a novel approach, termed Variational Student, where we reap the benefits of the compressibility of the knowledge distillation (KD) framework and the sparsity-inducing abilities of variational inference (VI) techniques. Essentially, we build a sparse student network, whose sparsity is induced by the variational parameters found via optimizing a loss function based on VI, leveraging the knowledge learnt by an accurate but complex pre-trained teacher network. Further, for sparsity enhancement, we also employ a Block Sparse Regularizer on a concatenated tensor of teacher and student network weights. We demonstrate that the marriage of KD and VI techniques inherits compression properties from the KD framework and enhances levels of sparsity from the VI approach, with minimal compromise in model accuracy. We benchmark our results on LeNet (MLP) and VGGNet (CNN) and illustrate a memory footprint reduction of 64x and 213x on these MLP and CNN variants, respectively, without a need to retrain the teacher network. Furthermore, in the low-data regime, we observed that our method outperforms state-of-the-art Bayesian techniques in terms of accuracy.


On the Efficiency of the Neuro-Fuzzy Classifier for User Knowledge Modeling Systems

arXiv.org Artificial Intelligence

User knowledge modeling systems are used as the most effective technology for capturing new users' attention. Moreover, the quality of service (QoS) is increased by these intelligent services. This paper proposes two user knowledge classifiers based on artificial neural networks, which are among the influential parts of knowledge modeling systems. We employed a multi-layer perceptron (MLP) and an adaptive neural fuzzy inference system (ANFIS) as the classifiers. As classifier inputs, we used real data containing each user's degree of study time, repetition number, exam performance, and learning percentage. Compared with well-known methods such as KNN and Bayesian classifiers used in other research on the same data sets, our experiments show better performance. Although the number of samples in the training set is not large, the performance of the neuro-fuzzy classifier on the test set is 98.6%, which is the best result in comparison with the others. The MLP performs worse than ANFIS, although it still performs better than other methods such as Bayesian and KNN classifiers. As our goal is evaluating and reporting the efficiency of a neuro-fuzzy classifier for user knowledge modeling systems, we utilized many different evaluation metrics such as the Receiver Operating Characteristic and the Area Under its Curve, Total Accuracy, and Kappa statistics.
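Two of the metrics named in this abstract, total accuracy and the kappa statistic, can be computed directly from a confusion matrix. The sketch below assumes that "Kappa statistics" refers to Cohen's kappa. The function names and the matrix values are hypothetical and do not come from the paper.

```python
def total_accuracy(confusion):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def cohen_kappa(confusion):
    """Cohen's kappa (assumed meaning of 'Kappa statistics').

    Rows are true classes, columns are predicted classes.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = total_accuracy(confusion)
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    # Agreement expected by chance, from the marginal class frequencies.
    expected = sum(row_totals[k] * col_totals[k] for k in range(n)) / total**2
    return (observed - expected) / (1 - expected)


# Hypothetical two-class confusion matrix, for illustration only.
example = [[20, 5],
           [10, 15]]
accuracy = total_accuracy(example)
kappa = cohen_kappa(example)
print(accuracy, kappa)
```

Kappa discounts the agreement that would occur by chance, so it is lower than raw accuracy whenever the marginal class frequencies alone would produce some correct predictions.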


11 Alternatives To Keras For Deep Learning Enthusiasts

#artificialintelligence

Infer.NET is a machine learning framework for running Bayesian inference in graphical models. It provides state-of-the-art message-passing algorithms and the statistical routines needed to perform inference for a wide variety of applications. Its features include a rich modelling language, multiple inference algorithms, a design aimed at large-scale inference, and user extensibility. With the help of this framework, various Bayesian models such as Bayes Point Machine classifiers, TrueSkill matchmaking, hidden Markov models, and Bayesian networks can be implemented with ease.


BANANAS: Bayesian Optimization with Neural Architectures for Neural Architecture Search

arXiv.org Machine Learning

Neural Architecture Search (NAS) has seen an explosion of research in the past few years. A variety of methods have been proposed to perform NAS, including reinforcement learning, Bayesian optimization with a Gaussian process model, evolutionary search, and gradient descent. In this work, we design a NAS algorithm that performs Bayesian optimization using a neural network model. We develop a path-based encoding scheme to featurize the neural architectures that are used to train the neural network model. This strategy is particularly effective for encoding architectures in cell-based search spaces. After training on just 200 random neural architectures, we are able to predict the validation accuracy of a new architecture to within one percent of its true accuracy on average, for popular search spaces. This may be of independent interest beyond Bayesian neural architecture search. We test our algorithm on the NASBench (Ying et al. 2019) and DARTS (Liu et al. 2018) search spaces, and we show that our algorithm outperforms other NAS methods including evolutionary search, reinforcement learning, AlphaX, ASHA, and DARTS. Our algorithm is over 100x more efficient than random search, and 3.8x more efficient than the next-best algorithm on the NASBench dataset. As there have been problems with fair and reproducible experimental evaluations in the field of NAS, we adhere to the recent NAS research checklist (Lindauer and Hutter 2019) to facilitate NAS research. In particular, our implementation has been made publicly available, including all details needed to fully reproduce our results.
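The abstract does not spell out the path-based encoding, so the sketch below shows only one plausible reading: each feature marks whether a given sequence of operations occurs along some input-to-output path of the cell. The node names, the operation vocabulary, and both helper functions are hypothetical, and this is not the authors' implementation.

```python
from itertools import product


def enumerate_paths(adjacency, ops):
    """List the operation sequence along every input-to-output path.

    adjacency: dict mapping node -> list of successor nodes (a DAG)
    ops:       dict mapping intermediate node -> operation name
    Node "in" is the cell input and node "out" is the cell output.
    """
    paths = []

    def walk(node, seen_ops):
        if node == "out":
            paths.append(tuple(seen_ops))
            return
        for nxt in adjacency.get(node, []):
            # Only intermediate nodes carry an operation label.
            label = [ops[nxt]] if nxt in ops else []
            walk(nxt, seen_ops + label)

    walk("in", [])
    return paths


def path_encoding(adjacency, ops, op_choices, max_nodes):
    """Binary vector with one entry per possible operation sequence."""
    # All sequences of length 0..max_nodes over the operation vocabulary.
    all_paths = [
        seq
        for length in range(max_nodes + 1)
        for seq in product(op_choices, repeat=length)
    ]
    present = set(enumerate_paths(adjacency, ops))
    return [1 if seq in present else 0 for seq in all_paths]


# Hypothetical cell: in -> a -> out and in -> a -> b -> out.
adjacency = {"in": ["a"], "a": ["b", "out"], "b": ["out"]}
ops = {"a": "conv3x3", "b": "maxpool"}
encoding = path_encoding(adjacency, ops, ["conv3x3", "maxpool"], max_nodes=2)
print(encoding)
```

The length of the vector depends only on the operation vocabulary and the maximum number of intermediate nodes, so every architecture in the search space maps to a vector of the same size.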


Online Gaussian LDA for Unsupervised Pattern Mining from Utility Usage Data

arXiv.org Machine Learning

Non-intrusive load monitoring (NILM) aims at separating a whole-home energy signal into its appliance components. Such a method can be harnessed to provide various services to better manage and control energy consumption (optimal planning and saving). NILM has traditionally been approached from signal processing and electrical engineering perspectives. Recently, machine learning has started to play an important role in NILM. While most work has focused on supervised algorithms, unsupervised approaches can be more interesting and of practical use in real-world scenarios. Specifically, they do not require labelled training data to be acquired from individual appliances, and the algorithm can be deployed to operate on the measured aggregate data directly. In this paper, we propose a fully unsupervised NILM framework based on Bayesian hierarchical mixture models. In particular, we develop a new method based on Gaussian Latent Dirichlet Allocation (GLDA) in order to extract global components that summarise the energy signal. These components provide a representation of the consumption patterns. Designed to cope with big data, our algorithm, unlike existing NILM ones, does not focus on appliance recognition. To handle this massive data, GLDA works online. Another novelty of this work compared to existing NILM work is that the data involves different utilities (e.g., electricity, water and gas) as well as some sensor measurements. Finally, we propose different evaluation methods to analyse the results, which show that our algorithm finds useful patterns.


Descriptive Dimensionality and Its Characterization of MDL-based Learning and Change Detection

arXiv.org Machine Learning

This paper introduces a new notion of dimensionality of probabilistic models from an information-theoretic viewpoint. We call it the "descriptive dimension" (Ddim). We show that Ddim coincides with the number of independent parameters for the parametric class, and can further be extended to real-valued dimensionality when a number of models are mixed. The paper then derives the rate of convergence of the MDL (Minimum Description Length) learning algorithm, which outputs a normalized maximum likelihood (NML) distribution with the model of the shortest NML codelength. The paper proves that the rate is governed by Ddim. The paper also derives error probabilities of the MDL-based test for multiple model change detection. It proves that they are also governed by Ddim. Through the analysis, we demonstrate that Ddim is an intrinsic quantity which characterizes the performance of MDL-based learning and change detection.


Understanding The Naive Bayes Classifier

#artificialintelligence

Let's step back first and frame our classification problem in Bayesian terms -- where we have a set of prior beliefs and update our beliefs as we observe and collect evidence. In statistics, everything revolves around hypotheses. We make a hypothesis (an informed guess) about how the world works, and then we go about collecting evidence to test that hypothesis (if you would like to know the details, I wrote a post about hypothesis testing here). Classification models can be framed as a hypothesis as well. Let's first write out the objective and variables of our classification problem: OK, so that's classification -- now let's examine classification through a Bayesian lens.
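As a rough illustration of updating prior beliefs with evidence, here is a minimal sketch of a single Bayes' rule update over two classes. The class names, priors, and likelihoods are made-up numbers and are not taken from the post.

```python
def posterior(priors, likelihoods):
    """Return P(class | evidence) for each class via Bayes' rule.

    priors:      dict mapping class -> P(class)
    likelihoods: dict mapping class -> P(evidence | class)
    """
    # Unnormalized posterior: prior times likelihood for each class.
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    # P(evidence) normalizes the result so the beliefs sum to 1.
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}


# Hypothetical numbers: 20% of emails are spam, and the word "offer"
# appears in 60% of spam emails but only 5% of non-spam emails.
priors = {"spam": 0.2, "ham": 0.8}
likelihoods = {"spam": 0.6, "ham": 0.05}
beliefs = posterior(priors, likelihoods)
print(beliefs)
```

With these assumed numbers, observing the word raises the belief in "spam" from the prior of 0.2 to a posterior of 0.75. A naive Bayes classifier applies the same update with one likelihood term per feature, treating the features as conditionally independent given the class.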


Accurate Layerwise Interpretable Competence Estimation

arXiv.org Machine Learning

Estimating machine learning performance 'in the wild' is both an important and unsolved problem. In this paper, we seek to examine, understand, and predict the pointwise competence of classification models. Our contributions are twofold: First, we establish a statistically rigorous definition of competence that generalizes the common notion of classifier confidence; second, we present the ALICE (Accurate Layerwise Interpretable Competence Estimation) Score, a pointwise competence estimator for any classifier. By considering distributional, data, and model uncertainty, ALICE empirically shows accurate competence estimation in common failure situations such as class-imbalanced datasets, out-of-distribution datasets, and poorly trained models. Our contributions allow us to accurately predict the competence of any classification model given any input and error function. We compare our score with state-of-the-art confidence estimators such as model confidence and Trust Score, and show significant improvements in competence prediction over these methods on datasets such as DIGITS, CIFAR10, and CIFAR100.


Rationally Inattentive Inverse Reinforcement Learning Explains YouTube Commenting Behavior

arXiv.org Machine Learning

We consider a novel application of inverse reinforcement learning which involves modeling, learning and predicting the commenting behavior of YouTube viewers. Each group of users is modeled as a rationally inattentive Bayesian agent. Our methodology integrates three key components. First, to identify distinct commenting patterns, we use deep embedded clustering to estimate framing information (essential extrinsic features) that clusters users into distinct groups. Second, we present an inverse reinforcement learning algorithm that uses Bayesian revealed preferences to test for rationality: does there exist a utility function that rationalizes the given data, and if yes, can it be used to predict future behavior? Finally, we impose behavioral economics constraints stemming from rational inattention to characterize the attention span of groups of users. The test imposes a Rényi mutual information cost constraint which impacts how the agent can select attention strategies to maximize their expected utility. After a careful analysis of a massive YouTube dataset, our surprising result is that in most YouTube user groups, the commenting behavior is consistent with optimizing a Bayesian utility with rationally inattentive constraints. The paper also highlights how the rational inattention model can accurately predict future commenting behavior. The massive YouTube dataset and analysis used in this paper are available on GitHub and completely reproducible.