 Directed Networks


Accurate parameter estimation for Bayesian Network Classifiers using Hierarchical Dirichlet Processes

arXiv.org Machine Learning

This paper introduces a novel parameter estimation method for the probability tables of Bayesian network classifiers (BNCs), using hierarchical Dirichlet processes (HDPs). The main result of this paper is to show that improved parameter estimation allows BNCs to outperform leading learning methods such as Random Forest for both 0-1 loss and RMSE, albeit only on categorical datasets. As data assets become larger, entering the hyped world of "big", efficient and accurate classification requires three main elements: (1) low-bias classifiers that can capture the fine detail of large datasets, (2) out-of-core learners that can learn from data without having to hold it all in main memory, and (3) models that can classify new data very efficiently. The latest Bayesian network classifiers (BNCs) satisfy these requirements. Their bias can be controlled easily by increasing the number of parents of the nodes in the graph. Their structure can be learned out of core with a limited number of passes over the data. However, as the bias is lowered to model classification tasks more accurately, the accuracy of their parameter estimates also falls, because each parameter is estimated from ever-decreasing quantities of data. In this paper, we introduce the use of Hierarchical Dirichlet Processes for accurate BNC parameter estimation. We conduct an extensive set of experiments on 68 standard datasets and demonstrate that our resulting classifiers perform very competitively with Random Forest in terms of prediction, while keeping the out-of-core capability and superior classification time.
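
The data-scarcity problem described in the abstract above can be illustrated with a much simpler stand-in than an HDP. The sketch below is not the paper's method: it uses plain back-off smoothing (an m-estimate), where the estimate of P(x | parent, y) is pulled toward P(x | y), which is in turn pulled toward a uniform distribution. The function name `smoothed_cpt`, the row layout `(x, parent, y)`, and the smoothing strength `m` are assumptions made for illustration only.

```python
from collections import Counter


def smoothed_cpt(rows, x_values, m=2.0):
    """Return a function estimating P(x | parent, y) with back-off smoothing.

    rows     -- iterable of (x, parent, y) tuples (hypothetical layout)
    x_values -- all values the attribute x can take
    m        -- smoothing strength (assumed; larger means more back-off)
    """
    n_xy, n_y = Counter(), Counter()      # counts for the coarse table P(x | y)
    n_xpy, n_py = Counter(), Counter()    # counts for the fine table P(x | parent, y)
    for x, parent, y in rows:
        n_xy[(x, y)] += 1
        n_y[y] += 1
        n_xpy[(x, parent, y)] += 1
        n_py[(parent, y)] += 1
    k = len(x_values)

    def p_x_given_y(x, y):
        # Coarse estimate, smoothed toward the uniform distribution 1/k.
        return (n_xy[(x, y)] + m / k) / (n_y[y] + m)

    def p_x_given_parent_y(x, parent, y):
        # Fine estimate, smoothed toward the coarse estimate above.
        return (n_xpy[(x, parent, y)] + m * p_x_given_y(x, y)) / (n_py[(parent, y)] + m)

    return p_x_given_parent_y


rows = [("a", "p1", "y1"), ("a", "p1", "y1"), ("b", "p2", "y1")]
estimate = smoothed_cpt(rows, ["a", "b"])
print(estimate("a", "p1", "y1"), estimate("a", "p3", "y1"))
```

For a parent value with no observations (`"p3"` here), the estimate falls back entirely to the coarser table, which is the behaviour a hierarchical prior is meant to provide in a more principled way.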


Recursive nonlinear-system identification using latent variables

arXiv.org Machine Learning

In this paper we develop a method for learning nonlinear systems with multiple outputs and inputs. We begin by modelling the errors of a nominal predictor of the system using a latent variable framework. Then using the maximum likelihood principle we derive a criterion for learning the model. The resulting optimization problem is tackled using a majorization-minimization approach. Finally, we develop a convex majorization technique and show that it enables a recursive identification method. The method learns parsimonious predictive models and is tested on both synthetic and real nonlinear systems.


Neural computation from first principles: Using the maximum entropy method to obtain an optimal bits-per-joule neuron

arXiv.org Machine Learning

Optimization results are one method for understanding neural computation from Nature's perspective and for defining the physical limits on neuron-like engineering. Earlier work looks at individual properties or performance criteria and occasionally a combination of two, such as energy and information. Here we make use of Jaynes' maximum entropy method and combine a larger set of constraints, possibly dimensionally distinct, each expressible as an expectation. The method identifies a likelihood-function and a sufficient statistic arising from each such optimization. This likelihood is a first-hitting time distribution in the exponential class. Particular constraint sets are identified that, from an optimal inference perspective, justify earlier neurocomputational models. Interactions between constraints, mediated through the inferred likelihood, restrict constraint-set parameterizations, e.g., the energy-budget limits estimation performance which, in turn, matches an axonal communication constraint. Such linkages are, for biologists, experimental predictions of the method. In addition to the related likelihood, at least one type of constraint set implies marginal distributions, and in this case, a Shannon bits/joule statement arises.


Deep generative models of genetic variation capture mutation effects

arXiv.org Machine Learning

The functions of proteins and RNAs are determined by a myriad of interactions between their constituent residues, but most quantitative models of how molecular phenotype depends on genotype must approximate this by simple additive effects. While recent models have relaxed this constraint to also account for pairwise interactions, these approaches do not provide a tractable path towards modeling higher-order dependencies. Here, we show how latent variable models with nonlinear dependencies can be applied to capture beyond-pairwise constraints in biomolecules. We present a new probabilistic model for sequence families, DeepSequence, that can predict the effects of mutations across a variety of deep mutational scanning experiments significantly better than site independent or pairwise models that are based on the same evolutionary data. The model, learned in an unsupervised manner solely from sequence information, is grounded with biologically motivated priors, reveals latent organization of sequence families, and can be used to extrapolate to new parts of sequence space.


Deep Neural Generative Model of Functional MRI Images for Psychiatric Disorder Diagnosis

arXiv.org Machine Learning

Accurate diagnosis of psychiatric disorders plays a critical role in improving quality of life for patients and potentially supports the development of new treatments. Many studies have been conducted on machine learning techniques that search brain imaging data for specific biomarkers of disorders. These studies have encountered the following dilemma: an end-to-end classifier overfits to a small number of high-dimensional samples, but unsupervised feature extraction risks extracting a signal of no interest. In addition, such studies have often provided only diagnoses for patients without presenting the reasons for these diagnoses. This study proposed a deep neural generative model of resting-state functional magnetic resonance imaging (fMRI) data. The proposed model is conditioned on the assumed state of the subject and estimates the posterior probability of the subject's state given the imaging data, using Bayes' rule. This study applied the proposed model to diagnose schizophrenia and bipolar disorders. Diagnosis accuracy was improved by a large margin over competitive approaches, namely a support vector machine, logistic regression, and multilayer perceptron with or without unsupervised feature-extractors in addition to a Gaussian mixture model. The proposed model visualizes brain regions largely related to the disorders, thus motivating further biological investigation.
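
The Bayes' rule step mentioned in the abstract above can be sketched in a few lines. This is only an illustration of how a class-conditional generative model is turned into a classifier, not the paper's model: the log-likelihood values, the state names, and the function `posterior_over_states` are hypothetical, and in practice the log-likelihoods would come from the trained generative model.

```python
import math


def posterior_over_states(log_likelihoods, priors):
    """Compute p(state | data) from log p(data | state) and p(state).

    Both arguments are dicts keyed by state name (hypothetical inputs).
    Normalisation uses the log-sum-exp trick for numerical stability.
    """
    log_joint = {s: log_likelihoods[s] + math.log(priors[s]) for s in priors}
    peak = max(log_joint.values())
    log_evidence = peak + math.log(sum(math.exp(v - peak) for v in log_joint.values()))
    return {s: math.exp(v - log_evidence) for s, v in log_joint.items()}


# Illustrative numbers only: the data is three times as likely under "patient".
post = posterior_over_states(
    {"control": -10.0, "patient": -10.0 + math.log(3.0)},
    {"control": 0.5, "patient": 0.5},
)
print(post)
```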


A Novel Bayesian Cluster Enumeration Criterion for Unsupervised Learning

arXiv.org Machine Learning

We derive a new Bayesian Information Criterion (BIC) from first principles by formulating the problem of estimating the number of clusters in an observed data set as maximization of the posterior probability of the candidate models. Given that some mild assumptions are satisfied, we provide a general BIC expression for a broad class of data distributions. This serves as an important milestone when deriving the BIC for specific data distributions. Along this line, we provide a closed-form BIC expression for multivariate Gaussian distributed observations. We show that incorporating the data structure of the clustering problem into the derivation of the BIC results in an expression whose penalty term is different from that of the original BIC. We propose a two-step cluster enumeration algorithm. First, a model-based unsupervised learning algorithm partitions the data according to a given set of candidate models. Subsequently, the optimal cluster number is determined as the one associated with the model for which the proposed BIC is maximal. The performance of the proposed criterion is tested using synthetic and real data sets. Although the original BIC is a generic criterion that does not include information about the specific model selection problem at hand, it has been widely used in the literature to estimate the number of clusters in an observed data set. We therefore consider it as a benchmark comparison. Simulation results show that our proposed criterion outperforms the existing cluster enumeration methods that are based on the original BIC.


Realistic Traffic Generation for Web Robots

arXiv.org Machine Learning

Critical to evaluating the capacity, scalability, and availability of web systems are realistic web traffic generators. Although web traffic generation is a classic research problem, no generator accounts for the characteristics of web robots or crawlers, which are now the dominant source of traffic to a web server. Administrators are thus unable to test, stress, and evaluate how their systems perform in the face of ever-increasing levels of web robot traffic. To resolve this problem, this paper introduces a novel approach to generate synthetic web robot traffic with high fidelity. It generates traffic that accounts for both the temporal and behavioral qualities of robot traffic by means of statistical and Bayesian models that are fitted to the properties of robot traffic seen in web logs from North America and Europe. We evaluate our traffic generator by comparing the characteristics of generated traffic to those of the original data. We look at session arrival rates, inter-arrival times and session lengths, comparing and contrasting them between generated and real traffic. Finally, we show that our generated traffic affects cache performance similarly to actual traffic, using the common LRU and LFU eviction policies.
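
The cache comparison mentioned at the end of the abstract above can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: it replays a request sequence through an LRU cache and reports the hit rate, so the same function could be applied to a real trace and to a generated one. The function name `lru_hit_rate` and the toy request list are assumptions.

```python
from collections import OrderedDict


def lru_hit_rate(requests, capacity):
    """Replay `requests` (a list of resource keys) through an LRU cache.

    Returns the fraction of requests served from the cache.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    cache = OrderedDict()  # least recently used entry is kept first
    hits = 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used entry
            cache[key] = True
    return hits / len(requests) if requests else 0.0


trace = ["a", "b", "a", "c", "b", "a"]  # toy trace, purely illustrative
print(lru_hit_rate(trace, 2), lru_hit_rate(trace, 3))
```

Running the same function on a logged trace and on a synthetic trace of equal length gives two hit rates that can be compared directly across cache sizes.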


Stan vs PyMc3 (vs Edward) – Towards Data Science

@machinelearnbot

The holy trinity when it comes to being Bayesian. I will share my experience of using the first two packages and my high-level opinion of the third (which I haven't used in practice). Of course, then there are the mad men (old professors who are becoming irrelevant) who actually do their own Gibbs sampling. You specify the generative model for the data. You feed in the data as observations, and then it samples from the posterior for you.
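
To make the "specify the model, feed in data, get posterior samples" workflow concrete, here is a hand-rolled sketch of what such packages automate. It deliberately does not use the Stan, PyMC3, or Edward APIs: it is a plain random-walk Metropolis sampler (not Gibbs sampling) for a coin's bias under a uniform prior, and the names `log_posterior` and `metropolis`, the step size, and the data are all assumptions chosen for illustration.

```python
import math
import random


def log_posterior(theta, heads, tails):
    """Unnormalised log posterior: Bernoulli likelihood with a uniform prior on theta."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return heads * math.log(theta) + tails * math.log(1.0 - theta)


def metropolis(heads, tails, n_samples=20000, step=0.1, seed=0):
    """Random-walk Metropolis sampler for the coin bias theta."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, heads, tails)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, heads, tails)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        samples.append(theta)
    return samples


draws = metropolis(heads=7, tails=3)[2000:]  # discard an initial burn-in
print(sum(draws) / len(draws))               # compare with the exact mean 8/12
```

With a uniform prior the exact posterior is Beta(8, 4), so the sample mean can be checked against 8/12. A probabilistic programming package replaces the two functions above with a model declaration and a single sampling call.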


Add Machine Learning For an Effective Marketing Campaign

@machinelearnbot

What do most effective marketing campaigns have in common? Let us suppose that a company wants to perform a direct marketing campaign to get a response (like a subscription or a purchase) from users. It wants to run a marketing campaign for around 10,000 users, of whom only 1,000 are expected to respond. But the company doesn't have the budget to reach out to all 10,000 users. To minimize the cost, the company wants to reach out to as few users as possible while still reaching most (a user-defined share) of those who are likely to respond.
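
The targeting goal described above can be sketched as a ranking problem. The code below is an illustration under stated assumptions rather than the article's method: it assumes some model has already produced a response probability per user, ranks users by that score, and keeps the shortest prefix whose expected responders reach a chosen share of the total. The function name `smallest_target_list`, the `coverage` parameter, and the scores are hypothetical.

```python
def smallest_target_list(scored_users, coverage=0.8):
    """Pick the fewest users whose expected responses reach `coverage` of the total.

    scored_users -- list of (user_id, predicted_response_probability) pairs
    coverage     -- user-defined share of expected responders to reach (0 to 1)
    """
    ranked = sorted(scored_users, key=lambda pair: pair[1], reverse=True)
    total_expected = sum(score for _, score in ranked)
    chosen, covered = [], 0.0
    for user_id, score in ranked:
        if covered >= coverage * total_expected:
            break
        chosen.append(user_id)
        covered += score
    return chosen


scores = [("u1", 0.9), ("u2", 0.1), ("u3", 0.6), ("u4", 0.4)]  # illustrative only
print(smallest_target_list(scores, coverage=0.7))
```

Taking users in descending score order is what makes the prefix the smallest one for a given coverage, since every skipped user would contribute less than any user already chosen.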


On Gaussian Markov models for conditional independence

arXiv.org Artificial Intelligence

Markov models, or probabilistic graphical models, explicitly establish a correspondence between statistical independence in a probability distribution and certain separation criteria holding in a graph. They originated at the interface between statistics, where Markov random fields were predominant [Darroch et al., 1980], and artificial intelligence, with a focus on Bayesian networks [Pearl, 1985, 1986]. These two models are now considered the traditional ones, but they are still widely applied, and a significant amount of current research is devoted to them [Daly et al., 2011, Uhler, 2012]. Both model conditional independences: Bayesian networks relate them to acyclic directed graphs, whereas Markov fields associate them with undirected graphs. However, the models they represent are only equivalent under additional assumptions on the respective graphs.
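
As a brief worked illustration of the graph-to-independence correspondence in the Gaussian case named in the title: the notation below ($K$, $V$, and the three-variable examples) is assumed for illustration and is not taken from the abstract. For a multivariate Gaussian, a missing edge in the undirected graph corresponds to a zero in the precision matrix, and a simple chain shows a case where the directed and undirected models agree while a collider shows one where they do not.

```latex
% X ~ N(mu, Sigma) indexed by V, with precision matrix K = Sigma^{-1} (assumed notation).
\[
  X_i \perp\!\!\!\perp X_j \mid X_{V \setminus \{i,j\}}
  \quad\Longleftrightarrow\quad
  K_{ij} = 0 .
\]
% Chain: the directed graph 1 -> 2 -> 3 and the undirected graph 1 - 2 - 3
% both encode the same single independence.
\[
  X_1 \perp\!\!\!\perp X_3 \mid X_2 .
\]
% Collider: the directed graph 1 -> 3 <- 2 encodes a marginal independence,
% which no undirected graph on the same three nodes represents exactly.
\[
  X_1 \perp\!\!\!\perp X_2 , \qquad
  X_1 \not\!\perp\!\!\!\perp X_2 \mid X_3 \ \text{(in general)} .
\]
```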