Goto

Collaborating Authors

 Bayesian Learning


Bayesian Kernel and Mutual $k$-Nearest Neighbor Regression

arXiv.org Machine Learning

We propose Bayesian extensions of two nonparametric regression methods which are kernel and mutual $k$-nearest neighbor regression methods. Derived based on Gaussian process models for regression, the extensions provide distributions for target value estimates and the framework to select the hyperparameters. It is shown that both the proposed methods asymptotically converge to kernel and mutual $k$-nearest neighbor regression methods, respectively. The simulation results show that the proposed methods can select proper hyperparameters and are better than or comparable to the former methods for an artificial data set and a real world data set.


Robust Non-linear Regression: A Greedy Approach Employing Kernels with Application to Image Denoising

arXiv.org Machine Learning

We consider the task of robust non-linear regression in the presence of both inlier noise and outliers. Assuming that the unknown non-linear function belongs to a Reproducing Kernel Hilbert Space (RKHS), our goal is to estimate the set of the associated unknown parameters. Due to the presence of outliers, common techniques such as the Kernel Ridge Regression (KRR) or the Support Vector Regression (SVR) turn out to be inadequate. Instead, we employ sparse modeling arguments to explicitly model and estimate the outliers, adopting a greedy approach. The proposed robust scheme, i.e., Kernel Greedy Algorithm for Robust Denoising (KGARD), is inspired by the classical Orthogonal Matching Pursuit (OMP) algorithm. Specifically, the proposed method alternates between a KRR task and an OMP-like selection step. Theoretical results concerning the identification of the outliers are provided. Moreover, KGARD is compared against other cutting edge methods, where its performance is evaluated via a set of experiments with various types of noise. Finally, the proposed robust estimation framework is applied to the task of image denoising, and its enhanced performance in the presence of outliers is demonstrated.


Blocking Collapsed Gibbs Sampler for Latent Dirichlet Allocation Models

arXiv.org Machine Learning

The latent Dirichlet allocation (LDA) model is a widely-used latent variable model in machine learning for text analysis. Inference for this model typically involves a single-site collapsed Gibbs sampling step for latent variables associated with observations. The efficiency of the sampling is critical to the success of the model in practical large scale applications. In this article, we introduce a blocking scheme to the collapsed Gibbs sampler for the LDA model which can, with a theoretical guarantee, improve chain mixing efficiency. We develop two procedures, an O(K)-step backward simulation and an O(log K)-step nested simulation, to directly sample the latent variables within each block. We demonstrate that the blocking scheme achieves substantial improvements in chain mixing compared to the state of the art single-site collapsed Gibbs sampler. We also show that when the number of topics is over hundreds, the nested-simulation blocking scheme can achieve a significant reduction in computation time compared to the single-site sampler.


Combining Random Walks and Nonparametric Bayesian Topic Model for Community Detection

arXiv.org Machine Learning

Community detection has been an active research area for decades. Among all probabilistic models, Stochastic Block Model has been the most popular one. This paper introduces a novel probabilistic model: RW-HDP, based on random walks and Hierarchical Dirichlet Process, for community extraction. In RW-HDP, random walks conducted in a social network are treated as documents; nodes are treated as words. By using Hierarchical Dirichlet Process, a nonparametric Bayesian model, we are not only able to cluster nodes into different communities, but also determine the number of communities automatically. We use Stochastic Variational Inference for our model inference, which makes our method time efficient and can be easily extended to an online learning algorithm.


Time-Sensitive Bayesian Information Aggregation for Crowdsourcing Systems

Journal of Artificial Intelligence Research

Many aspects of the design of efficient crowdsourcing processes, such as defining workers bonuses, fair prices and time limits of the tasks, involve knowledge of the likely duration of the task at hand. In this work we introduce a new timesensitive Bayesian aggregation method that simultaneously estimates a tasks duration and obtains reliable aggregations of crowdsourced judgments. Our method, called BCCTime, uses latent variables to represent the uncertainty about the workers completion time, the tasks duration and the workers accuracy. To relate the quality of a judgment to the time a worker spends on a task, our model assumes that each task is completed within a latent time window within which all workers with a propensity to genuinely attempt the labelling task (i.e., no spammers) are expected to submit their judgments. In contrast, workers with a lower propensity to valid labelling, such as spammers, bots or lazy labellers, are assumed to perform tasks considerably faster or slower than the time required by normal workers. Specifically, we use efficient message-passing Bayesian inference to learn approximate posterior probabilities of (i) the confusion matrix of each worker, (ii) the propensity to valid labelling of each worker, (iii) the unbiased duration of each task and (iv) the true label of each task. Using two real- world public datasets for entity linking tasks, we show that BCCTime produces up to 11% more accurate classifications and up to 100% more informative estimates of a tasks duration compared to stateoftheart methods.


Variational Mixture Models with Gamma or inverse-Gamma components

arXiv.org Machine Learning

Mixture models with Gamma and or inverse-Gamma distributed mixture components are useful for medical image tissue segmentation or as post-hoc models for regression coefficients obtained from linear regression within a Generalised Linear Modeling framework (GLM), used in this case to separate stochastic (Gaussian) noise from some kind of positive or negative "activation" (modeled as Gamma or inverse-Gamma distributed). To date, the most common choice in this context it is Gaussian/Gamma mixture models learned through a maximum likelihood (ML) approach; we recently extended such algorithm for mixture models with inverse-Gamma components. Here, we introduce a fully analytical Variational Bayes (VB) learning framework for both Gamma and/or inverse-Gamma components. We use synthetic and resting state fMRI data to compare the performance of the ML and VB algorithms in terms of area under the curve and computational cost. We observed that the ML Gaussian/Gamma model is very expensive specially when considering high resolution images; furthermore, these solutions are highly variable and they occasionally can overestimate the activations severely. The Bayesian Gauss-Gamma is in general the fastest algorithm but provides too dense solutions. The maximum likelihood Gaussian/inverse-Gamma is also very fast but provides in general very sparse solutions. The variational Gaussian/inverse-Gamma mixture model is the most robust and its cost is acceptable even for high resolution images. Further, the presented methodology represents an essential building block that can be directly used in more complex inference tasks, specially designed to analyse MRI-fMRI data; such models include for example analytical variational mixture models with adaptive spatial regularization or better source models for new spatial blind source separation approaches.


A New PAC-Bayesian Perspective on Domain Adaptation

arXiv.org Machine Learning

We study the issue of PAC-Bayesian domain adaptation: We want to learn, from a source domain, a majority vote model dedicated to a target one. Our theoretical contribution brings a new perspective by deriving an upper-bound on the target risk where the distributions' divergence-- expressed as a ratio--controls the tradeoff between a source error measure and the target voters' disagreement. Our bound suggests that one has to focus on regions where the source data is informative. From this result, we derive a PAC-Bayesian generalization bound, and specialize it to linear classifiers. Then, we infer a learning algorithm and perform experiments on real data.


maximum likelihood estimate and logistic regression simplified

#artificialintelligence

Least squares regression can cause impossible estimates such as probabilities that are less than zero and greater than 1.So, when the predicted value is measured as a probability, use Logistic Regression We use the log of the odds rather than the odds directly because an odds ratio cannot be a negative number--but its log can be negative. Notice that we have randomly initialized our coefficients for income and other predictors. These will be adjusted by Solver based on a likelihood function.We will cover them later Column H tells us the predicted probability of the borrower's actual behavior, whether that behavior is repayment or default--not simply, as in Column G, the predicted probability of defaulting on the loan. One property of logarithms is that their sum equals the logarithm of the product of the numbers on which they're based The logarithms of probabilities are always negative numbers, but the closer a probability is to 1.0, the closer its logarithm is to 0.0. I haven't covered cross-validation, which is commonly used to validate a logistic regression equation.If you don't always have a large number of cases to work with, a different approach is to use statistical inference.


How To Use Classification Machine Learning Algorithms in Weka - Machine Learning Mastery

#artificialintelligence

Weka makes a large number of classification algorithms available. The large number of machine learning algorithms available is one of the benefits of using the Weka platform to work through your machine learning problems. In this post you will discover how to use 5 top machine learning algorithms in Weka. How To Use Classification Machine Learning Algorithms in Weka Photo by Don Graham, some rights reserved. We are going to take a tour of 5 top classification algorithms in Weka.


Identifying Depression on Twitter

arXiv.org Machine Learning

Social media has recently emerged as a premier method to disseminate information online. Through these online networks, tens of millions of individuals communicate their thoughts, personal experiences, and social ideals. We therefore explore the potential of social media to predict, even prior to onset, Major Depressive Disorder (MDD) in online personas. We employ a crowdsourced method to compile a list of Twitter users who profess to being diagnosed with depression. Using up to a year of prior social media postings, we utilize a Bag of Words approach to quantify each tweet. Lastly, we leverage several statistical classifiers to provide estimates to the risk of depression. Our work posits a new methodology for constructing our classifier by treating social as a text-classification problem, rather than a behavioral one on social media platforms. By using a corpus of 2.5M tweets, we achieved an 81% accuracy rate in classification, with a precision score of .86. We believe that this method may be helpful in developing tools that estimate the risk of an individual being depressed, can be employed by physicians, concerned individuals, and healthcare agencies to aid in diagnosis, even possibly enabling those suffering from depression to be more proactive about recovering from their mental health.