Linear and Quadratic Discriminant Analysis: Tutorial

arXiv.org Machine Learning

This tutorial explains Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) as two fundamental classification methods in statistical and probabilistic learning. We start with the optimization of the decision boundary on which the posteriors are equal. LDA and QDA are then derived for binary and multiple classes, and the estimation of their parameters is covered. We then explain how LDA and QDA relate to metric learning, kernel principal component analysis, the Mahalanobis distance, logistic regression, the Bayes optimal classifier, Gaussian naive Bayes, and the likelihood ratio test. We also prove that LDA and Fisher discriminant analysis are equivalent. Finally, we clarify some of the theoretical concepts with the simulations we provide.
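
As a rough illustration of the linear-versus-quadratic distinction the tutorial derives, the following Python sketch (not code from the tutorial) fits plug-in LDA and QDA with scikit-learn on synthetic data; the dataset and its parameters are arbitrary choices for demonstration.

# Minimal sketch: LDA vs. QDA fitted by plug-in estimates of class
# means and covariances (scikit-learn implementations).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)

# Synthetic two-class data; sizes are arbitrary for illustration.
X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           random_state=0)

# LDA assumes one shared covariance matrix across classes (linear boundary);
# QDA fits a separate covariance per class (quadratic boundary).
lda = LinearDiscriminantAnalysis().fit(X, y)
qda = QuadraticDiscriminantAnalysis().fit(X, y)

print("LDA training accuracy:", lda.score(X, y))
print("QDA training accuracy:", qda.score(X, y))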


A Bayesian Approach for Accurate Classification-Based Aggregates

arXiv.org Machine Learning

In this paper, we study the accuracy of values aggregated over classes predicted by a classification algorithm. The problem is that the resulting aggregates (e.g., sums of a variable) are known to be biased. The bias can be large even for highly accurate classification algorithms, in particular when dealing with class-imbalanced data. To correct this bias, the algorithm's classification error rates have to be estimated. In this estimation, two issues arise when applying existing bias correction methods. First, inaccuracies in estimating classification error rates have to be taken into account. Second, impermissible estimates, such as a negative estimate for a positive value, have to be dismissed. We show that both issues are relevant in applications where the true labels are known only for a small set of data points. We propose a novel bias correction method using Bayesian inference. The novelty of our method is that it imposes constraints on the model parameters. We show that our method solves the problem of biased classification-based aggregates as well as the two issues above, in the general setting of multi-class classification. In the empirical evaluation, using a binary classifier on a real-world dataset of company tax returns, we show that our method outperforms existing methods in terms of mean squared error.
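
The bias the paper addresses can be reproduced with a short simulation. The following Python sketch only illustrates the problem, not the paper's Bayesian correction; the error rates, class imbalance, and value distribution are hypothetical.

# Minimal sketch of the bias problem: summing a variable over *predicted*
# class members differs systematically from the sum over *true* class
# members, even for an accurate classifier, when classes are imbalanced.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_label = rng.random(n) < 0.05             # imbalanced: 5% positives
value = rng.exponential(scale=100.0, size=n)  # variable to be aggregated

# Hypothetical classifier with 95% sensitivity and 97% specificity.
pred = np.where(true_label,
                rng.random(n) < 0.95,
                rng.random(n) < 0.03)

true_sum = value[true_label].sum()
naive_sum = value[pred].sum()
print(f"true aggregate:  {true_sum:,.0f}")
print(f"naive aggregate: {naive_sum:,.0f}  (biased despite high accuracy)")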


The Infinite Gaussian Mixture Model

Neural Information Processing Systems

In a Bayesian mixture model it is not necessary a priori to limit the number of components to be finite. In this paper an infinite Gaussian mixture model is presented which neatly sidesteps the difficult problem of finding the "right" number of mixture components. Inference in the model is done using an efficient parameter-free Markov Chain that relies entirely on Gibbs sampling.
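
For readers who want to experiment, the following Python sketch uses scikit-learn's BayesianGaussianMixture with a Dirichlet-process prior as a practical stand-in for an infinite mixture; it is a variational approximation rather than the paper's parameter-free Gibbs sampler, and the data and component cap are arbitrary.

# Minimal sketch: a Dirichlet-process Gaussian mixture in which unused
# components collapse, so the number of clusters need not be fixed a priori.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Synthetic 1-D data drawn from three Gaussians; the model gets a generous cap.
X = np.concatenate([
    rng.normal(-5.0, 1.0, size=(300, 1)),
    rng.normal(0.0, 0.5, size=(300, 1)),
    rng.normal(6.0, 1.5, size=(300, 1)),
])

bgm = BayesianGaussianMixture(
    n_components=20,  # upper bound only; the DP prior prunes extras
    weight_concentration_prior_type="dirichlet_process",
    max_iter=500,
    random_state=0,
).fit(X)

print("effective components (weight > 1%):", np.sum(bgm.weights_ > 0.01))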

