Learning Graphical Models
Robust mixture modelling using sub-Gaussian stable distribution
Teimouri, Mahdi, Rezakhah, Saeid, Mohammadpour, Adel
Heavy-tailed distributions are widely used in robust mixture modelling because of their thick tails. As a computationally tractable subclass of the stable distributions, the sub-Gaussian $\alpha$-stable distribution has received much interest in the literature. Here, we introduce a type of expectation maximization algorithm that estimates the parameters of a mixture of sub-Gaussian stable distributions. A comparative study against several well-known mixture models is performed to show the robustness and performance of the mixture of sub-Gaussian $\alpha$-stable distributions for modelling simulated, synthetic, and real data.
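To make the sub-Gaussian $\alpha$-stable family concrete, here is a minimal sampling sketch in Python (not the paper's EM algorithm). It assumes the standard scale-mixture representation $X = \mu + \sqrt{A}\,G$, where $G \sim N(0, \Sigma)$ and $A$ is a positive $(\alpha/2)$-stable variable drawn with Kanter's method, which gives the characteristic function $\exp(-(t^\top \Sigma t / 2)^{\alpha/2})$ for $\mu = 0$. The function names and this parameterization are illustrative assumptions; the paper may scale $\Sigma$ differently.

```python
import numpy as np

def positive_stable(a, size, rng):
    """Kanter's method: A > 0 with Laplace transform E[exp(-s*A)] = exp(-s**a), 0 < a < 1."""
    u = rng.uniform(0.0, np.pi, size)
    w = rng.exponential(1.0, size)
    return (np.sin(a * u) / np.sin(u) ** (1.0 / a)) * (
        np.sin((1.0 - a) * u) / w
    ) ** ((1.0 - a) / a)

def sample_sub_gaussian_stable(alpha, mu, sigma, n, rng):
    """Draw n points X = mu + sqrt(A) * G for 0 < alpha < 2 (assumed parameterization)."""
    mu = np.asarray(mu, dtype=float)
    a = positive_stable(alpha / 2.0, n, rng)            # heavy-tailed random scale
    g = rng.multivariate_normal(np.zeros(mu.size), sigma, size=n)  # Gaussian part
    return mu + np.sqrt(a)[:, None] * g

rng = np.random.default_rng(0)
x = sample_sub_gaussian_stable(1.5, [0.0, 0.0], np.eye(2), 1000, rng)
```

Smaller values of `alpha` produce heavier tails, which is what lets a mixture of such components absorb outliers.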
Predicting with confidence: the best machine learning idea you never heard of
One of the disadvantages of machine learning as a discipline is the lack of reasonable confidence intervals on a given prediction. There are all kinds of reasons you might want such a thing, but I think machine learning and data science practitioners are so drunk with newfound powers, they forget where such a thing might be useful. If you're really confident, for example, that someone will click on an ad, you probably want to serve one that pays a nice click-through rate. If you have some kind of gambling engine, you want to bet more money on the predictions you are more confident of. Or if you're diagnosing an illness in a patient, it would be awfully nice to be able to tell the patient how certain you are of the diagnosis and what the confidence in the prognosis is. There are various ad hoc ways that people do this sort of thing.
Effective and Extensible Feature Extraction Method Using Genetic Algorithm-Based Frequency-Domain Feature Search for Epileptic EEG Multi-classification
In this paper, a genetic algorithm-based frequency-domain feature search (GAFDS) method is proposed for the electroencephalogram (EEG) analysis of epilepsy. In this method, frequency-domain features are first searched and then combined with nonlinear features. Subsequently, these features are selected and optimized to classify EEG signals. The extracted features are analyzed experimentally. The features extracted by GAFDS show remarkable independence, and they are superior to the nonlinear features in terms of the ratio of inter-class distance to intra-class distance. Moreover, the proposed feature search method can additionally search for features of instantaneous frequency in a signal after Hilbert transformation. The classification results achieved using these features are reasonable; thus, GAFDS exhibits good extensibility. Multiple classic classifiers (i.e., $k$-nearest neighbor, linear discriminant analysis, decision tree, AdaBoost, multilayer perceptron, and Naïve Bayes) achieve good results by using the features generated by the GAFDS method and the optimized selection. Specifically, the accuracies for the two-classification and three-classification problems may reach up to 99% and 97%, respectively. Results of several cross-validation experiments illustrate that GAFDS is effective in feature extraction for EEG classification. Therefore, the proposed feature selection and optimization model can improve classification accuracy.
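The abstract compares features by a ratio of inter-class distance to intra-class distance but does not define it. The Python sketch below shows one plausible definition, chosen here as an assumption: the mean Euclidean distance between class centroids divided by the mean distance of samples to their own centroid. The function name `inter_intra_ratio` is hypothetical.

```python
import numpy as np

def inter_intra_ratio(features, labels):
    """Mean centroid-to-centroid distance over mean sample-to-centroid distance.

    features: array of shape (n_samples, n_features); labels: (n_samples,).
    This definition is an assumption; the paper may use a different one.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    centroids = np.array([features[labels == c].mean(axis=0) for c in classes])

    # Intra-class: average distance of each class's samples to its centroid.
    intra = np.mean([
        np.linalg.norm(features[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(classes)
    ])

    # Inter-class: average distance over all ordered pairs of distinct centroids.
    k = len(classes)
    pairwise = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    inter = pairwise.sum() / (k * (k - 1))
    return inter / intra
```

A larger ratio means the classes are far apart relative to their own spread, so the feature set separates them more cleanly.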
Learning Policies for Markov Decision Processes from Data
Hanawal, Manjesh K., Liu, Hao, Zhu, Henghui, Paschalidis, Ioannis Ch.
We consider the problem of learning a policy for a Markov decision process consistent with data captured on the state-action pairs followed by the policy. We assume that the policy belongs to a class of parameterized policies which are defined using features associated with the state-action pairs. The features are known a priori; however, only an unknown subset of them may be relevant. The policy parameters that correspond to an observed target policy are recovered using $\ell_1$-regularized logistic regression that best fits the observed state-action samples. We establish bounds on the difference between the average reward of the estimated and the original policy (regret) in terms of the generalization error and the ergodic coefficient of the underlying Markov chain. To that end, we combine sample complexity theory and sensitivity analysis of the stationary distribution of Markov chains. Our analysis suggests that to achieve regret within order $O(\sqrt{\epsilon})$, it suffices to use training sample size on the order of $\Omega(\log n \cdot poly(1/\epsilon))$, where $n$ is the number of features. We demonstrate the effectiveness of our method on a synthetic robot navigation example.
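The parameter-recovery step can be illustrated with a small sketch of $\ell_1$-regularized logistic regression solved by proximal gradient descent (ISTA). It assumes a simplified two-action setting, where each row of `X` is the feature vector of an observed state and `y` records which of the two actions was taken. The solver, function names, and the two-action simplification are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fit_l1_logistic(X, y, lam, iters=1000):
    """Minimise mean logistic loss + lam * ||theta||_1 with ISTA; y takes values in {0, 1}."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    # Step 1/L, where L = ||X||_2^2 / (4n) bounds the curvature of the smooth part.
    step = 4.0 * n / np.linalg.norm(X, 2) ** 2
    theta = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ theta))     # predicted P(action = 1 | state)
        grad = X.T @ (p - y) / n                 # gradient of the mean log-loss
        theta = soft_threshold(theta - step * grad, step * lam)
    return theta

# Synthetic check: only the first 3 of 20 features drive the observed actions.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))
true_theta = np.zeros(20)
true_theta[:3] = [2.0, -2.0, 1.5]
y = (rng.uniform(size=2000) < 1.0 / (1.0 + np.exp(-X @ true_theta))).astype(float)
theta_hat = fit_l1_logistic(X, y, lam=0.05)
```

The $\ell_1$ penalty drives the coefficients of irrelevant features toward zero, which matches the abstract's assumption that only an unknown subset of the features matters.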
A Variational Bayesian Approach for Image Restoration. Application to Image Deblurring with Poisson-Gaussian Noise
Marnissi, Yosra, Zheng, Yuling, Chouzenoux, Emilie, Pesquet, Jean-Christophe
In this paper, a methodology is investigated for signal recovery in the presence of non-Gaussian noise. In contrast with regularized minimization approaches often adopted in the literature, in our algorithm the regularization parameter is reliably estimated from the observations. As the posterior density of the unknown parameters is analytically intractable, the estimation problem is derived in a variational Bayesian framework where the goal is to provide a good approximation to the posterior distribution in order to compute posterior mean estimates. Moreover, a majorization technique is employed to circumvent the difficulties raised by the intricate forms of the non-Gaussian likelihood and of the prior density. We demonstrate the potential of the proposed approach through comparisons with state-of-the-art techniques that are specifically tailored to signal recovery in the presence of mixed Poisson-Gaussian noise. Results show that the proposed approach is efficient and achieves performance comparable with other methods where the regularization parameter is manually tuned from the ground truth.
Detecting Falls with X-Factor Hidden Markov Models
Khan, Shehroz S., Karg, Michelle E., Kulic, Dana, Hoey, Jesse
Identification of falls while performing normal activities of daily living (ADL) is important to ensure personal safety and well-being. However, falling is a short-term activity that occurs infrequently. This poses a challenge to traditional classification algorithms, because there may be very little training data for falls (or none at all). This paper proposes an approach for the identification of falls using a wearable device in the absence of training data for falls but with plentiful data for normal ADL. We propose three 'X-Factor' Hidden Markov Model (XHMM) approaches. The XHMMs model unseen falls using "inflated" output covariances (observation models). To estimate the inflated covariances, we propose a novel cross-validation method to remove "outliers" from the normal ADL that serve as proxies for the unseen falls and allow learning the XHMMs using only normal activities. We tested the proposed XHMM approaches on two activity recognition datasets and show high detection rates for falls in the absence of fall-specific training data. We show that the traditional method of choosing a threshold based on the maximum of the negative log-likelihood to identify unseen falls is ill-posed for this problem. We also show that supervised classification methods perform poorly when very limited fall data are available during the training phase.
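The "inflated covariance" idea can be sketched in a much simpler setting than a full HMM. The Python example below is an illustration under stated assumptions, not the authors' XHMM: a single Gaussian stands in for the observation model of normal activity, and the hypothetical "X-factor" model shares its mean but scales the covariance by a factor `xi > 1`. A sample is flagged as unseen when the inflated model explains it better than the normal one. In the paper `xi` is estimated by cross-validation; here it is simply passed in.

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    """Log-density of N(mean, cov) evaluated at each row of x."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    diff = x - mean
    # Squared Mahalanobis distance, computed without forming the inverse.
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (mean.size * np.log(2.0 * np.pi) + logdet + maha)

def flag_unseen(x, mean, cov, xi):
    """True where the inflated model N(mean, xi * cov) beats the normal model."""
    cov = np.asarray(cov, dtype=float)
    return gaussian_logpdf(x, mean, xi * cov) > gaussian_logpdf(x, mean, cov)

# Points near the normal-activity mean are kept; a far-away point is flagged.
flags = flag_unseen([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]],
                    mean=[0.0, 0.0], cov=np.eye(2), xi=4.0)
```

Because both models share the same mean, the comparison reduces to a threshold on the Mahalanobis distance that depends only on `xi` and the dimension, so no fall data is needed to set it.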
The best kept secret about linear and logistic regression
All the regression theory developed by statisticians over the last 200 years (related to the general linear model) is useless. Regression can be performed as accurately without statistical models, including the computation of confidence intervals (for estimates, predicted values or regression parameters). The non-statistical approach is also more robust than the theory described in all statistics textbooks and taught in all statistics courses. It does not require Map-Reduce when data is really big, nor any matrix inversion, maximum likelihood estimation, or mathematical optimization (Newton algorithm). It is indeed incredibly simple, robust, easy to interpret, and easy to code (no statistical libraries required).
The machine that wanted to be a mind ZDNet
Artificial intelligence is one of humankind's greatest and oldest ambitions. The quest for non-human intelligence has captivated magicians, astrologers and mystics for as long as such professions have existed, but it took Aristotle to kick things off properly. He was the first to start organising laws of thought and the way they interact with the real world -- the basic concepts behind AI. That was in the fourth century BC, and 2,300 years later we still haven't cracked the problem. Part of the trouble is that nobody knows what AI is.
Book review: The Theory That Would Not Die ZDNet
A few months ago, Autonomy founder and CEO Mike Lynch sold his company to HP for £7.1 billion. Back in 2000, when he had just become Britain's first software billionaire, Lynch gave an interview in which he talked about perception and explained how he built his company. It was based, he said, on the ideas of a little-known 18th-century clergyman called Thomas Bayes. That was my introduction to Thomas Bayes, whose ideas have been used to solve many intractable problems, a number of which Sharon Bertsch McGrayne studies in depth in The Theory That Would Not Die. In the last ten years, Bayes has become famous, and few working in the fields of probability theory, computer intelligence or mathematics can have failed to come into contact with his rule.