Statistical Learning
How Is Grandma Doing? Predicting Functional Health Status from Binary Ambient Sensor Data
Robben, Saskia (Amsterdam University of Applied Sciences) | Englebienne, Gwenn (University of Amsterdam) | Pol, Margriet (Amsterdam University of Applied Sciences) | Kröse, Ben (University of Amsterdam)
Ambient activity monitoring systems produce large amounts of data, which can be used for health monitoring. The problem is that patterns in these data that reflect health status have not yet been identified. This paper explores the possibility of predicting the functional health status of a person (the motor score of the AMPS, the Assessment of Motor and Process Skills) from binary ambient sensor data. Data were collected from five independently living elderly people. Based on expert knowledge, features are extracted from the sensor data and several subsets are selected. We use standard linear regression and Gaussian processes to map the features to the functional status, and predict the status of a test person using leave-one-person-out cross-validation. The results show that Gaussian processes perform better than the linear regression model, and that both models perform better with the basic feature set than with location- or transition-based features. Some suggestions are provided for better feature extraction and selection for the purpose of health monitoring. These results indicate that automated functional health assessment is possible, but some challenges lie ahead. The most important challenge is eliciting expert knowledge and translating it into quantifiable features.
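The leave-one-person-out evaluation described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic, the feature matrix, the `person_ids` array, and the function name `leave_one_person_out` are hypothetical, and only the linear regression model is shown (the Gaussian process model is omitted).

```python
import numpy as np

def leave_one_person_out(X, y, person_ids):
    """Return the mean absolute error per held-out person for least-squares regression."""
    errors = {}
    for person in np.unique(person_ids):
        test = person_ids == person
        train = ~test
        # Fit an intercept plus linear weights on every person except the held-out one.
        A_train = np.column_stack([np.ones(train.sum()), X[train]])
        coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        # Predict the held-out person's scores and record the error.
        A_test = np.column_stack([np.ones(test.sum()), X[test]])
        errors[int(person)] = float(np.mean(np.abs(A_test @ coef - y[test])))
    return errors

# Synthetic stand-in data (an assumption): 5 people, 8 observations each, 3 features.
rng = np.random.default_rng(0)
person_ids = np.repeat(np.arange(5), 8)
X = rng.normal(size=(40, 3))
y = X @ np.array([0.5, -0.2, 0.1]) + 1.0 + 0.05 * rng.normal(size=40)

errors = leave_one_person_out(X, y, person_ids)
for person, err in errors.items():
    print(f"held-out person {person}: MAE = {err:.3f}")
```

Holding out all observations of one person at a time keeps that person's data out of training, so the reported error estimates how well the model would transfer to someone it has never seen.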
Language Analysis of Speakers with Dementia of the Alzheimer’s Type
Guinn, Curry I. (University of North Carolina Wilmington) | Habash, Anthony (University of North Carolina Wilmington)
This research is a discriminative analysis of conversational dialogs involving individuals suffering from dementia of the Alzheimer’s type. Several metric analyses are applied to the transcripts of the Carolina Conversation Corpus (Pope and Davis 2011) in order to determine whether there are statistically significant differences between individuals with and without Alzheimer’s disease. Results from the analysis indicate that go-ahead utterances, certain fluency measures, and paraphrasing provide defensible means of differentiating the linguistic characteristics of spontaneous speech between healthy individuals and those with Alzheimer’s disease. Several machine learning algorithms were used to classify the speech of individuals with and without dementia of the Alzheimer’s type.
Rejoinder: Latent variable graphical model selection via convex optimization
Chandrasekaran, Venkat, Parrilo, Pablo A., Willsky, Alan S.
We thank all the discussants for their careful reading of our paper, and for their insightful critiques. We would also like to thank the editors for organizing this discussion. Our paper contributes to the area of high-dimensional statistics, which has received much attention over the past several years across the statistics, machine learning and signal processing communities. In this rejoinder we clarify and comment on some of the points raised in the discussions. Finally, we also remark on some interesting challenges that lie ahead in latent variable modeling. Briefly, we considered the problem of latent variable graphical model selection in the Gaussian setting.
Comparing K-Nearest Neighbors and Potential Energy Method in classification problem. A case study using KNN applet by E.M. Mirkes and real life benchmark data sets
Abstract: The K-nearest neighbors (KNN) method is used in many supervised learning classification problems. The Potential Energy (PE) method has also been developed for classification problems, based on a physical metaphor. The potentials used in the experiments are the Yukawa potential and the Gaussian potential. In this paper, I use both the applet and a MATLAB program with real-life benchmark data to analyze the performance of the KNN and PE methods in classification problems. The results show that, in general, the KNN and PE methods have similar performance. In particular, PE with the Yukawa potential performs worse than KNN when the density of the data is higher in the distribution of the database. When the Gaussian potential is applied, the results from PE and KNN behave similarly. The indicators used are correlation coefficients and information gain.
Keywords: K-nearest neighbor, potential energy method, Yukawa potential, Gaussian potential, correlation coefficients, information gain
1. Introduction
The target of supervised learning is to learn a mapping from the input to an output whose correct values are provided. In unsupervised learning, however, no correct values are provided, so the only known object is the input data and the target is to find the regularities in the input. Classification is considered a supervised learning task.
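To make the comparison concrete, here is a minimal sketch of the two classifiers. The abstract does not give the exact formulation of the PE method, so this sketch assumes one common reading: every training point contributes a potential that decays with distance, and a query is assigned to the class with the largest summed potential. The function names, the `width` parameter, and the toy data are all hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority vote among the k training points closest to the query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def gaussian_potential(r, width=1.0):
    """Gaussian potential: smooth and bounded at r = 0."""
    return math.exp(-((r / width) ** 2))

def yukawa_potential(r, width=1.0, eps=1e-9):
    """Yukawa potential: exp(-r/width) / r; eps avoids division by zero."""
    return math.exp(-r / width) / (r + eps)

def pe_predict(train, query, potential):
    """Assumed PE rule: pick the class with the largest total potential at the query."""
    totals = {}
    for point, label in train:
        totals[label] = totals.get(label, 0.0) + potential(math.dist(point, query))
    return max(totals, key=totals.get)

# Toy two-class data set (an assumption, not the benchmark data from the paper).
train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]

for query in [(0.5, 0.5), (5.5, 5.5)]:
    print(query,
          knn_predict(train, query),
          pe_predict(train, query, gaussian_potential),
          pe_predict(train, query, yukawa_potential))
```

Because the Yukawa potential diverges as the distance approaches zero, a single very close training point can dominate the sum, whereas the Gaussian potential stays bounded. That difference is one plausible reason the two potentials could behave differently in dense regions.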
Discussion: Latent variable graphical model selection via convex optimization
Candès, Emmanuel J., Soltanolkotabi, Mahdi
By Emmanuel J. Candès and Mahdi Soltanolkotabi, Stanford University. We wish to congratulate the authors for their innovative contribution, which is bound to inspire much further research. We find latent variable model selection to be a fantastic application of matrix decomposition methods, namely, the superposition of low-rank and sparse elements. Clearly, the methodology introduced in this paper is of potential interest across many disciplines. In the following, we will first discuss this paper in more detail and then reflect on the versatility of the low-rank sparse decomposition. The proposed scheme is an extension of the graphical lasso of Yuan and Lin [15] (see also [1, 6]), which is a popular approach for learning the structure in an undirected Gaussian graphical model.
Discussion: Latent variable graphical model selection via convex optimization
It is my pleasure to congratulate the authors for an innovative and inspiring piece of work. Chandrasekaran, Parrilo and Willsky (hereafter CPW) have come up with a novel approach, combining ideas from convex optimization and algebraic geometry, to the longstanding problem of Gaussian graphical model selection with latent variables. Their method is intuitive and simple to implement, based on solving a convex log-determinant program with suitable choices of regularization. In addition, they establish a number of attractive theoretical guarantees that hold under high-dimensional scaling, meaning that the graph size p and sample size n are allowed to grow simultaneously.
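For orientation, the regularized log-determinant program mentioned here is usually written in roughly the following form. The notation is an assumption, since this listing defines none of it: S is the sparse component, L the low-rank component, \Sigma_O^n the sample covariance of the observed variables, and \lambda_n and \gamma are regularization parameters.

```latex
\[
(\hat{S}_n, \hat{L}_n) = \arg\min_{S,\,L}\;
  -\ell(S - L;\, \Sigma_O^n) + \lambda_n \bigl( \gamma \|S\|_1 + \operatorname{tr}(L) \bigr)
\quad \text{subject to } S - L \succ 0,\; L \succeq 0,
\]
\[
\ell(K;\, \Sigma) = \log\det K - \operatorname{tr}(K \Sigma).
\]
```

The \ell_1 norm encourages sparsity in S (the conditional graph among the observed variables), while the trace acts as a nuclear-norm penalty on the positive semidefinite matrix L and so encourages low rank (few latent variables).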
Discussion: Latent variable graphical model selection via convex optimization
By Ming Yuan, Georgia Institute of Technology. I want to start by congratulating Professors Chandrasekaran, Parrilo and Willsky for this fine piece of work. Their paper, hereafter referred to as CPW, addresses one of the biggest practical challenges of Gaussian graphical models: how to make inferences for a graphical model in the presence of missing variables. The difficulty comes from the fact that the validity of conditional independence relationships implied by a graphical model relies critically on the assumption that all conditional variables are observed, which of course can be unrealistic. As CPW show, this is not as hopeless as it might appear to be. They characterize conditions under which a conditional graphical model can be identified, and offer a penalized likelihood method to reconstruct it.
Soft (Gaussian CDE) regression models and loss functions
Regression, unlike classification, has lacked a comprehensive and effective approach to cost-sensitive problems that reuses (rather than retrains) general regression models. In this paper, a wide variety of cost-sensitive problems in regression (such as bids, asymmetric losses and rejection rules) are solved effectively by a lightweight but powerful approach, consisting of: (1) the conversion of any traditional one-parameter crisp regression model into a two-parameter soft regression model, seen as a normal conditional density estimator, by the use of newly introduced enrichment methods; and (2) the reframing of an enriched soft regression model to new contexts by an instance-dependent optimisation of the expected loss derived from the conditional normal distribution.
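Step (2) can be illustrated for one simple case, an asymmetric linear loss. The sketch below is not the paper's enrichment method: it assumes the soft model already supplies a mean `mu` and standard deviation `sigma` for an instance, and it uses the standard result that the expected loss is minimized at the quantile under_cost / (under_cost + over_cost) of the predictive distribution. The function name and cost parameters are hypothetical.

```python
from statistics import NormalDist

def reframe_asymmetric(mu, sigma, under_cost, over_cost):
    """Prediction minimizing expected asymmetric linear loss under N(mu, sigma^2).

    under_cost: cost per unit when the prediction is below the true value.
    over_cost:  cost per unit when the prediction is above the true value.
    """
    quantile = under_cost / (under_cost + over_cost)
    return NormalDist(mu, sigma).inv_cdf(quantile)

# Assumed soft prediction for one instance: mean 10, standard deviation 2.
symmetric = reframe_asymmetric(10.0, 2.0, under_cost=1.0, over_cost=1.0)
cautious = reframe_asymmetric(10.0, 2.0, under_cost=3.0, over_cost=1.0)
print(symmetric, cautious)
```

With equal costs the reframed prediction is the mean. When under-prediction is more expensive, the prediction shifts upward by an amount that scales with `sigma`, which is why the second parameter of the soft model matters: the same trained model can be reused under a new cost context without retraining.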
Understanding the Interaction between Interests, Conversations and Friendships in Facebook
Ho, Qirong, Yan, Rong, Raina, Rajat, Xing, Eric P.
In this paper, we explore salient questions about user interests, conversations and friendships in the Facebook social network, using a novel latent space model that integrates several data types. A key challenge of studying Facebook's data is the wide range of data modalities such as text, network links, and categorical labels. Our latent space model seamlessly combines all three data modalities over millions of users, allowing us to study the interplay between user friendships, interests, and higher-order network-wide social trends on Facebook. The recovered insights not only answer our initial questions, but also reveal surprising facts about user interests in the context of Facebook's ecosystem. We also confirm that our results are significant with respect to evidential information from the study subjects.