Performance Analysis
From Joyous to Clinically Depressed: Mood Detection Using Spontaneous Speech
Alghowinem, Sharifa M. (Australian National University and Ministry of Higher Education, Kingdom of Saudi Arabia) | Goecke, Roland (Australian National University and University of Canberra) | Wagner, Michael (University of Canberra) | Epps, Julien (University of New South Wales) | Breakspear, Michael (University of New South Wales and Queensland Institute of Medical Research) | Parker, Gordon (University of New South Wales)
Depression and other mood disorders are common and disabling disorders. We present work towards an objective diagnostic aid supporting clinicians using affective sensing technology with a focus on acoustic and statistical features from spontaneous speech. This work investigates differences in expressing positive and negative emotions in depressed and healthy control subjects as well as whether initial gender classification increases the recognition rate. To this end, spontaneous speech from interviews with 30 depressed subjects and 30 controls was analysed, with a focus on questions eliciting positive and negative emotions. Using HMMs with GMMs for classification with 30-fold cross-validation, we found that MFCC, energy and intensity features gave the highest recognition rates when female and male subjects were analysed together. When the dataset was first split by gender, log energy and shimmer features, respectively, were found to give the highest recognition rates in females, while loudness gave the highest rates in males. Overall, correct recognition rates from acoustic features for depressed female subjects were higher than for male subjects. Using statistical features, we found that the response time and average syllable duration were longer in depressed subjects, while the interaction involvement and articulation rate were higher in control subjects.
Robustness of Threshold-Based Feature Rankers with Data Sampling on Noisy and Imbalanced Data
Shanab, Ahmad Abu (Florida Atlantic University) | Khoshgoftaar, Taghi M. (Florida Atlantic University) | Wald, Randall (Florida Atlantic University)
Gene selection has become a vital component in the learning process when using high-dimensional gene expression data. Although extensive research has been done towards evaluating the performance of classifiers trained with the selected features, the stability of feature ranking techniques has received relatively little study. This work evaluates the robustness of eleven threshold-based feature selection techniques, examining the impact of data sampling and class noise on the stability of feature selection. To assess the robustness of feature selection techniques, we use four groups of gene expression datasets, employ eleven threshold-based feature rankers, and generate artificial class noise to better simulate real-world datasets. The results demonstrate that although no ranker consistently outperforms the others, MI and Dev show the best stability on average, while GI and PR show the least stability on average. Results also show that trying to balance datasets through data sampling has on average no positive impact on the stability of feature ranking techniques applied to those datasets. In addition, increased feature subset sizes improve stability, but do so reliably only for noisy datasets.
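The notion of ranking stability can be made concrete with a small sketch. The abstract does not say which stability measure the authors used, so the mean pairwise Jaccard similarity of top-k feature subsets below is an assumption chosen for illustration, and the gene names and scores are hypothetical.

```python
from itertools import combinations


def top_k(scores, k):
    """Return the set of the k feature names with the highest scores."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """Jaccard similarity between two sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def ranking_stability(score_runs, k):
    """Mean pairwise Jaccard similarity of the top-k subsets across runs."""
    if len(score_runs) < 2:
        raise ValueError("need at least two runs to compare")
    subsets = [top_k(scores, k) for scores in score_runs]
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical ranker scores for four genes, one dict per perturbed
# version of the dataset (e.g. after sampling or noise injection).
runs = [
    {"g1": 0.9, "g2": 0.8, "g3": 0.1, "g4": 0.2},
    {"g1": 0.7, "g2": 0.3, "g3": 0.2, "g4": 0.6},
    {"g1": 0.8, "g2": 0.9, "g3": 0.3, "g4": 0.1},
]
stability = ranking_stability(runs, k=2)
```

A value near 1 means the ranker selects almost the same genes regardless of the perturbation; a value near 0 means the selected subsets barely overlap.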
Real-Time Filtering for Pulsing Public Opinion in Social Media
Finn, Samantha (Wellesley College) | Mustafaraj, Eni (Wellesley College)
When analysing social media conversations, in search of the public opinion about an unfolding event that is being discussed in real time (e.g., presidential debates, major speeches, etc.), it is important to distinguish between two groups of participants: opinion-makers and opinion-holders. To address this problem, we propose a supervised machine-learning approach, which uses inexpensively acquired labeled data from monothematic Twitter accounts to learn a binary classifier for the labels "political account" (opinion-makers) and "non-political account" (opinion-holders). While the classifier has an 83% accuracy on individual tweets, when applied to the last 200 tweets from accounts of a set of 1000 Twitter users, it classifies accounts with 97% accuracy. This high accuracy derives from our decision to incorporate information about classifier probability into the classification. Our work demonstrates that machine learning algorithms can play a critical role in improving the quality of social media analytics and understanding, whose importance is increasing as social media adoption becomes widespread.
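The step from tweet-level to account-level labels can be sketched as follows. The abstract does not state how the classifier probabilities are combined, so averaging them is an assumption made here for illustration; the function names, the threshold, and the example probabilities are all hypothetical.

```python
def classify_account(tweet_probs, threshold=0.5):
    """Label an account from per-tweet probabilities of the political class.

    Averages the probabilities, so confident predictions weigh more than
    borderline ones. Returns (label, mean probability).
    """
    if not tweet_probs:
        raise ValueError("need at least one tweet probability")
    mean_prob = sum(tweet_probs) / len(tweet_probs)
    if mean_prob >= threshold:
        return "political account", mean_prob
    return "non-political account", mean_prob


def majority_vote(tweet_probs, threshold=0.5):
    """Baseline: threshold each tweet first, then take the majority label."""
    votes = sum(p >= threshold for p in tweet_probs)
    if votes * 2 >= len(tweet_probs):
        return "political account"
    return "non-political account"


# Hypothetical account: three borderline "political" tweets and two
# tweets that are confidently non-political.
probs = [0.55, 0.52, 0.51, 0.05, 0.10]
label_by_mean, mean_prob = classify_account(probs)
label_by_vote = majority_vote(probs)
```

On this example the two rules disagree: the hard vote is swayed by three barely political tweets, while the averaged probability reflects how weak that evidence is.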
L2 Regularization for Learning Kernels
Cortes, Corinna, Mohri, Mehryar, Rostamizadeh, Afshin
The choice of the kernel is critical to the success of many learning algorithms but it is typically left to the user. Instead, the training data can be used to learn the kernel by selecting it out of a given family, such as that of non-negative linear combinations of p base kernels, constrained by a trace or L1 regularization. This paper studies the problem of learning kernels with the same family of kernels but with an L2 regularization instead, and for regression problems. We analyze the problem of learning kernels with ridge regression. We derive the form of the solution of the optimization problem and give an efficient iterative algorithm for computing that solution. We present a novel theoretical analysis of the problem based on stability and give learning bounds for orthogonal kernels that contain only an additive term O(√p/m) when compared to the standard kernel ridge regression stability bound. We also report the results of experiments indicating that L1 regularization can lead to modest improvements for a small number of kernels, but to performance degradations in larger-scale cases. In contrast, L2 regularization never degrades performance and in fact achieves significant improvements with a large number of kernels.
Virtual Vector Machine for Bayesian Online Classification
Minka, Thomas P., Xiang, Rongjing, Qi, Yuan
In a typical online learning scenario, a learner is required to process a large data stream using a small memory buffer. Such a requirement is usually in conflict with a learner's primary pursuit of prediction accuracy. To address this dilemma, we introduce a novel Bayesian online classification algorithm, called the Virtual Vector Machine. The virtual vector machine allows a smooth trade-off between prediction accuracy and memory size. The virtual vector machine summarizes the information contained in the preceding data stream by a Gaussian distribution over the classification weights plus a constant number of virtual data points. The virtual data points are designed to add extra non-Gaussian information about the classification weights. To maintain the constant number of virtual points, the virtual vector machine adds the current real data point into the virtual point set, then either merges the two most similar virtual points into a new virtual point or deletes a virtual point that is far from the decision boundary. The information lost in this process is absorbed into the Gaussian distribution. The extra information provided by the virtual points leads to improved predictive accuracy over previous online classification algorithms.
Using the Gene Ontology Hierarchy when Predicting Gene Function
Mostafavi, Sara, Morris, Quaid
The problem of multilabel classification when the labels are related through a hierarchical categorization scheme occurs in many application domains such as computational biology. For example, this problem arises naturally when trying to automatically assign gene function using a controlled vocabulary such as the Gene Ontology. However, most existing approaches for predicting gene functions solve independent classification problems to predict genes that are involved in a given function category, independently of the rest. Here, we propose two simple methods for incorporating information about the hierarchical nature of the categorization scheme. In the first method, we use information about a gene's previous annotation to set an initial prior on its label. In a second approach, we extend a graph-based semi-supervised learning algorithm for predicting gene function in a hierarchy. We show that we can efficiently solve this problem by solving a linear system of equations. We compare these approaches with a previous label reconciliation-based approach. Results show that using the hierarchy information directly, compared to using reconciliation methods, improves gene function prediction.
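To illustrate how a graph-based semi-supervised predictor reduces to a linear system, the sketch below uses a standard Laplacian-regularized formulation: minimizing sum_i (f_i - y_i)^2 + lam * f^T L f gives (I + lam * L) f = y, where L is the graph Laplacian. This generic formulation is an assumption for illustration and is not the authors' hierarchical extension; the function names, the toy network, and the prior vector y (which could, for instance, encode a gene's previous annotations) are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def propagate_labels(W, y, lam=1.0):
    """Solve (I + lam * L) f = y, with L = D - W the graph Laplacian."""
    n = len(y)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        degree = sum(W[i])
        for j in range(n):
            laplacian_ij = (degree if i == j else 0.0) - W[i][j]
            A[i][j] = (1.0 if i == j else 0.0) + lam * laplacian_ij
    return solve(A, y)


# Toy gene network: a chain g0 - g1 - g2 with unit edge weights.
# Only g0 carries a positive prior for the function category.
W = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
y = [1.0, 0.0, 0.0]
f = propagate_labels(W, y, lam=1.0)
```

The resulting scores decrease with distance from the annotated gene, which is the smoothing behaviour the Laplacian term is meant to produce. For real networks a sparse solver would replace the dense elimination used here.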
BPR: Bayesian Personalized Ranking from Implicit Feedback
Rendle, Steffen, Freudenthaler, Christoph, Gantner, Zeno, Schmidt-Thieme, Lars
Item recommendation is the task of predicting a personalized ranking on a set of items (e.g. websites, movies, products). In this paper, we investigate the most common scenario with implicit feedback (e.g. clicks, purchases). There are many methods for item recommendation from implicit feedback like matrix factorization (MF) or adaptive k-nearest-neighbor (kNN). Even though these methods are designed for the item prediction task of personalized ranking, none of them is directly optimized for ranking. In this paper we present a generic optimization criterion BPR-Opt for personalized ranking that is the maximum posterior estimator derived from a Bayesian analysis of the problem. We also provide a generic learning algorithm for optimizing models with respect to BPR-Opt. The learning method is based on stochastic gradient descent with bootstrap sampling. We show how to apply our method to two state-of-the-art recommender models: matrix factorization and adaptive kNN. Our experiments indicate that for the task of personalized ranking our optimization method outperforms the standard learning techniques for MF and kNN. The results show the importance of optimizing models for the right criterion.
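A minimal sketch of the idea for matrix factorization follows, assuming the usual pairwise form of the criterion: for a user u, an observed item i and an unobserved item j, maximize ln sigmoid(x_ui - x_uj) minus an L2 penalty, with triples (u, i, j) drawn at random with replacement. The function names, hyperparameters, and toy data are hypothetical and not taken from the paper.

```python
import math
import random


def score(W, H, u, i):
    """Predicted preference of user u for item i (dot product of factors)."""
    return sum(wu * hi for wu, hi in zip(W[u], H[i]))


def train_bpr_mf(interactions, n_items, dim=4, lr=0.05, reg=0.01,
                 n_steps=20000, seed=0):
    """SGD on randomly sampled (user, observed item, unobserved item) triples."""
    rng = random.Random(seed)
    n_users = len(interactions)
    W = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_users)]
    H = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_items)]
    positives = [sorted(items) for items in interactions]
    # Only users with at least one observed and one unobserved item.
    users = [u for u in range(n_users) if 0 < len(positives[u]) < n_items]
    for _ in range(n_steps):
        u = rng.choice(users)
        i = rng.choice(positives[u])
        j = rng.randrange(n_items)
        while j in interactions[u]:  # resample until j is unobserved
            j = rng.randrange(n_items)
        x_uij = score(W, H, u, i) - score(W, H, u, j)
        g = 1.0 / (1.0 + math.exp(x_uij))  # derivative of ln sigmoid(x)
        for f in range(dim):
            wu, hi, hj = W[u][f], H[i][f], H[j][f]
            W[u][f] += lr * (g * (hi - hj) - reg * wu)
            H[i][f] += lr * (g * wu - reg * hi)
            H[j][f] += lr * (-g * wu - reg * hj)
    return W, H


# Toy implicit feedback: users 0 and 1 interacted with items 0 and 1,
# users 2 and 3 with items 2 and 3.
N_ITEMS = 4
interactions = [{0, 1}, {0, 1}, {2, 3}, {2, 3}]
W, H = train_bpr_mf(interactions, N_ITEMS)
```

After training, each user's observed items should score above that user's unobserved ones, which is the pairwise ordering the criterion optimizes rather than the individual scores themselves.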
Spatial Multiresolution Cluster Detection Method
Zhang, Lingsong, Zhu, Zhengyuan
A novel multi-resolution cluster detection (MCD) method is proposed to identify irregularly shaped clusters in space. A multi-scale test statistic on a single cell is derived from the likelihood ratio statistic for Bernoulli, Poisson and Normal sequences. A neighborhood variability measure is defined to select the optimal test threshold. The MCD method is compared with single-scale testing methods controlling for false discovery rate and with the spatial scan statistic, using simulation and fMRI data. The MCD method is shown to be more effective for discovering irregularly shaped clusters, and the implementation of this method does not require heavy computation, making it suitable for cluster detection in large spatial data.
Reduced Rank Vector Generalized Linear Models for Feature Extraction
Supervised linear feature extraction can be achieved by fitting a reduced rank multivariate model. This paper studies rank penalized and rank constrained vector generalized linear models. From the perspective of thresholding rules, we build a framework for fitting singular value penalized models and use it for feature extraction. Through solving the rank constraint form of the problem, we propose progressive feature space reduction for fast computation in high dimensions with little performance loss. A novel projective cross-validation is proposed for parameter tuning in such nonconvex setups. Real data applications are given to show the power of the methodology in supervised dimension reduction and feature extraction.
TIGRESS: Trustful Inference of Gene REgulation using Stability Selection
Haury, Anne-Claire, Mordelet, Fantine, Vera-Licona, Paola, Vert, Jean-Philippe
Inferring the structure of gene regulatory networks (GRN) from gene expression data has many applications, from the elucidation of complex biological processes to the identification of potential drug targets. It is however a notoriously difficult problem, for which the many existing methods reach limited accuracy. In this paper, we formulate GRN inference as a sparse regression problem and investigate the performance of a popular feature selection method, least angle regression (LARS) combined with stability selection. We introduce a novel, robust and accurate scoring technique for stability selection, which improves the performance of feature selection with LARS. The resulting method, which we call TIGRESS (Trustful Inference of Gene REgulation using Stability Selection), was ranked among the top methods in the DREAM5 gene network reconstruction challenge. We investigate in depth the influence of the various parameters of the method and show that a fine parameter tuning can lead to significant improvements and state-of-the-art performance for GRN inference. TIGRESS reaches state-of-the-art performance on benchmark data. This study confirms the potential of feature selection techniques for GRN inference. Code and data are available on http://cbio.ensmp.fr/~ahaury. Running TIGRESS online is possible on GenePattern: http://www.broadinstitute.org/cancer/software/genepattern/.