Goto

Collaborating Authors

 Performance Analysis


Multimodal Sparse Classifier for Adolescent Brain Age Prediction

arXiv.org Machine Learning

The study of healthy brain development helps to better understand the brain transformation and brain connectivity patterns which happen during childhood to adulthood. This study presents a sparse machine learning solution across whole-brain functional connectivity (FC) measures of three sets of data, derived from resting state functional magnetic resonance imaging (rs-fMRI) and task fMRI data, including a working memory n-back task (nb-fMRI) and an emotion identification task (em-fMRI). These multi-modal image data are collected on a sample of adolescents from the Philadelphia Neurodevelopmental Cohort (PNC) for the prediction of brain ages. Due to extremely large variable-to-instance ratio of PNC data, a high dimensional matrix with several irrelevant and highly correlated features is generated and hence a pattern learning approach is necessary to extract significant features. We propose a sparse learner based on the residual errors along the estimation of an inverse problem for the extreme learning machine (ELM) neural network. The purpose of the approach is to overcome the overlearning problem through pruning of several redundant features and their corresponding output weights. The proposed multimodal sparse ELM classifier based on residual errors (RES-ELM) is highly competitive in terms of the classification accuracy compared to its counterparts such as conventional ELM, and sparse Bayesian learning ELM.


Unsupervised Contextual Anomaly Detection using Joint Deep Variational Generative Models

arXiv.org Machine Learning

Often these processes result in highly dimensional data sets, with complex relationships within the data and exhibit stochastic behavior. Furthermore the anomalies by definition contain high self-information measure and therefore carry useful information about the underlying data generation process. There exist a number of similar definitions of what an anomaly is however in this paper the following definition is adopted [11]: 1. Anomalies are different from the norm in respect to their attributes.


Summarizing Event Sequences with Serial Episodes: A Statistical Model and an Application

arXiv.org Machine Learning

In this paper we address the problem of discovering a small set of frequent serial episodes from sequential data so as to adequately characterize or summarize the data. We discuss an algorithm based on the Minimum Description Length (MDL) principle and the algorithm is a slight modification of an earlier method, called CSC-2. We present a novel generative model for sequence data containing prominent pairs of serial episodes and, using this, provide some statistical justification for the algorithm. We believe this is the first instance of such a statistical justification for an MDL based algorithm for summarizing event sequence data. We then present a novel application of this data mining algorithm in text classification. By considering text documents as temporal sequences of words, the data mining algorithm can find a set of characteristic episodes for all the training data as a whole. The words that are part of these characteristic episodes could then be considered the only relevant words for the dictionary thus resulting in a considerably reduced feature vector dimension. We show, through simulation experiments using benchmark data sets, that the discovered frequent episodes can be used to achieve more than four-fold reduction in dictionary size without losing any classification accuracy.


Robust Subspace Recovery Layer for Unsupervised Anomaly Detection

arXiv.org Machine Learning

We propose a neural network for unsupervised anomaly detection with a novel robust subspace recovery layer (RSR layer). This layer seeks to extract the underlying subspace from a latent representation of the given data and remove outliers that lie away from this subspace. It is used together with an encoder and a decoder. The encoder maps the data into the latent space, from which the RSR layer extracts the subspace. The decoder then smoothly maps back the underlying subspace to a ``manifold" close to the original data. We illustrate algorithmic choices and performance for artificial data with corrupted manifold structure. We also demonstrate competitive precision and recall for image datasets.


RAPID: Early Classification of Explosive Transients using Deep Learning

arXiv.org Machine Learning

We present RAPID (Real-time Automated Photometric IDentification), a novel time-series classification tool capable of automatically identifying transients from within a day of the initial alert, to the full lifetime of a light curve. Using a deep recurrent neural network with Gated Recurrent Units (GRUs), we present the first method specifically designed to provide early classifications of astronomical time-series data, typing 12 different transient classes. Our classifier can process light curves with any phase coverage, and it does not rely on deriving computationally expensive features from the data, making RAPID well-suited for processing the millions of alerts that ongoing and upcoming wide-field surveys such as the Zwicky Transient Facility (ZTF), and the Large Synoptic Survey Telescope (LSST) will produce. The classification accuracy improves over the lifetime of the transient as more photometric data becomes available, and across the 12 transient classes, we obtain an average area under the receiver operating characteristic curve of 0.95 and 0.98 at early and late epochs, respectively. We demonstrate RAPID's ability to effectively provide early classifications of transients from the ZTF data stream. We have made RAPID available as an open-source software package (https://astrorapid.readthedocs.io) for machine learning-based alert-brokers to use for the autonomous and quick classification of several thousand light curves within a few seconds.


Infinite Brain MR Images: PGGAN-based Data Augmentation for Tumor Detection

arXiv.org Artificial Intelligence

Due to the lack of available annotated medical images, accurate computer-assisted diagnosis requires intensive Data Augmentation (DA) techniques, such as geometric/intensity transformations of original images; however, those transformed images intrinsically have a similar distribution to the original ones, leading to limited performance improvement. To fill the data lack in the real image distribution, we synthesize brain contrast-enhanced Magnetic Resonance (MR) images---realistic but completely different from the original ones---using Generative Adversarial Networks (GANs). This study exploits Progressive Growing of GANs (PGGANs), a multi-stage generative training method, to generate original-sized 256 X 256 MR images for Convolutional Neural Network-based brain tumor detection, which is challenging via conventional GANs; difficulties arise due to unstable GAN training with high resolution and a variety of tumors in size, location, shape, and contrast. Our preliminary results show that this novel PGGAN-based DA method can achieve promising performance improvement, when combined with classical DA, in tumor detection and also in other medical imaging tasks.


On preserving non-discrimination when combining expert advice

arXiv.org Machine Learning

The emergence of machine learning in the last decade has given rise to an important debate regarding the ethical and societal responsibility of its offspring. Machine learning has provided a universal toolbox enhancing the decision making in many disciplines from advertising and recommender systems to education and criminal justice. Unfortunately, both the data and their processing can be biased against specific population groups (even inadvertently) in every single step of the process [BS16]. This has generated societal and policy interest in understanding the sources of this discrimination and interdisciplinary research has attempted to mitigate its shortcomings. Discrimination is commonly an issue in applications where decisions need to be made sequentially. The most prominent such application is online advertising where platforms need to sequentially select which ad to display in response to particular query searches. This process can introduce discrimination against protected groups in many ways such as filtering particular alternatives [DTD15, APJ16] and reinforcing existing stereotypes through search results [Swe13, KMM15]. Another canonical example of sequential decision making is medical trials where underexploration on female groups often leads to significantly worse treatments for them [LDM16]. Similar issues occur in image classification as stressed by "gender shades" [BG18].


Interpreting Black Box Models with Statistical Guarantees

arXiv.org Machine Learning

While many methods for interpreting machine learning models have been proposed, they are frequently ad hoc, difficult to evaluate, and come with no statistical guarantees on the error rate. This is especially problematic in scientific domains, where interpretations must be accurate and reliable. In this paper, we cast black box model interpretation as a hypothesis testing problem. The task is to discover "important" features by testing whether the model prediction is significantly different from what would be expected if the features were replaced with randomly-sampled counterfactuals. We derive a multiple hypothesis testing framework for finding important features that enables control over the false discovery rate. We propose two testing methods, as well as analogs of one-sided and two-sided tests. In simulation, the methods have high power and compare favorably against existing interpretability methods. When applied to vision and language models, the framework selects features that intuitively explain model predictions.


The False Positive Control Lasso

arXiv.org Machine Learning

In high dimensional settings where a small number of regressors are expected to be important, the Lasso estimator can be used to obtain a sparse solution vector with the expectation that most of the non-zero coefficients are associated with true signals. While several approaches have been developed to control the inclusion of false predictors with the Lasso, these approaches are limited by relying on asymptotic theory, having to empirically estimate terms based on theoretical quantities, assuming a continuous response class with Gaussian noise and design matrices, or high computation costs. In this paper we show how: (1) an existing model (the SQRT-Lasso) can be recast as a method of controlling the number of expected false positives, (2) how a similar estimator can used for all other generalized linear model classes, and (3) this approach can be fit with existing fast Lasso optimization solvers. Our justification for false positive control using randomly weighted self-normalized sum theory is to our knowledge novel. Moreover, our estimator's properties hold in finite samples up to some approximation error which we find in practical settings to be negligible under a strict mutual incoherence condition.


Using Latent Class Analysis to Identify ARDS Sub-phenotypes for Enhanced Machine Learning Predictive Performance

arXiv.org Machine Learning

In this work, we utilize Machine Learning for early recognition of patients at high risk of acute respiratory distress syndrome (ARDS), which is critical for successful prevention strategies for this devastating syndrome. The difficulty in early ARDS recognition stems from its complex and heterogenous nature. In this study, we integrate knowledge of the heterogeneity of ARDS patients into predictive model building. Using MIMIC-III data, we first apply latent class analysis (LCA) to identify homogeneous sub-groups in the ARDS population, and then build predictive models on the partitioned data. The results indicate that significantly improved performances of prediction can be obtained for two of the three identified sub-phenotypes of ARDS. Experiments suggests that identifying sub-phenotypes is beneficial for building predictive model for ARDS.