Performance Analysis
On Breast Cancer Detection: An Application of Machine Learning Algorithms on the Wisconsin Diagnostic Dataset
This paper presents a comparison of six machine learning (ML) algorithms: GRU-SVM (Agarap, 2017), Linear Regression, Multilayer Perceptron (MLP), Nearest Neighbor (NN) search, Softmax Regression, and Support Vector Machine (SVM) on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset (Wolberg, Street, & Mangasarian, 1992) by measuring their classification test accuracy and their sensitivity and specificity values. The dataset consists of features computed from digitized images of fine needle aspirate (FNA) tests on a breast mass (Wolberg, Street, & Mangasarian, 1992). For the implementation of the ML algorithms, the dataset was partitioned as follows: 70% for the training phase and 30% for the testing phase. The hyper-parameters used for all the classifiers were manually assigned. Results show that all the presented ML algorithms performed well (all exceeded 90% test accuracy) on the classification task. The MLP algorithm stands out among the implemented algorithms with a test accuracy of ~99.04%.
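The three reported quantities (test accuracy, sensitivity, and specificity) all follow from the counts in a binary confusion matrix. The following is a minimal sketch of that arithmetic, not the paper's code; the function names and the toy labels are made up for illustration, with 1 standing for malignant and 0 for benign.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def summarize(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity and specificity.

    Assumes both classes occur in y_true; otherwise a denominator is zero.
    """
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Toy labels, invented for illustration (1 = malignant, 0 = benign).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = summarize(y_true, y_pred)
print(metrics)
```

Reporting sensitivity and specificity next to accuracy matters on a diagnostic task because a missed malignant case and a false alarm carry different costs, and accuracy alone cannot separate the two.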
Tensorial and bipartite block models for link prediction in layered networks and temporal networks
Tarres-Deulofeu, Marc, Godoy-Lorite, Antonia, Guimera, Roger, Sales-Pardo, Marta
Imagine a team of researchers looking for promising drug combinations to treat a specific cancer type for which current treatments are ineffective. The team has data on the effect of certain pairs of drugs on other cancer types, but the data are very sparse--only a few drug pairs have been tested on each cancer type, and each drug pair is tested in a few cancer types, at best, or has never been tested at all. The challenge is to select the most promising drug pairs for testing with the target cancer type, so as to minimize the cost associated with unsuccessful tests. We can formalize this challenge as the following inference problem: We have a partial observation of the pairwise interactions between a set of nodes (drugs) in different "network layers" (cancer types), and we need to infer the unobserved interactions within each layer (drug interactions in each cancer type). This challenge is relevant for the many systems that can be represented as multilayer networks [1-4], and is also formally analogous to the challenge of predicting the existence of interactions between nodes in time-resolved networks [5-11]. For instance, we would face the same situation if we had data about the daily email or phone communications between users, and wanted to infer the existence of interactions between pairs of users on a certain unobserved day; in this case each layer would be a different day. Here, we introduce new generative models that are suitable to address the challenge above. We model all layers concurrently, so that our approach takes full advantage of the information contained in all layers to make predictions for any one of them.
A Comparative Study of Pairwise Learning Methods based on Kernel Ridge Regression
Stock, Michiel, Pahikkala, Tapio, Airola, Antti, De Baets, Bernard, Waegeman, Willem
Many machine learning problems can be formulated as predicting labels for a pair of objects. Problems of that kind are often referred to as pairwise learning, dyadic prediction or network inference problems. During the last decade kernel methods have played a dominant role in pairwise learning. They still obtain state-of-the-art predictive performance, but the theoretical analysis of their behavior has been underexplored in the machine learning literature. In this work we review and unify existing kernel-based algorithms that are commonly used in different pairwise learning settings, ranging from matrix filtering to zero-shot learning. To this end, we focus on closed-form efficient instantiations of Kronecker kernel ridge regression. We show that independent task kernel ridge regression, two-step kernel ridge regression and a linear matrix filter arise naturally as special cases of Kronecker kernel ridge regression, implying that all these methods implicitly minimize a squared loss. In addition, we analyze universality, consistency and spectral filtering properties. Our theoretical results provide valuable insights for assessing the advantages and limitations of existing pairwise learning methods.
Byzantine-Robust Distributed Learning: Towards Optimal Statistical Rates
Yin, Dong, Chen, Yudong, Ramchandran, Kannan, Bartlett, Peter
In large-scale distributed learning, security issues have become increasingly important. Particularly in a decentralized environment, some computing units may behave abnormally, or even exhibit Byzantine failures---arbitrary and potentially adversarial behavior. In this paper, we develop distributed learning algorithms that are provably robust against such failures, with a focus on achieving optimal statistical performance. A main result of this work is a sharp analysis of two robust distributed gradient descent algorithms based on median and trimmed mean operations, respectively. We prove statistical error rates for three kinds of population loss functions: strongly convex, non-strongly convex, and smooth non-convex. In particular, these algorithms are shown to achieve order-optimal statistical error rates for strongly convex losses. To achieve better communication efficiency, we further propose a median-based distributed algorithm that is provably robust, and uses only one communication round. For strongly convex quadratic loss, we show that this algorithm achieves the same optimal error rate as the robust distributed gradient descent algorithms.
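To make the two aggregation rules concrete, here is a minimal sketch of how a server might combine worker gradients with a median or a trimmed mean. It assumes both operations are applied coordinate by coordinate and that the trimmed mean discards a fraction `beta` of the values at each end; the function names, `beta`, and the toy gradients are illustrative and not taken from the paper.

```python
from statistics import median


def coordinate_median(gradients):
    """Aggregate worker gradients by taking the median of each coordinate."""
    return [median(coords) for coords in zip(*gradients)]


def coordinate_trimmed_mean(gradients, beta):
    """Drop the smallest and largest beta fraction per coordinate, then average."""
    m = len(gradients)
    k = int(beta * m)  # number of values removed from each end
    if 2 * k >= m:
        raise ValueError("beta removes every value; choose beta < 0.5")
    aggregated = []
    for coords in zip(*gradients):
        kept = sorted(coords)[k:m - k]
        aggregated.append(sum(kept) / len(kept))
    return aggregated


# Four honest workers and one Byzantine worker sending an arbitrary vector.
worker_gradients = [
    [1.0, 2.0],
    [1.2, 1.8],
    [0.8, 2.2],
    [1.0, 2.0],
    [100.0, -100.0],  # adversarial report
]

plain_mean = [sum(c) / len(c) for c in zip(*worker_gradients)]
robust_median = coordinate_median(worker_gradients)
robust_trimmed = coordinate_trimmed_mean(worker_gradients, beta=0.2)
print(plain_mean, robust_median, robust_trimmed)
```

In this toy case the plain mean is dragged far from the honest gradients by a single bad worker, while both robust aggregates stay close to them. A gradient descent step would then use the robust aggregate in place of the average.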
Fast Best Subset Selection: Coordinate Descent and Local Combinatorial Optimization Algorithms
Hazimeh, Hussein, Mazumder, Rahul
We consider the canonical $L_0$-regularized least squares problem (aka best subsets) which is generally perceived as a `gold-standard' for many sparse learning regimes. In spite of worst-case computational intractability results, recent work has shown that advances in mixed integer optimization can be used to obtain near-optimal solutions to this problem for instances where the number of features $p \approx 10^3$. While these methods lead to estimators with excellent statistical properties, often there is a price to pay in terms of a steep increase in computation times, especially when compared to highly efficient popular algorithms for sparse learning (e.g., based on $L_1$-regularization) that scale to much larger problem sizes. Bridging this gap is a main goal of this paper. We study the computational aspects of a family of $L_0$-regularized least squares problems with additional convex penalties. We propose a hierarchy of necessary optimality conditions for these problems. We develop new algorithms, based on coordinate descent and local combinatorial optimization schemes, and study their convergence properties. We demonstrate that the choice of an algorithm determines the quality of solutions obtained; and local combinatorial optimization-based algorithms generally result in solutions of superior quality. We show empirically that our proposed framework is relatively fast for problem instances with $p\approx 10^6$ and works well, in terms of both optimization and statistical properties (e.g., prediction, estimation, and variable selection), compared to simpler heuristic algorithms. A version of our algorithm reaches up to a three-fold speedup (with $p$ up to $10^6$) when compared to state-of-the-art schemes for sparse learning such as glmnet and ncvreg.
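As a rough illustration of the coordinate descent idea, the sketch below runs cyclic coordinate descent on the pure $L_0$ objective $\frac{1}{2}\|y - X\beta\|_2^2 + \lambda\|\beta\|_0$. It is not the authors' algorithm: it omits the additional convex penalties and the local combinatorial search, and it assumes the columns are rescaled to unit norm, in which case each coordinate update reduces to a hard threshold at $\sqrt{2\lambda}$. The function name and the toy data are made up.

```python
import math


def l0_coordinate_descent(X, y, lam, n_sweeps=50, tol=1e-12):
    """Cyclic coordinate descent for 0.5*||y - X b||^2 + lam*||b||_0."""
    n, p = len(X), len(X[0])
    # Rescale columns to unit L2 norm so the update is a plain hard threshold.
    norms = [math.sqrt(sum(X[i][j] ** 2 for i in range(n))) for j in range(p)]
    Z = [[X[i][j] / norms[j] for j in range(p)] for i in range(n)]
    beta = [0.0] * p
    residual = list(y)  # equals y - Z @ beta, kept up to date below
    threshold = math.sqrt(2.0 * lam)
    for _ in range(n_sweeps):
        changed = False
        for j in range(p):
            # Correlation of column j with the residual that excludes column j.
            rho = sum(Z[i][j] * residual[i] for i in range(n)) + beta[j]
            # Keeping the coordinate lowers the loss by rho^2 / 2 and costs lam.
            new = rho if abs(rho) > threshold else 0.0
            delta = new - beta[j]
            if abs(delta) > tol:
                for i in range(n):
                    residual[i] -= Z[i][j] * delta
                beta[j] = new
                changed = True
        if not changed:
            break
    # Map coefficients back to the original column scale.
    return [b / s for b, s in zip(beta, norms)]


# Toy data: y is twice the first column plus a small perturbation.
X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0],
     [1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
y = [2.0, 0.1, 2.0, -0.1]
beta_hat = l0_coordinate_descent(X, y, lam=0.1)
print(beta_hat)
```

Plain coordinate descent of this kind can stop at a point where no single-coordinate change helps, which is consistent with the abstract's observation that the local combinatorial variants tend to find better solutions.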
Support Vector Machine Simplified using R
There is no rule of thumb for choosing the best kernel. The only reliable approach is cross-validation: try several different kernels, evaluate a performance metric such as AUC for each, and select the kernel with the highest AUC. If speed matters, note that linear kernels usually compute much faster than radial or polynomial kernels.
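The selection loop can be sketched as follows. The post works in R; this sketch uses plain Python so that it is self-contained, and it swaps in a simple kernel-mean scorer as a stand-in for an SVM (in practice the SVM would be fit inside each fold). The data, fold scheme, and all function names are assumptions made for illustration.

```python
import math
import random


def linear_kernel(a, b):
    return sum(x * y for x, y in zip(a, b))


def rbf_kernel(a, b, gamma=1.0):
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def kernel_mean_scores(kernel, X_train, y_train, X_test):
    """Stand-in scorer (NOT an SVM): mean similarity to positives minus negatives."""
    pos = [x for x, t in zip(X_train, y_train) if t == 1]
    neg = [x for x, t in zip(X_train, y_train) if t == 0]
    return [
        sum(kernel(x, p) for p in pos) / len(pos)
        - sum(kernel(x, n) for n in neg) / len(neg)
        for x in X_test
    ]


def cv_auc(kernel, X, y, k=5):
    """Mean AUC over k folds; sample i goes to fold i % k."""
    fold_aucs = []
    for fold in range(k):
        test = [i for i in range(len(X)) if i % k == fold]
        train = [i for i in range(len(X)) if i % k != fold]
        scores = kernel_mean_scores(
            kernel,
            [X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test],
        )
        fold_aucs.append(auc([y[i] for i in test], scores))
    return sum(fold_aucs) / k


# Toy data: class 1 on a small inner circle, class 0 on a larger outer circle.
rng = random.Random(0)
X, y = [], []
for i in range(40):
    angle = rng.uniform(0.0, 2.0 * math.pi)
    radius = 0.3 if i % 2 == 0 else 2.0
    X.append([radius * math.cos(angle), radius * math.sin(angle)])
    y.append(1 if i % 2 == 0 else 0)

kernels = {"linear": linear_kernel, "rbf": rbf_kernel}
results = {name: cv_auc(kernel, X, y) for name, kernel in kernels.items()}
best_kernel = max(results, key=results.get)
print(results, best_kernel)
```

On this toy data the classes are not linearly separable, so the radial kernel should come out ahead. The point is the procedure: score every candidate on held-out folds and pick by the metric rather than by intuition.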
Clinically Meaningful Comparisons Over Time: An Approach to Measuring Patient Similarity based on Subsequence Alignment
Goyal, Dev, Syed, Zeeshan, Wiens, Jenna
Longitudinal patient data has the potential to improve clinical risk stratification models for disease. However, chronic diseases that progress slowly over time are often heterogeneous in their clinical presentation. Patients may progress through disease stages at varying rates. This leads to pathophysiological misalignment over time, making it difficult to consistently compare patients in a clinically meaningful way. Furthermore, patients present clinically for the first time at different stages of disease. This eliminates the possibility of simply aligning patients based on their initial presentation. Finally, patient data may be sampled at different rates due to differences in schedules or missed visits. To address these challenges, we propose a robust measure of patient similarity based on subsequence alignment. Compared to global alignment techniques that do not account for pathophysiological misalignment, focusing on the most relevant subsequences allows for an accurate measure of similarity between patients. We demonstrate the utility of our approach in settings where longitudinal data, while useful, are limited and lack a clear temporal alignment for comparison. Applied to the task of stratifying patients for risk of progression to probable Alzheimer's Disease, our approach outperforms models that use only snapshot data (AUROC of 0.839 vs. 0.812) and models that use global alignment techniques (AUROC of 0.822). Our results support the hypothesis that patients' trajectories are useful for quantifying inter-patient similarities and that using subsequence matching can help account for heterogeneity and misalignment in longitudinal data.
A brain signature highly predictive of future progression to Alzheimer's dementia
Dansereau, Christian, Tam, Angela, Badhwar, AmanPreet, Urchs, Sebastian, Orban, Pierre, Rosa-Neto, Pedro, Bellec, Pierre
Early prognosis of Alzheimer's dementia is hard. Mild cognitive impairment (MCI) typically precedes Alzheimer's dementia, yet only a fraction of MCI individuals will progress to dementia, even when screened using biomarkers. We propose here to identify a subset of individuals who share a common brain signature highly predictive of oncoming dementia. This signature was composed of brain atrophy and functional dysconnectivity and discovered using a machine learning model in patients suffering from dementia. The model recognized the same brain signature in MCI individuals, 90% of whom progressed to dementia within three years. This result is a marked improvement on the state-of-the-art in prognostic precision, while the brain signature still identified 47% of all MCI progressors. We thus discovered a sizable MCI subpopulation which represents an excellent recruitment target for clinical trials at the prodromal stage of Alzheimer's disease.
Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report.
Preprint, March 5, 2018.
1. Introduction
Alzheimer's disease (AD) is the most common age-related neurodegenerative disorder. The typical progression of late-onset, sporadic AD comprises a lengthy preclinical stage, a prodromal stage of mild cognitive impairment (MCI), and a final stage of dementia. Usually, by the time patients suffer from dementia, severe and irreversible neurodegeneration has already occurred.
Metrics to Evaluate your Machine Learning Algorithm
Evaluating your machine learning algorithm is an essential part of any project. Your model may give satisfying results when evaluated with one metric, such as accuracy_score, but poor results when evaluated against another, such as logarithmic_loss. Most of the time we use classification accuracy to measure the performance of our model; however, it is not enough to truly judge the model. In this post, we will cover the different types of evaluation metrics available. Classification accuracy is what we usually mean when we use the term accuracy.
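A small example shows how two metrics can disagree. The sketch below implements classification accuracy and logarithmic loss by hand for a binary problem; the function names, the 0.5 decision threshold, and the predicted probabilities are all made up for illustration.

```python
import math


def accuracy(y_true, probs, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(int(t == p) for t, p in zip(y_true, preds)) / len(y_true)


def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-likelihood of the true labels (binary case)."""
    total = 0.0
    for t, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)


y_true = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]     # correct and sure of itself
hesitant = [0.55, 0.45, 0.51, 0.49]  # correct, but barely

for name, probs in [("confident", confident), ("hesitant", hesitant)]:
    print(name, accuracy(y_true, probs), round(log_loss(y_true, probs), 4))
```

Both sets of predictions classify every sample correctly, so accuracy cannot tell them apart, while logarithmic loss penalizes the hesitant model for placing its probabilities so close to 0.5.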
Modeling reverse thinking for machine learning
Human inertial thinking schemes can be formed through learning and are then applied to quickly solve similar problems later. However, when problems are significantly different, inertial thinking generally yields solutions that are definitely imperfect. In such cases, people apply creative thinking, such as reverse thinking, to solve problems. Similarly, machine learning methods also form inertial thinking schemes by learning from a large amount of data. However, when the testing data are vastly different, the formed inertial thinking schemes will inevitably generate errors. This kind of inertial thinking is called illusion inertial thinking. Because existing machine learning methods do not consider illusion inertial thinking, in this paper we propose a new method that uses reverse thinking to correct illusion inertial thinking, which increases the generalization ability of machine learning methods. Experimental results on benchmark datasets are used to validate the proposed method.