Accuracy
Performance Metrics for Classification problems in Machine Learning
After doing the usual Feature Engineering, Selection, and of course, implementing a model and getting some output in forms of a probability or a class, the next step is to find out how effective is the model based on some metric using test datasets. Different performance metrics are used to evaluate different Machine Learning Algorithms. For now, we will be focusing on the ones used for Classification problems. We can use classification performance metrics such as Log-Loss, Accuracy, AUC(Area under Curve) etc. Another example of metric for evaluation of machine learning algorithms is precision, recall, which can be used for sorting algorithms primarily used by search engines. The metrics that you choose to evaluate your machine learning model is very important.
On Deep Neural Networks for Detecting Heart Disease
Tomov, Nathalie-Sofia, Tomov, Stanimire
Heart disease is the leading cause of death, and experts estimate that approximately half of all heart attacks and strokes occur in people who have not been flagged as "at risk." Thus, there is an urgent need to improve the accuracy of heart disease diagnosis. To this end, we investigate the potential of using data analysis, and in particular the design and use of deep neural networks (DNNs) for detecting heart disease based on routine clinical data. Our main contribution is the design, evaluation, and optimization of DNN architectures of increasing depth for heart disease diagnosis. This work led to the discovery of a novel five layer DNN architecture - named Heart Evaluation for Algorithmic Risk-reduction and Optimization Five (HEARO-5) -- that yields best prediction accuracy. HEARO-5's design employs regularization optimization and automatically deals with missing data and/or data outliers. To evaluate and tune the architectures we use k-way cross-validation as well as Matthews correlation coefficient (MCC) to measure the quality of our classifications. The study is performed on the publicly available Cleveland dataset of medical information, and we are making our developments open source, to further facilitate openness and research on the use of DNNs in medicine. The HEARO-5 architecture, yielding 99% accuracy and 0.98 MCC, significantly outperforms currently published research in the area.
Out-of-Distribution Detection using Multiple Semantic Label Representations
Shalev, Gabi, Adi, Yossi, Keshet, Joseph
Deep Neural Networks are powerful models that attained remarkable results on a variety of tasks. These models are shown to be extremely efficient when training and test data are drawn from the same distribution. However, it is not clear how a network will act when it is fed with an out-of-distribution example. In this work, we consider the problem of out-of-distribution detection in neural networks. We propose to use multiple semantic dense representations instead of sparse representation as the target label. Specifically, we propose to use several word representations obtained from different corpora or architectures as target labels. We evaluated the proposed model on computer vision, and speech commands detection tasks and compared it to previous methods. Results suggest that our method compares favorably with previous work. Besides, we present the efficiency of our approach for detecting wrongly classified and adversarial examples.
Reproducible evaluation of classification methods in Alzheimer's disease: framework and application to MRI and PET data
Samper-González, Jorge, Burgos, Ninon, Bottani, Simona, Fontanella, Sabrina, Lu, Pascal, Marcoux, Arnaud, Routier, Alexandre, Guillon, Jérémy, Bacci, Michael, Wen, Junhao, Bertrand, Anne, Bertin, Hugo, Habert, Marie-Odile, Durrleman, Stanley, Evgeniou, Theodoros, Colliot, Olivier, Initiative, for the Alzheimer's Disease Neuroimaging, Biomarkers, the Australian Imaging, ageing, Lifestyle flagship study of
A large number of papers have introduced novel machine learning and feature extraction methods for automatic classification of AD. However, they are difficult to reproduce because key components of the validation are often not readily available. These components include selected participants and input data, image preprocessing and cross-validation procedures. The performance of the different approaches is also difficult to compare objectively. In particular, it is often difficult to assess which part of the method provides a real improvement, if any. We propose a framework for reproducible and objective classification experiments in AD using three publicly available datasets (ADNI, AIBL and OASIS). The framework comprises: i) automatic conversion of the three datasets into BIDS format, ii) a modular set of preprocessing pipelines, feature extraction and classification methods, together with an evaluation framework, that provide a baseline for benchmarking the different components. We demonstrate the use of the framework for a large-scale evaluation on 1960 participants using T1 MRI and FDG PET data. In this evaluation, we assess the influence of different modalities, preprocessing, feature types, classifiers, training set sizes and datasets. Performances were in line with the state-of-the-art. FDG PET outperformed T1 MRI for all classification tasks. No difference in performance was found for the use of different atlases, image smoothing, partial volume correction of FDG PET images, or feature type. Linear SVM and L2-logistic regression resulted in similar performance and both outperformed random forests. The classification performance increased along with the number of subjects used for training. Classifiers trained on ADNI generalized well to AIBL and OASIS. All the code of the framework and the experiments is publicly available at: https://gitlab.icm-institute.org/aramislab/AD-ML.
Siamese Survival Analysis with Competing Risks
Nemchenko, Anton, Kyono, Trent, Van Der Schaar, Mihaela
Survival analysis in the presence of multiple possible adverse events, i.e., competing risks, is a pervasive problem in many industries (healthcare, finance, etc.). Since only one event is typically observed, the incidence of an event of interest is often obscured by other related competing events. This nonidentifiability, or inability to estimate true cause-specific survival curves from empirical data, further complicates competing risk survival analysis. We introduce Siamese Survival Prognosis Network (SSPN), a novel deep learning architecture for estimating personalized risk scores in the presence of competing risks. SSPN circumvents the nonidentifiability problem by avoiding the estimation of cause-specific survival curves and instead determines pairwise concordant time-dependent risks, where longer event times are assigned lower risks. Furthermore, SSPN is able to directly optimize an approximation to the C-discrimination index, rather than relying on well-known metrics which are unable to capture the unique requirements of survival analysis with competing risks.
Mitigation of Adversarial Attacks through Embedded Feature Selection
Bao, Ziyi, Muñoz-González, Luis, Lupu, Emil C.
Machine learning has become one of the main components for task automation in many application domains. Despite the advancements and impressive achievements of machine learning, it has been shown that learning algorithms can be compromised by attackers both at training and test time. Machine learning systems are especially vulnerable to adversarial examples where small perturbations added to the original data points can produce incorrect or unexpected outputs in the learning algorithms at test time. Mitigation of these attacks is hard as adversarial examples are difficult to detect. Existing related work states that the security of machine learning systems against adversarial examples can be weakened when feature selection is applied to reduce the systems' complexity. In this paper, we empirically disprove this idea, showing that the relative distortion that the attacker has to introduce to succeed in the attack is greater when the target is using a reduced set of features. We also show that the minimal adversarial examples differ statistically more strongly from genuine examples with a lower number of features. However, reducing the feature count can negatively impact the system's performance. We illustrate the trade-off between security and accuracy with specific examples. We propose a design methodology to evaluate the security of machine learning classifiers with embedded feature selection against adversarial examples crafted using different attack strategies.
Shedding Light on Black Box Machine Learning Algorithms: Development of an Axiomatic Framework to Assess the Quality of Methods that Explain Individual Predictions
From self-driving vehicles and back-flipping robots to virtual assistants who book our next appointment at the hair salon or at that restaurant for dinner - machine learning systems are becoming increasingly ubiquitous. The main reason for this is that these methods boast remarkable predictive capabilities. However, most of these models remain black boxes, meaning that it is very challenging for humans to follow and understand their intricate inner workings. Consequently, interpretability has suffered under this ever-increasing complexity of machine learning models. Especially with regards to new regulations, such as the General Data Protection Regulation (GDPR), the necessity for plausibility and verifiability of predictions made by these black boxes is indispensable. Driven by the needs of industry and practice, the research community has recognised this interpretability problem and focussed on developing a growing number of so-called explanation methods over the past few years. These methods explain individual predictions made by black box machine learning models and help to recover some of the lost interpretability. With the proliferation of these explanation methods, it is, however, often unclear, which explanation method offers a higher explanation quality, or is generally better-suited for the situation at hand. In this thesis, we thus propose an axiomatic framework, which allows comparing the quality of different explanation methods amongst each other. Through experimental validation, we find that the developed framework is useful to assess the explanation quality of different explanation methods and reach conclusions that are consistent with independent research.
Machine Learning Interview Questions - Part 1 (Core Machine Learning) - CloudxLab Blog
For hiring machine learning engineers or data scientist, the typical process has multiple rounds. A typical first round of interview consists of three parts. A typical interviewer will start by asking about the relevant work from your profile. On your past experience of machine learning project, the interviewer might ask how would you improve it. Afterwards (third part), the interviewer would proceed to check your basic knowledge of machine learning on the following lines.
An Overview and a Benchmark of Active Learning for One-Class Classification
Trittenbach, Holger, Englhardt, Adrian, Böhm, Klemens
Active learning stands for methods which increase classification quality by means of user feedback. An important subcategory is active learning for one-class classifiers, i.e., for imbalanced class distributions. While various methods in this category exist, selecting one for a given application scenario is difficult. This is because existing methods rely on different assumptions, have different objectives, and often are tailored to a specific use case. All this calls for a comprehensive comparison, the topic of this article. This article starts with a categorization of the various methods. We then propose ways to evaluate active learning results. Next, we run extensive experiments to compare existing methods, for a broad variety of scenarios. One result is that the practicality and the performance of an active learning method strongly depend on its category and on the assumptions behind it. Another observation is that there only is a small subset of our experiments where existing approaches outperform random baselines. Finally, we show that a well-laid-out categorization and a rigorous specification of assumptions can facilitate the selection of a good method for one-class classification.
predictSLUMS: A new model for identifying and predicting informal settlements and slums in cities from street intersections using machine learning
Ibrahim, Mohamed R., Titheridge, Helena, Cheng, Tao, Haworth, James
Identifying current and future informal regions within cities remains a crucial issue for policymakers and governments in developing countries. The delineation process of identifying such regions in cities requires a lot of resources. While there are various studies that identify informal settlements based on satellite image classification, relying on both supervised or unsupervised machine learning approaches, these models either require multiple input data to function or need further development with regards to precision. In this paper, we introduce a novel method for identifying and predicting informal settlements using only street intersections data, regardless of the variation of urban form, number of floors, materials used for construction or street width. With such minimal input data, we attempt to provide planners and policy-makers with a pragmatic tool that can aid in identifying informal zones in cities. The algorithm of the model is based on spatial statistics and a machine learning approach, using Multinomial Logistic Regression (MNL) and Artificial Neural Networks (ANN). The proposed model relies on defining informal settlements based on two ubiquitous characteristics that these regions tend to be filled in with smaller subdivided lots of housing relative to the formal areas within the local context, and the paucity of services and infrastructure within the boundary of these settlements that require relatively bigger lots. We applied the model in five major cities in Egypt and India that have spatial structures in which informality is present. These cities are Greater Cairo, Alexandria, Hurghada and Minya in Egypt, and Mumbai in India. The predictSLUMS model shows high validity and accuracy for identifying and predicting informality within the same city the model was trained on or in different ones of a similar context.