Accuracy
Popular Machine Learning Interview Questions, part 2 - KDnuggets
This article is part 2 of my Popular Machine Learning Interview Questions series. Here I feature more questions I often see asked during interviews. I should note that this is neither an interview prep guide nor an exhaustive list of questions. Rather, use this article as a refresher for your machine learning knowledge. I suggest reading each question and trying to answer it yourself before reading the answer.
Probabilistic combination of eigenlungs-based classifiers for COVID-19 diagnosis in chest CT images
Arco, Juan E., Ortiz, Andrés, Ramírez, Javier, Martínez-Murcia, Francisco J., Zhang, Yu-Dong, Broncano, Jordi, Berbís, M. Álvaro, Royuela-del-Val, Javier, Luna, Antonio, Górriz, Juan M.
The outbreak of the COVID-19 (Coronavirus disease 2019) pandemic has changed the world. According to the World Health Organization (WHO), there have been more than 100 million confirmed cases of COVID-19, including more than 2.4 million deaths. Early detection of the disease is extremely important, and medical imaging such as chest X-ray (CXR) and chest Computed Tomography (CCT) has proved to be an excellent solution. However, reviewing these images is a manual and time-consuming task for clinicians, which is not ideal when trying to speed up the diagnosis. In this work, we propose an ensemble classifier based on probabilistic Support Vector Machine (SVM) in order to identify pneumonia patterns while providing information about the reliability of the classification. Specifically, each CCT scan is divided into cubic patches, and the features contained in each of them are extracted by applying kernel PCA. The use of base classifiers within an ensemble allows our system to identify the pneumonia patterns regardless of their size or location. Decisions of each individual patch are then combined into a global one according to the reliability of each individual classification: the lower the uncertainty, the higher the contribution. Performance is evaluated in a real scenario, yielding an accuracy of 97.86%. The high performance obtained and the simplicity of the system (using deep learning on CCT images would incur a huge computational cost) demonstrate the applicability of our proposal in a real-world environment.
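The reliability-weighted combination described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact weighting rule, so the choice below (weight = 1 minus the binary entropy of each patch probability) and the function names `binary_entropy` and `combine_patches` are assumptions made for illustration only, not the authors' method.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy (in bits) of a Bernoulli(p) outcome; 0 when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def combine_patches(patch_probs):
    """Combine per-patch probabilities into one global probability.

    Assumed rule (not from the paper): each patch is weighted by
    1 - entropy, so confident patches contribute more.
    """
    if not patch_probs:
        raise ValueError("at least one patch probability is required")
    weights = [1.0 - binary_entropy(p) for p in patch_probs]
    total = sum(weights)
    if total == 0.0:
        # Every patch is maximally uncertain (p = 0.5): use the plain mean.
        return sum(patch_probs) / len(patch_probs)
    return sum(w * p for w, p in zip(weights, patch_probs)) / total
```

With this rule, a patch that outputs exactly 0.5 carries zero weight, so `combine_patches([0.5, 0.9])` is driven entirely by the confident patch.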
Calibrated Simplex Mapping Classification
Heese, Raoul, Walczak, Michał, Bortz, Michael, Schmid, Jochen
In many supervised learning applications, it is not sufficient to know the most probable class y for a certain data point x. Instead, a well-calibrated probabilistic prediction p(y | x) is required. For instance, in clinical applications, class probabilities are important for confidence in model predictions (Challis et al., 2015). Some classifiers intrinsically provide such a posterior probability, e.g., logistic regression or Gaussian process classification (GPC) as described in Rasmussen and Williams (2006). There are also various methods to establish or improve such a calibration for a given classification approach (Niculescu-Mizil and Caruana, 2005), like Platt scaling (Platt, 2000) or isotonic regression (Zadrozny and Elkan, 2002).
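As a rough illustration of the Platt scaling idea mentioned above, the sketch below fits a sigmoid p = 1 / (1 + exp(-(a*s + b))) to raw classifier scores by plain gradient descent on the log loss. This is a simplified variant: Platt's original procedure also smooths the targets and uses a more careful optimizer. The names `fit_platt` and `calibrate`, the learning rate, and the step count are assumptions chosen for the example.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(scores, labels, lr=0.1, steps=2000):
    """Fit slope a and intercept b of a sigmoid to (score, label) pairs.

    Minimizes the mean log loss with fixed-step gradient descent.
    Labels must be 0 or 1.
    """
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = 0.0
        grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrate(score, a, b):
    """Map a raw score to a calibrated probability estimate."""
    return sigmoid(a * score + b)
```

In practice the sigmoid should be fitted on held-out scores rather than on the data used to train the classifier, otherwise the calibration inherits the classifier's overconfidence.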
Fairness in Credit Scoring: Assessment, Implementation and Profit Implications
Kozodoi, Nikita, Jacob, Johannes, Lessmann, Stefan
The rise of algorithmic decision-making has spawned much research on fair machine learning (ML). Financial institutions use ML for building risk scorecards that support a range of credit-related decisions. Yet, the literature on fair ML in credit scoring is scarce. The paper makes two contributions. First, we provide a systematic overview of algorithmic options for incorporating fairness goals in the ML model development pipeline. In this scope, we also consolidate the space of statistical fairness criteria and examine their adequacy for credit scoring. Second, we perform an empirical study of different fairness processors in a profit-oriented credit scoring setup using seven real-world data sets. The empirical results substantiate the evaluation of fairness measures, identify more and less suitable options to implement fair credit scoring, and clarify the profit-fairness trade-off in lending decisions. Specifically, we find that multiple fairness criteria can be approximately satisfied at once and identify separation as a proper criterion for measuring the fairness of a scorecard. We also find fair in-processors to deliver a good balance between profit and fairness. More generally, we show that algorithmic discrimination can be reduced to a reasonable level at a relatively low cost.
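The separation criterion named in this abstract requires predictions to be independent of the sensitive attribute given the true outcome, which for a binary decision amounts to equal true positive and false positive rates across groups. A minimal sketch of how such gaps could be measured follows; the function names and the two-group setup are illustrative assumptions, not the paper's evaluation code.

```python
def group_rates(y_true, y_pred, groups, group):
    """Return (TPR, FPR) for the rows belonging to one group."""
    tp = fp = fn = tn = 0
    for t, p, g in zip(y_true, y_pred, groups):
        if g != group:
            continue
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    return tpr, fpr


def separation_gaps(y_true, y_pred, groups, group_a, group_b):
    """Absolute TPR and FPR differences between two groups.

    Both gaps equal to zero means separation holds exactly
    for this binary decision.
    """
    tpr_a, fpr_a = group_rates(y_true, y_pred, groups, group_a)
    tpr_b, fpr_b = group_rates(y_true, y_pred, groups, group_b)
    return abs(tpr_a - tpr_b), abs(fpr_a - fpr_b)
```

Which outcome counts as the positive class (for example, default versus repayment) is a modeling choice that the sketch leaves to the caller.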
A Comparative Evaluation of Quantification Methods
Schumacher, Tobias, Strohmaier, Markus, Lemmerich, Florian
Quantification represents the problem of predicting class distributions in a given target set. It also represents a growing research field in supervised machine learning, for which a large variety of different algorithms has been proposed in recent years. However, a comprehensive empirical comparison of quantification methods that supports algorithm selection is not available yet. In this work, we close this research gap by conducting a thorough empirical performance comparison of 24 different quantification methods. To consider a broad range of different scenarios for binary as well as multiclass quantification settings, we carried out almost 3 million experimental runs on 40 data sets. We observe that no single algorithm generally outperforms all competitors, but identify a group of methods including the Median Sweep and the DyS framework that perform significantly better in binary settings. For the multiclass setting, we observe that a different, broad group of algorithms yields good performance, including the Generalized Probabilistic Adjusted Count, the readme method, the energy distance minimization method, the EM algorithm for quantification, and Friedman's method. More generally, we find that the performance on multiclass quantification is inferior to the results obtained in the binary setting. Our results can guide practitioners who intend to apply quantification algorithms and help researchers to identify opportunities for future research.
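To make the quantification task concrete, here is a minimal sketch of two classic binary baselines: Classify and Count, which simply reports the fraction of predicted positives, and Adjusted Count, which corrects that fraction using the classifier's true and false positive rates estimated on labeled data. These baselines are chosen for illustration and are not claimed to be among the best-performing methods in the study; the function names are assumptions.

```python
def classify_and_count(predictions):
    """Estimate prevalence as the fraction of items predicted positive."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    return sum(predictions) / len(predictions)


def adjusted_count(predictions, tpr, fpr):
    """Correct the Classify and Count estimate for classifier error.

    Since E[cc] = p * tpr + (1 - p) * fpr, solving for the true
    prevalence p gives p = (cc - fpr) / (tpr - fpr). The result is
    clipped to [0, 1] because sampling noise can push it outside.
    """
    if tpr <= fpr:
        raise ValueError("tpr must exceed fpr for the correction to be defined")
    raw = (classify_and_count(predictions) - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, raw))
```

For example, if 45% of a target set is predicted positive by a classifier with a true positive rate of 0.8 and a false positive rate of 0.1, the adjusted estimate of the true prevalence is about 0.5.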
Bad and good errors: value-weighted skill scores in deep ensemble learning
Guastavino, Sabrina, Piana, Michele, Benvenuto, Federico
In this paper we propose a novel approach to realize forecast verification. Specifically, we introduce a strategy for assessing the severity of forecast errors based on the evidence that, on the one hand, a false alarm just anticipating an occurring event is better than one in the middle of consecutive non-occurring events, and that, on the other hand, a miss of an isolated event has a worse impact than a miss of a single event that is part of several consecutive occurrences. Relying on this idea, we introduce a novel definition of confusion matrix and skill scores giving greater importance to the value of the prediction rather than to its quality. Then, we introduce a deep ensemble learning procedure for binary classification, in which the probabilistic outcomes of a neural network are clustered via optimization of these value-weighted skill scores. We finally show the performance of this approach in three applications concerned with pollution, space weather and stock price forecasting.
Label-Imbalanced and Group-Sensitive Classification under Overparameterization
Kini, Ganesh Ramachandra, Paraskevas, Orestis, Oymak, Samet, Thrampoulidis, Christos
Label-imbalanced and group-sensitive classification seeks to appropriately modify standard training algorithms to optimize relevant metrics such as balanced error and/or equal opportunity. For label imbalances, recent works have proposed a logit-adjusted loss modification to standard empirical risk minimization. We show that this might be ineffective in general and, in particular so, in the overparameterized regime where training continues in the zero training-error regime. Specifically for binary linear classification of a separable dataset, we show that the modified loss converges to the max-margin SVM classifier despite the logit adjustment. Instead, we propose a more general vector-scaling loss that directly relates to the cost-sensitive SVM (CS-SVM), thus favoring larger margin to the minority class. Through an insightful sharp asymptotic analysis for a Gaussian-mixtures data model, we demonstrate the efficacy of CS-SVM in balancing the errors of the minority/majority classes. Our analysis also leads to a simple strategy for optimally tuning the involved margin-ratio parameter. Then, we show how our results extend naturally to binary classification with sensitive groups, thus treating the two common types of imbalances (label/group) in a unifying way. We corroborate our theoretical findings with numerical experiments on both synthetic and real-world datasets.
Applications of Artificial Intelligence for Retinopathy of Prematurity Screening - Docwire News
OBJECTIVES: Childhood blindness from retinopathy of prematurity (ROP) is increasing as a result of improvements in neonatal care worldwide. We evaluate the effectiveness of artificial intelligence (AI)-based screening in an Indian ROP telemedicine program and whether differences in ROP severity between neonatal care units (NCUs) identified by using AI are related to differences in oxygen-titrating capability. All images were assigned an ROP severity score (1-9) by using the Imaging and Informatics in Retinopathy of Prematurity Deep Learning system. We calculated the area under the receiver operating characteristic curve and sensitivity and specificity for treatment-requiring retinopathy of prematurity. Using multivariable linear regression, we evaluated the mean and median ROP severity in each NCU as a function of mean birth weight, gestational age, and the presence of oxygen blenders and pulse oxygenation monitors.
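The evaluation metrics named in this abstract can be sketched in a few lines. The code below computes the area under the ROC curve as the fraction of (positive, negative) pairs ranked correctly, with ties counted as half, and computes sensitivity and specificity at a chosen cut-off on the severity score. The function names and the "score >= threshold means positive" convention are assumptions for illustration; the study's actual thresholds are not given here.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold is called positive."""
    tp = fn = tn = fp = 0
    for s, y in zip(scores, labels):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    return tp / (tp + fn), tn / (tn + fp)
```

The pairwise AUC is quadratic in the number of cases, which is fine for small examples; a rank-based formulation would be preferable for large data sets.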
Predicting Credit Card Defaults with Machine Learning
Sometimes the best model is the simplest. The model with minimal manipulation yielded the highest recall score of 0.95. After feature selection and hyperparameter tuning, recall decreased to 0.79. Overfitting means the model is strong at predicting the data on which it was trained, but weak at generalizing to unseen data. The validation score is similar to the test score, so we know the model is performing similarly on completely unseen data.
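A minimal sketch of the overfitting check described here: compute recall on each data split and compare the scores. The helper names and the 0.05 tolerance are hypothetical choices for illustration; the article does not state which threshold, if any, was used.

```python
def recall(y_true, y_pred):
    """Fraction of actual positives that the model predicted as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("recall is undefined without positive examples")
    return tp / (tp + fn)


def looks_overfit(train_score, heldout_score, tolerance=0.05):
    """Flag a model whose training score exceeds its held-out score
    by more than the tolerance (an assumed, adjustable cut-off)."""
    return (train_score - heldout_score) > tolerance
```

A small gap between training, validation, and test recall suggests the model generalizes; a large drop from training to held-out data is the usual symptom of overfitting.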