 Performance Analysis


AI reduces false positives in screening mammography

#artificialintelligence

Following the assumption that there may be nuanced features associated with some mammogram images that could lead to an unnecessary recall when interpreted by a radiologist, the researchers used a method based on convolutional neural networks (CNNs) to build a computer toolkit that could identify those images. The researchers trained CNN models using 14,860 images of 3,715 patients from the Full-Field Digital Mammography Dataset and the Digital Dataset of Screening Mammography. They investigated six classification scenarios that would help distinguish images of benign, malignant, and recalled-benign mammograms.


Taking Advantage of Multitask Learning for Fair Classification

arXiv.org Machine Learning

A central goal of algorithmic fairness is to reduce bias in automated decision making. An unavoidable tension exists between accuracy gains obtained by using sensitive information (e.g., gender or ethnic group) as part of a statistical model, and any commitment to protect these characteristics. Often, due to biases present in the data, using the sensitive information in the functional form of a classifier improves classification accuracy. In this paper we show how it is possible to get the best of both worlds: optimize model accuracy and fairness without explicitly using the sensitive feature in the functional form of the model, thereby treating different individuals equally. Our method is based on two key ideas. On the one hand, we propose to use Multitask Learning (MTL), enhanced with fairness constraints, to jointly learn group specific classifiers that leverage information between sensitive groups. On the other hand, since learning group specific models might not be permitted, we propose to first predict the sensitive features by any learning method and then to use the predicted sensitive feature to train MTL with fairness constraints. This enables us to tackle fairness with a three-pronged approach, that is, by increasing accuracy on each group, enforcing measures of fairness during training, and protecting sensitive information during testing. Experimental results on two real datasets support our proposal, showing substantial improvements in both accuracy and fairness.


Population and Empirical PR Curves for Assessment of Ranking Algorithms

arXiv.org Machine Learning

The precision-recall (PR) curve has become the de facto replacement for the ROC curve in the presence of imbalance, namely where one class is far more likely than the other class. While the PR and ROC curves tend to be used interchangeably, they have some very different properties. Properties of the PR curve are the focus of this paper. We consider: (1) population PR curves, where complete distributional assumptions are specified for scores from both classes; and (2) empirical estimators of the PR curve, where we observe scores and no distributional assumptions are made. The properties have direct consequences for how the PR curve should, and should not, be used. For example, the empirical PR curve is not consistent when scores in the class of primary interest come from discrete distributions. On the other hand, a normal approximation can fit quite well for points on the empirical PR curve from continuously-defined scores, but convergence can be heavily influenced by the distributional setting, the amount of imbalance, and the point of interest on the PR curve.
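
As a rough illustration of what an empirical PR curve is, the sketch below computes one (recall, precision) point per distinct score threshold from observed scores and binary labels. The function name `empirical_pr_curve` and the toy data are assumptions made for this example and are not taken from the paper; tied scores are handled as a group, which is one simple convention among several.

```python
def empirical_pr_curve(scores, labels):
    """Return (recall, precision) points, one per distinct score threshold.

    labels are 1 for the class of primary interest and 0 otherwise.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    # Sort by descending score so that lowering the threshold admits items in order.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = []
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item of a group of tied scores.
        is_last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if is_last_of_tie:
            points.append((tp / n_pos, tp / (tp + fp)))
    return points


# Hypothetical toy data with one tie at score 0.8.
scores = [0.9, 0.8, 0.8, 0.6, 0.4, 0.2]
labels = [1, 0, 1, 0, 1, 0]
curve = empirical_pr_curve(scores, labels)
```

Because tied scores collapse into a single point, scores drawn from a discrete distribution yield fewer distinct thresholds than there are observations, which is the kind of setting the abstract flags as problematic.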


Malicious Web Domain Identification using Online Credibility and Performance Data by Considering the Class Imbalance Issue

arXiv.org Machine Learning

Purpose: Malicious web domain identification is of significant importance to the security protection of Internet users. With online credibility and performance data, this paper aims to investigate the use of machine learning techniques for malicious web domain identification by considering the class imbalance issue (i.e., there are more benign web domains than malicious ones). Design/methodology/approach: We propose an integrated resampling approach to handle class imbalance by combining the Synthetic Minority Over-sampling TEchnique (SMOTE) and Particle Swarm Optimisation (PSO), a population-based meta-heuristic algorithm. We use the SMOTE for over-sampling and PSO for under-sampling. Findings: By applying eight well-known machine learning classifiers, the proposed integrated resampling approach is comprehensively examined using several imbalanced web domain datasets with different imbalance ratios. Compared to five other well-known resampling approaches, experimental results confirm that the proposed approach is highly effective. Practical implications: This study not only inspires the practical use of online credibility and performance data for identifying malicious web domains, but also provides an effective resampling approach for handling the class imbalance issue in the area of malicious web domain identification. Originality/value: Online credibility and performance data is applied to build malicious web domain identification models using machine learning techniques. An integrated resampling approach is proposed to address the class imbalance issue. The performance of the proposed approach is confirmed based on real-world datasets with different imbalance ratios.
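
To make the over-sampling half of that approach concrete, here is a minimal sketch of the core SMOTE step: a synthetic minority point is placed at a random position on the segment between a minority point and one of its nearest minority neighbours. The function name `smote_sample`, its parameters, and the toy points are assumptions for illustration only; the PSO-based under-sampling described in the abstract is not shown, and this is not the authors' implementation.

```python
import random


def smote_sample(minority, k=2, n_new=4, seed=0):
    """Generate n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours (the core SMOTE step)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority points")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical two-feature minority class (e.g. malicious domains).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sample(minority)
```

Every synthetic point lies between two real minority points, so the over-sampled class stays inside the region the minority examples already occupy.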


Variational Noise-Contrastive Estimation

arXiv.org Machine Learning

Unnormalised latent variable models are a broad and flexible class of statistical models. However, learning their parameters from data is intractable, and few estimation techniques are currently available for such models. To increase the number of techniques in our arsenal, we propose variational noise-contrastive estimation (VNCE), building on NCE which is a method that only applies to unnormalised models. The core idea is to use a variational lower bound to the NCE objective function, which can be optimised in the same fashion as the evidence lower bound (ELBO) in standard variational inference (VI). We prove that VNCE can be used for both parameter estimation of unnormalised models and posterior inference of latent variables. The developed theory shows that VNCE has the same level of generality as standard VI, meaning that advances made there can be directly imported to the unnormalised setting. We validate VNCE on toy models and apply it to a realistic problem of estimating an undirected graphical model from incomplete data.


Mobile Sound Recognition for the Deaf and Hard of Hearing

arXiv.org Artificial Intelligence

Human perception of surrounding events is strongly dependent on audio cues. Thus, acoustic insulation can seriously impact situational awareness. We present an exploratory study in the domain of assistive computing, eliciting requirements and presenting solutions to problems found in the development of an environmental sound recognition system, which aims to assist deaf and hard of hearing people in the perception of sounds. To take advantage of smartphones' computational ubiquity, we propose a system that executes all processing on the device itself, from audio feature extraction to recognition and visual presentation of results. Our application also presents the confidence level of the classification to the user. A test of the system conducted with deaf users provided important and inspiring feedback from participants.


Fairness for Whom? Critically reframing fairness with Nash Welfare Product

arXiv.org Artificial Intelligence

Recent studies on disparate impact in machine learning applications have sparked a debate around the concept of fairness along with attempts to formalize its different criteria. Many of these approaches focus on reducing prediction errors while maximizing the sole utility of the institution. This work seeks to reconceptualize and critically frame the existing discourse on fairness by underlining the implicit biases embedded in common understandings of fairness in the literature and how they contrast with its corresponding economic and legal definitions. This paper expands the concept of utility and fairness by bringing in concepts from established literature in welfare economics and game theory. We then translate these concepts for the algorithmic prediction domain by defining a formalization of Nash Welfare Product that seeks to expand utility by collapsing that of the institution using the prediction tool and that of the individual subject to the prediction into one function. We then apply a modulating function that makes the fairness and welfare trade-offs explicit based on designated policy goals, and apply it to a temporal model to take into account the effects of decisions beyond the scope of one-shot predictions. We apply this to a binary classification problem and present results of a multi-epoch simulation based on the UCI Adult Income dataset and a test case analysis of the ProPublica recidivism dataset. These show that expanding the concept of utility results in a fairer distribution, correcting for the embedded biases in the dataset without sacrificing classifier accuracy.


From Scikit-learn to TensorFlow: Part 2 – Towards Data Science

#artificialintelligence

Continuing from where we left off, we delve deeper into how to develop machine learning (ML) algorithms using TensorFlow from a scikit-learn developer's perspective. If you'd like to know the motivations for moving to TensorFlow, do read my earlier post, which covers the reasons to move to TensorFlow and a simple classification program that highlights the similarities of developing for scikit-learn and TensorFlow. In the earlier post, we compared the similarities of the fit and predict paradigm in scikit-learn and TensorFlow. In this post, I want to show how we can develop a TensorFlow classification framework with scikit-learn's data processing and reporting tools. This gives a good way to interweave the two frameworks into a neat and concise framework.


HierLPR: Decision making in hierarchical multi-label classification with local precision rates

arXiv.org Machine Learning

In this article we propose a novel ranking algorithm, referred to as HierLPR, for the multi-label classification problem when the candidate labels follow a known hierarchical structure. HierLPR is motivated by a new metric called eAUC that we design to assess the ranking of classification decisions. This metric, associated with the hit curve and local precision rate, emphasizes the accuracy of the first calls. We show that HierLPR optimizes eAUC under the tree constraint and some light assumptions on the dependency between the nodes in the hierarchy. We also provide a strategy to make calls for each node based on the ordering produced by HierLPR, with the intent of controlling FDR or maximizing F-score. The performance of our proposed methods is demonstrated on synthetic datasets as well as a real example of disease diagnosis using NCBI GEO datasets. In these cases, HierLPR shows a favorable result over competing methods in the early part of the precision-recall curve.


Predicting optical coherence tomography-derived diabetic macular edema grades from fundus photographs using deep learning

arXiv.org Machine Learning

Diabetic eye disease is one of the fastest growing causes of preventable blindness. With the advent of anti-VEGF (vascular endothelial growth factor) therapies, it has become increasingly important to detect center-involved diabetic macular edema. However, center-involved diabetic macular edema is diagnosed using optical coherence tomography (OCT), which is not generally available at screening sites because of cost and workflow constraints. Instead, screening programs rely on the detection of hard exudates as a proxy for DME on color fundus photographs, often resulting in high false positive or false negative calls. To improve the accuracy of DME screening, we trained a deep learning model to use color fundus photographs to predict DME grades derived from OCT exams. Our "OCT-DME" model had an AUC of 0.89 (95% CI: 0.87-0.91), which corresponds to a sensitivity of 85% at a specificity of 80%. In comparison, three retinal specialists had similar sensitivities (82-85%), but only half the specificity (45-50%, p<0.001 for each comparison with model). The positive predictive value (PPV) of the OCT-DME model was 61% (95% CI: 56-66%), approximately double the 36-38% by the retina specialists. In addition, we used saliency and other techniques to examine how the model is making its prediction. The ability of deep learning algorithms to make clinically relevant predictions that generally require sophisticated 3D-imaging equipment from simple 2D images has broad relevance to many other applications in medical imaging.
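
The link between sensitivity, specificity, and positive predictive value follows from Bayes' rule, and a short sketch may help show why a higher specificity at similar sensitivity raises PPV. The function name `positive_predictive_value` and the prevalence of 0.25 used below are hypothetical choices for illustration; the abstract does not state the prevalence in the study population, so the computed values should not be read as reproducing the reported figures.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV = P(disease | positive call), computed via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0:
        raise ValueError("no positive calls are possible with these inputs")
    return true_pos / (true_pos + false_pos)


# Hypothetical prevalence, NOT taken from the source.
ASSUMED_PREVALENCE = 0.25

# Same sensitivity, two different specificities.
ppv_high_spec = positive_predictive_value(0.85, 0.80, ASSUMED_PREVALENCE)
ppv_low_spec = positive_predictive_value(0.85, 0.50, ASSUMED_PREVALENCE)
```

With sensitivity held fixed, fewer false positive calls (higher specificity) shrink the denominator, so `ppv_high_spec` exceeds `ppv_low_spec` at any prevalence strictly between 0 and 1.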