Accuracy
An interpretable classifier for high-resolution breast cancer screening images utilizing weakly supervised localization
Shen, Yiqiu, Wu, Nan, Phang, Jason, Park, Jungkyu, Liu, Kangning, Tyagi, Sudarshini, Heacock, Laura, Kim, S. Gene, Moy, Linda, Cho, Kyunghyun, Geras, Krzysztof J.
Medical images differ from natural images in significantly higher resolutions and smaller regions of interest. Because of these differences, neural network architectures that work well for natural images might not be applicable to medical image analysis. In this work, we extend the globally-aware multiple instance classifier, a framework we proposed to address these unique properties of medical images. This model first uses a low-capacity, yet memory-efficient, network on the whole image to identify the most informative regions. It then applies another higher-capacity network to collect details from chosen regions. Finally, it employs a fusion module that aggregates global and local information to make a final prediction. While existing methods often require lesion segmentation during training, our model is trained with only image-level labels and can generate pixel-level saliency maps indicating possible malignant findings. We apply the model to screening mammography interpretation: predicting the presence or absence of benign and malignant lesions. On the NYU Breast Cancer Screening Dataset, consisting of more than one million images, our model achieves an AUC of 0.93 in classifying breasts with malignant findings, outperforming ResNet-34 and Faster R-CNN. Compared to ResNet-34, our model is 4.1x faster for inference while using 78.4% less GPU memory. Furthermore, we demonstrate, in a reader study, that our model surpasses radiologist-level AUC by a margin of 0.11. The proposed model is available online: https://github.com/nyukat/GMIC.
The use of Convolutional Neural Networks for signal-background classification in Particle Physics experiments
Ayyar, Venkitesh, Bhimji, Wahid, Gerhardt, Lisa, Robertson, Sally, Ronaghi, Zahra
The success of Convolutional Neural Networks (CNNs) in image classification has prompted efforts to study their use for classifying image data obtained in Particle Physics experiments. Here, we discuss our efforts to apply CNNs to 2D and 3D image data from particle physics experiments to classify signal from background. In this work we present an extensive convolutional neural architecture search, achieving high accuracy for signal/background discrimination for a HEP classification use-case based on simulated data from the IceCube neutrino observatory and an ATLAS-like detector. We demonstrate, among other things, that CNNs with fewer parameters can achieve the same accuracy as complex ResNet architectures, and we present comparisons of computational requirements, training times, and inference times.
Simple Interactive Image Segmentation using Label Propagation through kNN graphs
Many interactive image segmentation techniques are based on semi-supervised learning. The user may label some pixels from each object and the SSL algorithm will propagate the labels from the labeled to the unlabeled pixels, finding object boundaries. This paper proposes a new SSL graph-based interactive image segmentation approach, using undirected and unweighted kNN graphs, from which the unlabeled nodes receive contributions from other nodes (either labeled or unlabeled). It is simpler than many other techniques, but it still achieves significant classification accuracy in the image segmentation task. Computer simulations are performed using some real-world images, extracted from the Microsoft GrabCut dataset. The segmentation results show the effectiveness of the proposed approach.
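The idea of spreading user labels over an undirected, unweighted kNN graph can be illustrated with a minimal sketch. This is a generic label-propagation loop, not the algorithm proposed in the paper: the function names (`knn_graph`, `propagate`), the neighbor-averaging update rule, the iteration count, and the toy one-dimensional "pixel features" are all assumptions made for illustration.

```python
def knn_graph(points, k):
    """Build an undirected, unweighted kNN graph as a list of neighbor sets."""
    n = len(points)
    neighbors = [set() for _ in range(n)]
    for i in range(n):
        # Squared Euclidean distance from point i to every other point.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(points[i], points[j])), j)
            for j in range(n)
            if j != i
        )
        for _, j in dists[:k]:
            neighbors[i].add(j)
            neighbors[j].add(i)  # symmetrize so the graph is undirected
    return neighbors


def propagate(points, seeds, n_classes, k=2, iters=50):
    """Spread seed labels over the graph by repeated neighbor averaging.

    `seeds` maps a node index to its user-provided class (hypothetical input).
    """
    neighbors = knn_graph(points, k)
    # Unlabeled nodes start with a uniform class distribution.
    scores = [[1.0 / n_classes] * n_classes for _ in points]
    for i, c in seeds.items():
        scores[i] = [1.0 if j == c else 0.0 for j in range(n_classes)]
    for _ in range(iters):
        updated = []
        for i in range(len(points)):
            if i in seeds:
                updated.append(scores[i])  # user labels stay fixed
            else:
                nb = neighbors[i]
                # Contributions come from all neighbors, labeled or not.
                updated.append(
                    [sum(scores[j][c] for j in nb) / len(nb) for c in range(n_classes)]
                )
        scores = updated
    return [max(range(n_classes), key=lambda c: s[c]) for s in scores]


# Toy data: two well-separated groups of 1-D "pixel features".
points = [(0.0,), (0.1,), (0.2,), (0.3,), (5.0,), (5.1,), (5.2,), (5.3,)]
seeds = {0: 0, 7: 1}  # one labeled node per object
labels = propagate(points, seeds, n_classes=2, k=2)
print(labels)
```

In this toy setup each group forms its own connected component containing exactly one seed, so every unlabeled node ends up with the label of the seed in its group.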
Analysis and Evaluation of Handwriting in Patients with Parkinson's Disease Using Kinematic, Geometrical, and Non-linear Features
Rios-Urrego, C. D., Vásquez-Correa, J. C., Vargas-Bonilla, J. F., Nöth, E., Lopera, F., Orozco-Arroyave, J. R.
Background and objectives: Parkinson's disease is a neurological disorder that affects the motor system producing lack of coordination, resting tremor, and rigidity. Impairments in handwriting are among the main symptoms of the disease. Handwriting analysis can help in supporting the diagnosis and in monitoring the progress of the disease. This paper aims to evaluate the importance of different groups of features to model handwriting deficits that appear due to Parkinson's disease; and how those features are able to discriminate between Parkinson's disease patients and healthy subjects. Methods: Features based on kinematic, geometrical and non-linear dynamics analyses were evaluated to classify Parkinson's disease and healthy subjects. Classifiers based on K-nearest neighbors, support vector machines, and random forest were considered. Results: Accuracies of up to $93.1\%$ were obtained in the classification of patients and healthy control subjects. A relevance analysis of the features indicated that those related to speed, acceleration, and pressure are the most discriminant. The automatic classification of patients in different stages of the disease shows $\kappa$ indexes between $0.36$ and $0.44$. Accuracies of up to $83.3\%$ were obtained in a different dataset used only for validation purposes. Conclusions: The results confirmed the negative impact of aging in the classification process when we considered different groups of healthy subjects. In addition, the results reported with the separate validation set comprise a step towards the development of automated tools to support the diagnosis process in clinical practice.
Keras Metrics: Everything You Need To Know
Keras metrics are functions that are used to evaluate the performance of your deep learning model. Choosing a good metric for your problem is usually a difficult task. Lucky for you, this article explains all that! In Keras, metrics are passed during the compile stage as shown below. You can pass several metrics by comma separating them.
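A minimal sketch of passing metrics at the compile stage might look like the following. The tiny model, its input shape, and the choice of optimizer, loss, and metrics are assumptions made for illustration, and the import assumes a TensorFlow 2.x installation that bundles Keras.

```python
from tensorflow import keras

# A small binary classifier; the layer sizes and input shape are illustrative only.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])

# Metrics are given as a list at compile time. Entries can be string
# shortcuts such as "accuracy" or metric objects such as keras.metrics.AUC.
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy", keras.metrics.AUC(name="auc")],
)
```

Each metric in the list is then reported alongside the loss during `model.fit` and `model.evaluate`.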
Particle Competition and Cooperation for Semi-Supervised Learning with Label Noise
Breve, Fabricio Aparecido, Zhao, Liang, Quiles, Marcos Gonçalves
Semi-supervised learning methods are usually employed in the classification of data sets where only a small subset of the data items is labeled. In these scenarios, label noise is a crucial issue, since the noise may easily spread to a large portion or even the entire data set, leading to major degradation in classification accuracy. Therefore, the development of new techniques to reduce the nasty effects of label noise in semi-supervised learning is a vital issue. Recently, a graph-based semi-supervised learning approach based on Particle competition and cooperation was developed. In this model, particles walk in the graphs constructed from the data sets. Competition takes place among particles representing different class labels, while the cooperation occurs among particles with the same label. This paper presents a new particle competition and cooperation algorithm, specifically designed to increase the robustness to the presence of label noise, improving its label noise tolerance. Different from other methods, the proposed one does not require a separate technique to deal with label noise. It performs classification of unlabeled nodes and reclassification of the nodes affected by label noise in a unique process. Computer simulations show the classification accuracy of the proposed method when applied to some artificial and real-world data sets, in which we introduce increasing amounts of label noise. The classification accuracy is compared to those achieved by previous particle competition and cooperation algorithms and other representative graph-based semi-supervised learning methods using the same scenarios. Results show the effectiveness of the proposed method.
Boosting rare benthic macroinvertebrates taxa identification with one-class classification
Sohrab, Fahad, Raitoharju, Jenni
Insect monitoring is crucial for understanding the consequences of rapid ecological changes, but taxa identification currently requires tedious manual expert work and cannot be scaled up efficiently. Deep convolutional neural networks (CNNs) provide a viable way to significantly increase the biomonitoring volumes. However, taxa abundances are typically very imbalanced, and the number of training images for the rarest classes is simply too low for deep CNNs. As a result, the samples from the rare classes are often completely missed, even though detecting them has biological importance. In this paper, we propose combining the trained deep CNN with one-class classifiers to improve the rare species identification. One-class classification models are traditionally trained with far fewer samples, and they can provide a mechanism to indicate samples potentially belonging to the rare classes for human inspection. Our experiments confirm that the proposed approach may indeed support moving towards partial automation of the taxa identification task.
Debugging Machine Learning Pipelines
Lourenço, Raoni, Freire, Juliana, Shasha, Dennis
Machine learning tasks entail the use of complex computational pipelines to reach quantitative and qualitative conclusions. If some of the activities in a pipeline produce erroneous or uninformative outputs, the pipeline may fail or produce incorrect results. Inferring the root cause of failures and unexpected behavior is challenging, usually requiring much human thought, and is both time-consuming and error-prone. We propose a new approach that makes use of iteration and provenance to automatically infer the root causes and derive succinct explanations of failures. Through a detailed experimental evaluation, we assess the cost, precision, and recall of our approach compared to the state of the art. Our source code and experimental data will be available for reproducibility and enhancement.
A Physiology-Driven Computational Model for Post-Cardiac Arrest Outcome Prediction
Kim, Han B., Nguyen, Hieu, Jin, Qingchu, Tamby, Sharmila, Romer, Tatiana Gelaf, Sung, Eric, Liu, Ran, Greenstein, Joseph, Suarez, Jose I., Storm, Christian, Winslow, Raimond, Stevens, Robert D.
Patients resuscitated from cardiac arrest (CA) face a high risk of neurological disability and death; however, pragmatic methods are lacking for accurate and reliable prognostication. The aim of this study was to build computational models to predict post-CA outcome by leveraging high-dimensional patient data available early after admission to the intensive care unit (ICU). We hypothesized that model performance could be enhanced by integrating physiological time series (PTS) data and by training machine learning (ML) classifiers. We compared three models integrating features extracted from the electronic health records (EHR) alone, features derived from PTS collected in the first 24 hours after ICU admission (PTS24), and models integrating PTS24 and EHR. Outcomes of interest were survival and neurological outcome at ICU discharge. Combined EHR-PTS24 models had higher discrimination (area under the receiver operating characteristic curve [AUC]) than models which used either EHR or PTS24 alone, for the prediction of survival (AUC 0.85, 0.80 and 0.68 respectively) and neurological outcome (0.87, 0.83 and 0.78). The best ML classifier achieved higher discrimination than the reference logistic regression model (APACHE III) for survival (AUC 0.85 vs 0.70) and neurological outcome prediction (AUC 0.87 vs 0.75). Feature analysis revealed previously unknown factors to be associated with post-CA recovery. Results attest to the effectiveness of ML models for post-CA predictive modeling and suggest that PTS recorded in the very early phase after resuscitation encode short-term outcome probabilities.
Neural Network Approximation of Graph Fourier Transforms for Sparse Sampling of Networked Flow Dynamics
Pagani, Alessio, Wei, Zhuangkun, Silva, Ricardo, Guo, Weisi
Infrastructure monitoring is critical for safe operations and sustainability. Water distribution networks (WDNs) are large-scale networked critical systems with complex cascade dynamics which are difficult to predict. Ubiquitous monitoring is expensive, and a key challenge is to infer the contaminant dynamics from partial sparse monitoring data. Existing approaches use multi-objective optimisation to find the minimum set of essential monitoring points, but lack performance guarantees and a theoretical framework. Here, we first develop Graph Fourier Transform (GFT) operators to compress networked contamination spreading dynamics to identify the essential principal data collection points with inference performance guarantees. We then build autoencoder (AE) inspired neural networks (NN) to generalize the GFT sampling process and under-sample further from the initial sampling set, allowing a very small set of data points to largely reconstruct the contamination dynamics over real and artificial WDNs. Various sources of the contamination are tested and we obtain high accuracy reconstruction using around 5-10% of the sample set. This general approach of compression and under-sampled recovery via neural networks can be applied to a wide range of networked infrastructures to enable digital twins.