Accuracy
Hybrid Machine Learning Model of Extreme Learning Machine Radial basis function for Breast Cancer Detection and Diagnosis; a Multilayer Fuzzy Expert System
Mojrian, Sanaz, Pinter, Gergo, Joloudari, Javad Hassannataj, Felde, Imre, Nabipour, Narjes, Nadai, Laszlo, Mosavi, Amir
Mammography is often used as the most common laboratory method for the detection of breast cancer, yet it is associated with high cost and many side effects. Machine learning prediction as an alternative method has shown promising results. This paper presents a method based on a multilayer fuzzy expert system for the detection of breast cancer using an extreme learning machine (ELM) classification model integrated with a radial basis function (RBF) kernel, called ELM-RBF, considering the Wisconsin dataset. The performance of the proposed model is further compared with a linear-SVM model. Furthermore, both models are studied in terms of accuracy, precision, sensitivity, specificity, validation, true positive rate (TPR), and false-negative rate (FNR). On these criteria, the ELM-RBF model performs better than the SVM model. Breast cancer is among the most common diseases of young women around the world [1-3]. Approximately 29.9% of mortality from cancer in women is due to breast cancer. The incidence of this disease is lower in developing countries than in developed countries, about 10% of women with breast cancer in Western countries.
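The evaluation criteria named in this abstract can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch under the standard textbook definitions, not the authors' evaluation code; the function name `binary_metrics` and the example labels are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute common binary classification metrics from label lists.

    Standard definitions are assumed; the paper's exact protocol is not
    given in the abstract.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),  # identical to TPR (recall)
        "specificity": ratio(tn, tn + fp),
        "fnr": ratio(fn, fn + tp),          # equals 1 - sensitivity
    }


if __name__ == "__main__":
    # Hypothetical labels: 1 = malignant, 0 = benign.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
    for name, value in binary_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.3f}")
```

Sensitivity and TPR are the same quantity, and FNR is its complement, so reporting all three adds no independent information.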
Deep Learning Emulation of Multi-Angle Implementation of Atmospheric Correction (MAIAC)
Duffy, Kate, Vandal, Thomas, Wang, Weile, Nemani, Ramakrishna, Ganguly, Auroop R.
New generation geostationary satellites make solar reflectance observations available at a continental scale with unprecedented spatiotemporal resolution and spectral range. Generating quality land monitoring products requires correction of the effects of atmospheric scattering and absorption, which vary in time and space according to geometry and atmospheric composition. Many atmospheric radiative transfer models, including that of Multi-Angle Implementation of Atmospheric Correction (MAIAC), are too computationally complex to be run in real time, and rely on precomputed look-up tables. Additionally, uncertainty in measurements and models for remote sensing receives insufficient attention, in part due to the difficulty of obtaining sufficient ground measurements. In this paper, we present an adaptation of Bayesian Deep Learning (BDL) to emulation of the MAIAC atmospheric correction algorithm. Emulation approaches learn a statistical model as an efficient approximation of a physical model, while machine learning methods have demonstrated performance in extracting spatial features and learning complex, nonlinear mappings. We demonstrate stable surface reflectance (SR) retrieval by emulation (R² between MAIAC and emulator SR is 0.63, 0.75, 0.86, 0.84, 0.95, and 0.91 for the Blue, Green, Red, NIR, SWIR1, and SWIR2 bands, respectively), accurate cloud detection (86%), and well-calibrated, geolocated uncertainty estimates. Our results support BDL-based emulation as an accurate and efficient (up to 6x speedup) method for approximating atmospheric correction, where built-in uncertainty estimates stand to open new opportunities for model assessment and support informed use of SR-derived quantities in multiple domains.
Model enhancement and personalization using weakly supervised learning for multi-modal mobile sensing
Teng, Diyan, Kulkarni, Rashmi, McGloin, Justin
Always-on sensing of a mobile device user's contextual information is critical to many intelligent use cases today, such as healthcare, drive assistance, and voice UI. State-of-the-art approaches for predicting user context have demonstrated the value of leveraging multiple sensing modalities for better accuracy. However, context inference algorithms that run on the application processor tend to drain a heavy amount of power, making them unsuitable for an always-on implementation. We claim that not every sensing modality is suitable to be activated all the time, and it remains challenging to build an inference engine using power-friendly sensing modalities. Meanwhile, due to the diverse population, we find it challenging to learn a context inference model that generalizes well with limited training data, especially when using only always-on low-power sensors. In this work, we propose an approach that leverages the opportunistically-on counterparts in the device to improve the always-on prediction model, leading to a personalized solution. We model this problem using a weakly supervised learning framework and provide both theoretical and experimental results to validate our design. The proposed framework achieves satisfying results in the IMU-based activity recognition application we considered.
Estimating Skin Tone and Effects on Classification Performance in Dermatology Datasets
Kinyanjui, Newton M., Odonga, Timothy, Cintas, Celia, Codella, Noel C. F., Panda, Rameswar, Sattigeri, Prasanna, Varshney, Kush R.
Recent advances in computer vision and deep learning have led to breakthroughs in the development of automated skin image analysis. In particular, skin cancer classification models have achieved performance higher than trained expert dermatologists. However, no attempt has been made to evaluate the consistency in performance of machine learning models across populations with varying skin tones. In this paper, we present an approach to estimate skin tone in benchmark skin disease datasets, and investigate whether model performance is dependent on this measure. Specifically, we use individual typology angle (ITA) to approximate skin tone in dermatology datasets. We look at the distribution of ITA values to better understand skin color representation in two benchmark datasets: 1) the ISIC 2018 Challenge dataset, a collection of dermoscopic images of skin lesions for the detection of skin cancer, and 2) the SD-198 dataset, a collection of clinical images capturing a wide variety of skin diseases. To estimate ITA, we first develop segmentation models to isolate non-diseased areas of skin. We find that the majority of the data in the two datasets have ITA values between 34.5° and 48°, which are associated with lighter skin; this is consistent with under-representation of darker skinned populations in these datasets. We also find no measurable correlation between the performance of machine learning models and ITA values, though more comprehensive data is needed for further validation.
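The abstract does not state how ITA is computed. As it is commonly defined in the dermatology literature, ITA is derived from the CIELAB lightness L* and yellow-blue component b* as ITA = arctan((L* - 50) / b*) × 180/π. The sketch below assumes that definition and assumes the mean L* and b* of the non-diseased skin pixels have already been obtained; the function name `individual_typology_angle` is hypothetical.

```python
import math


def individual_typology_angle(l_star, b_star):
    """Return the individual typology angle (ITA) in degrees.

    Assumes the commonly used definition
        ITA = arctan((L* - 50) / b*) * 180 / pi
    where L* and b* are CIELAB values, e.g. averaged over skin pixels.
    """
    if b_star == 0:
        raise ValueError("b* must be non-zero for ITA to be defined")
    return math.degrees(math.atan((l_star - 50.0) / b_star))


if __name__ == "__main__":
    # Hypothetical mean CIELAB values for a segmented skin region.
    print(round(individual_typology_angle(70.0, 20.0), 1))
```

Higher ITA values correspond to lighter skin, which matches the abstract's reading of the 34.5° to 48° range.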
20 Popular Machine Learning Metrics. Part 1: Classification & Regression Evaluation Metrics
Choosing the right metric is crucial when evaluating machine learning (ML) models. Various metrics have been proposed to evaluate ML models in different applications, and I thought it may be helpful to provide a summary of popular metrics here, for a better understanding of each metric and the applications it can be used for. In some applications, looking at a single metric may not give you the whole picture of the problem you are solving, and you may want to use a subset of the metrics discussed in this post to have a concrete evaluation of your models. Here, I provide a summary of 20 metrics used for evaluating machine learning models. Needless to say, there are various other metrics used in some applications (FDR, FOR, hit@k, etc.), which I am skipping here.
AMP0: Species-Specific Prediction of Anti-microbial Peptides using Zero and Few Shot Learning
The evolution of drug-resistant microbial species is one of the major challenges to global health. The development of new antimicrobial treatments such as antimicrobial peptides needs to be accelerated to combat this threat. However, the discovery of novel antimicrobial peptides is hampered by low-throughput biochemical assays. Computational techniques can be used for rapid screening of promising antimicrobial peptide candidates prior to testing in the wet lab. The vast majority of existing antimicrobial peptide predictors are non-targeted in nature, i.e., they can predict whether a given peptide sequence is antimicrobial, but they are unable to predict whether the sequence can target a particular microbial species. In this work, we have developed a targeted antimicrobial peptide activity predictor that can predict whether a peptide is effective against a given microbial species or not. This has been made possible through zero-shot and few-shot machine learning. The proposed predictor called AMP0 takes in the peptide amino acid sequence and any N/C-termini modifications together with the genomic sequence of a target microbial species to generate targeted predictions. It is important to note that the proposed method can generate predictions for species that are not part of its training set. The accuracy of predictions for novel test species can be further improved by providing a few example peptides for that species. Our computational cross-validation results show that the proposed scheme is particularly effective for targeted antimicrobial prediction in comparison to existing approaches and can be used for screening potential antimicrobial peptides in a targeted manner especially for cases in which the number of training examples is small. The webserver of the method is available at http://ampzero.pythonanywhere.com.
Detect Toxic Content to Improve Online Conversations
Mediratta, Deepshi, Oswal, Nikhil
Social media is filled with toxic content. The aim of this paper is to build a model that can detect insincere questions. We use the 'Quora Insincere Questions Classification' dataset for our analysis. The dataset is composed of sincere and insincere questions, with sincere questions in the majority. The dataset is processed and analyzed using Python and its libraries such as sklearn, numpy, pandas, keras etc. The dataset is converted to vector form using word embeddings such as GloVe and Wiki-news, as well as TF-IDF. The imbalance in the dataset is handled by resampling techniques. We train and compare various machine learning and deep learning models to come up with the best results. Models discussed include SVM, Naive Bayes, GRU and LSTM.
Predicting Louisiana Public High School Dropout through Imbalanced Learning Techniques
This study is motivated by the magnitude of the problem of Louisiana high school dropout and its negative impacts on individual and public wellbeing. Our goal is to predict students who are at risk of high school dropout by examining a Louisiana administrative dataset. Due to the imbalanced nature of the dataset, imbalanced learning techniques including resampling, case weighting, and cost-sensitive learning have been applied to enhance the prediction performance on the rare class. Performance metrics used in this study are F-measure, recall and precision of the rare class. We compare the performance of several machine learning algorithms such as neural networks, decision trees and bagging trees in combination with the imbalanced learning approaches, using an administrative dataset of 366k records from the Louisiana Department of Education. Experiments show that applying imbalanced learning methods produces good results on recall but decreases precision, whereas base classifiers without any imbalanced data handling give better precision but poor recall. Overall, the application of imbalanced learning techniques is beneficial, yet more studies are desired to improve precision. Louisiana has maintained one of the highest school dropout rates in the US for many years. The Public Affairs Research Council of Louisiana (PAR, October 2011) estimates that one in six public high school students in the state drops out of school.
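Resampling is one of the imbalanced learning techniques named in this abstract. The sketch below shows generic random oversampling, in which rare-class examples are duplicated at random until every class matches the largest one. It is an illustration under that assumption, not the study's procedure; the function name `random_oversample` and the toy records are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Balance classes by duplicating randomly chosen minority examples.

    Every class is grown to the size of the largest class. Original
    examples are kept, and duplicates are appended at the end.
    """
    counts = Counter(labels)
    target = max(counts.values())
    rng = random.Random(seed)  # fixed seed for reproducibility
    out_samples, out_labels = list(samples), list(labels)
    for label, count in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == label]
        extra = [rng.choice(pool) for _ in range(target - count)]
        out_samples.extend(extra)
        out_labels.extend([label] * len(extra))
    return out_samples, out_labels


if __name__ == "__main__":
    # Hypothetical records: label 1 marks the rare (dropout) class.
    samples = ["s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8"]
    labels = [0, 0, 0, 0, 0, 0, 1, 1]
    _, balanced = random_oversample(samples, labels)
    print(Counter(balanced))
```

Oversampling should be applied only to the training split. Duplicating rare-class records before splitting would leak copies into the test set and inflate recall and precision.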
Ensemble Quantile Classifier
Both the median-based classifier and the quantile-based classifier are useful for discriminating high-dimensional data with heavy-tailed or skewed inputs. But these methods are restricted as they assign equal weight to each variable in an unregularized way. The ensemble quantile classifier is a more flexible regularized classifier that provides better performance with high-dimensional data, asymmetric data or when there are many irrelevant extraneous inputs. The improved performance is demonstrated by a simulation study as well as an application to text categorization. It is proven that the estimated parameters of the ensemble quantile classifier consistently estimate the minimal population loss under suitable general model assumptions. It is also shown that the ensemble quantile classifier is Bayes optimal under suitable assumptions with asymmetric Laplace distribution inputs.
Learning Fair and Interpretable Representations via Linear Orthogonalization
He, Yuzi, Burghardt, Keith, Lerman, Kristina
To reduce human error and prejudice, many high-stakes decisions have been turned over to machine algorithms. However, recent research suggests that this does not remove discrimination, and can perpetuate harmful stereotypes. While algorithms have been developed to improve fairness, they typically face at least one of three shortcomings: they are not interpretable, they lose significant accuracy compared to unbiased equivalents, or they are not transferable across models. To address these issues, we propose a geometric method that removes correlations between data and any number of protected variables. Further, we can control the strength of debiasing through an adjustable parameter to address the tradeoff between model accuracy and fairness. The resulting features are interpretable and can be used with many popular models, such as linear regression, random forest and multilayer perceptrons. The resulting predictions are found to be more accurate and fair than several comparable fair AI algorithms across a variety of benchmark datasets. Our work shows that debiasing data is a simple and effective solution toward improving fairness.
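The geometric idea of removing correlations between a feature and protected variables can be illustrated with a plain orthogonal projection. The sketch below centers each column, builds an orthogonal basis of the protected variables by Gram-Schmidt, and subtracts the feature's component along that basis. It is a simplified illustration under those assumptions, not the authors' algorithm, and it omits the adjustable debiasing-strength parameter the abstract mentions; all function names are hypothetical.

```python
def center(values):
    """Subtract the mean so that dot products act as covariances."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def subtract_projection(x, basis_vector):
    """Remove the component of x that lies along basis_vector."""
    coef = dot(x, basis_vector) / dot(basis_vector, basis_vector)
    return [xi - coef * bi for xi, bi in zip(x, basis_vector)]


def orthogonalize(feature, protected_vars, tol=1e-12):
    """Return the centered feature with zero linear correlation to
    every protected variable (full-strength debiasing only)."""
    basis = []
    for z in protected_vars:
        z = center(z)
        for b in basis:
            z = subtract_projection(z, b)  # Gram-Schmidt step
        if dot(z, z) > tol:  # skip constant or redundant variables
            basis.append(z)
    x = center(feature)
    for b in basis:
        x = subtract_projection(x, b)
    return x


if __name__ == "__main__":
    # Hypothetical feature and one binary protected attribute.
    feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    protected = [[0, 0, 0, 1, 1, 1]]
    debiased = orthogonalize(feature, protected)
    print(abs(dot(debiased, center(protected[0]))) < 1e-9)
```

Because the result is a linear combination of the original feature and the protected variables, each debiased feature keeps a direct interpretation, which is consistent with the interpretability claim in the abstract. A projection of this kind removes linear correlation only.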