Accuracy
Auditing Algorithms for Bias
In 1971, philosopher John Rawls proposed a thought experiment to understand the idea of fairness: the veil of ignorance. What if, he asked, we could erase our brains so we had no memory of who we were -- our race, our income level, our profession, anything that may influence our opinion? Who would we protect, and who would we serve with our policies? The veil of ignorance is a philosophical exercise for thinking about justice and society. But it can be applied to the burgeoning field of artificial intelligence (AI) as well. Can AI provide the veil of ignorance that would lead us to objective and ideal outcomes?
Precision and Recall for Time Series
Tatbul, Nesime, Lee, Tae Jun, Zdonik, Stan, Alam, Mejbah, Gottschlich, Justin
Classical anomaly detection is principally concerned with point-based anomalies, those anomalies that occur at a single point in time. Yet, many real-world anomalies are range-based, meaning they occur over a period of time. Motivated by this observation, we present a new mathematical model to evaluate the accuracy of time series classification algorithms. Our model expands the well-known Precision and Recall metrics to measure ranges, while simultaneously enabling customization support for domain-specific preferences.
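As a rough illustration of scoring ranges rather than single points, the sketch below gives each range the fraction of its time points that the other side covers, then averages those fractions. This is a deliberately simplified, overlap-only variant and not the authors' full model; the function names, the inclusive integer `(start, end)` range encoding, and the equal weighting of ranges are all assumptions made for this example.

```python
def overlap_len(a, b):
    """Number of integer time points shared by inclusive ranges a and b."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def coverage(target, others):
    """Fraction of `target` covered by `others` (assumed mutually non-overlapping)."""
    size = target[1] - target[0] + 1
    return sum(overlap_len(target, o) for o in others) / size


def range_recall(real, predicted):
    """Average fraction of each real anomaly range that was predicted."""
    if not real:
        return 0.0
    return sum(coverage(r, predicted) for r in real) / len(real)


def range_precision(real, predicted):
    """Average fraction of each predicted range that is truly anomalous."""
    if not predicted:
        return 0.0
    return sum(coverage(p, real) for p in predicted) / len(predicted)


# Hypothetical data: two real anomaly ranges, two predicted ranges.
real_ranges = [(10, 19), (40, 44)]
predicted_ranges = [(15, 24), (60, 61)]
print(range_recall(real_ranges, predicted_ranges))
print(range_precision(real_ranges, predicted_ranges))
```

A domain-specific preference, such as rewarding early detection within a range, could be expressed by weighting the points inside each range instead of counting them equally.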
Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples
Tao, Guanhong, Ma, Shiqing, Liu, Yingqi, Zhang, Xiangyu
Adversarial sample attacks perturb benign inputs to induce DNN misbehaviors. Recent research has demonstrated the widespread presence and the devastating consequences of such attacks. Existing defense techniques either assume prior knowledge of specific attacks or may not work well on complex models due to their underlying assumptions. We argue that adversarial sample attacks are deeply entangled with interpretability of DNN models: while classification results on benign inputs can be reasoned based on the human perceptible features/attributes, results on adversarial samples can hardly be explained. Therefore, we propose a novel adversarial sample detection technique for face recognition models, based on interpretability. It features a novel bi-directional correspondence inference between attributes and internal neurons to identify neurons critical for individual attributes. The activation values of critical neurons are enhanced to amplify the reasoning part of the computation and the values of other neurons are weakened to suppress the uninterpretable part. The classification results after such transformation are compared with those of the original model to detect adversaries. Results show that our technique can achieve 94% detection accuracy for 7 different kinds of attacks with 9.91% false positives on benign inputs. In contrast, a state-of-the-art feature squeezing technique can only achieve 55% accuracy with 23.3% false positives.
Real-time Context-aware Learning System for IoT Applications
We propose a real-time context-aware learning system, along with its architecture, that runs on mobile devices, provides services to the user, and manages IoT devices. In this system, an application running on mobile devices collects data from the sensors, learns the user-defined context, makes predictions in real time, and manages IoT devices accordingly. However, the limited computational power of mobile devices makes it challenging to run machine learning algorithms with acceptable accuracy. To solve this issue, some authors have run machine learning algorithms on the server and transmitted the results to the mobile devices. Although the context-aware predictions made by the server are more accurate than their mobile counterparts, this approach depends heavily on the network connection to deliver the results to the devices, which negatively affects real-time context learning. Therefore, in this work, we describe a context-learning algorithm for mobile devices that is less demanding on computational resources and maintains prediction accuracy by periodically updating itself with learning parameters obtained from the server.
Corresponding author email address: das.bhaskar.1981@gmail.com
Keywords: Mobile computing, Context-aware Applications, Real-time System, Context learning, Cloud Computing
1. Introduction
The Internet of Things (IoT) is a vision in which everyday devices embedded with computing technology can communicate with one another via the Internet. IoT devices will play an important role in improving quality of life in many domains such as smart living, transportation, education, agriculture, industry, and the like. By 2020, approximately 212 billion devices will be deployed globally [1] and will consume 45% of internet traffic by 2022 [2].
The IoT-related healthcare industry is expected to grow to $1.1 - $2.5 trillion globally by 2025 [3], and according to "Navigant Research", the building automation systems market is expected to reach $102.0 billion in 2025. Owing to technological advances in mobile devices, a new application domain has emerged, called context-aware computing, in which the system can use environmental information from collected sensor data and respond accordingly without requiring any user intervention [5].
Superensemble Classifier for Improving Predictions in Imbalanced Datasets
Chakraborty, Tanujit, Chakraborty, Ashis Kumar
Learning from an imbalanced dataset is a tricky proposition. Because these datasets are biased towards one class, most existing classifiers tend not to perform well on minority class examples. Conventional classifiers usually aim to optimize the overall accuracy without considering the relative distribution of each class. This article presents a superensemble classifier that maps Hellinger distance decision trees (HDDT) into a radial basis function network (RBFN) framework to improve predictions in imbalanced classification problems. Regularity conditions for universal consistency and the idea of parameter optimization of the proposed model are provided. The proposed distribution-free model can be applied to problems that combine feature selection with imbalanced classification. We also provide numerical evidence from various real-life data sets to assess the performance of the proposed model. Its effectiveness and competitiveness with respect to different state-of-the-art models are shown.
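For readers unfamiliar with the splitting criterion behind Hellinger distance decision trees, the sketch below computes the Hellinger distance between the way a candidate split distributes the positive class and the way it distributes the negative class across its branches. It illustrates the criterion commonly associated with HDDT, not the superensemble itself; the function name and the `(pos_count, neg_count)` branch encoding are assumptions made for this example.

```python
import math


def hellinger_split_score(branches):
    """Hellinger distance between per-class branch distributions.

    `branches` is a list of (pos_count, neg_count) pairs, one per branch
    of a candidate split. Larger scores mean the split separates the two
    classes better; the maximum is sqrt(2).
    """
    total_pos = sum(p for p, _ in branches)
    total_neg = sum(n for _, n in branches)
    if total_pos == 0 or total_neg == 0:
        return 0.0  # Undefined without both classes; treat as uninformative.
    return math.sqrt(sum(
        (math.sqrt(p / total_pos) - math.sqrt(n / total_neg)) ** 2
        for p, n in branches
    ))


# Hypothetical imbalanced node: 10 positives, 90 negatives.
perfect_split = [(10, 0), (0, 90)]
useless_split = [(5, 45), (5, 45)]
print(hellinger_split_score(perfect_split))
print(hellinger_split_score(useless_split))
```

Because each class is normalized by its own total, the score does not change if the negative class is scaled up or down, which is the usual argument for using it on imbalanced data.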
Efficient learning of neighbor representations for boundary trees and forests
Adikari, Tharindu, Draper, Stark C.
We introduce a semiparametric approach to neighbor-based classification. We build on the recently proposed Boundary Trees algorithm by Mathy et al. (2015), which enables fast neighbor-based classification, regression and retrieval in large datasets. While boundary trees use a Euclidean measure of similarity, the Differentiable Boundary Tree algorithm by Zoran et al. (2017) was introduced to learn low-dimensional representations of complex input data, on which semantic similarity can be calculated to train boundary trees. As its authors point out, the differentiable boundary tree approach has a few limitations that prevent it from scaling to large datasets. In this paper, we introduce Differentiable Boundary Sets, an algorithm that overcomes the computational issues of the differentiable boundary tree scheme and also improves its classification accuracy and data representability. Our algorithm is efficiently implementable with existing tools and offers a significant reduction in training time. We test and compare the algorithms on the well-known MNIST handwritten digits dataset and the newer Fashion-MNIST dataset by Xiao et al. (2017).
A Preliminary Study on Hyperparameter Configuration for Human Activity Recognition
Garcia, Kemilly Dearo, Carvalho, Tiago, Mendes-Moreira, João, Cardoso, João M. P., de Carvalho, André C. P. L. F.
Human activity recognition (HAR) is a classification task that aims to classify human activities or predict human behavior by means of features extracted from sensor data. Typical HAR systems use wearable sensors and/or handheld and mobile devices with built-in sensing capabilities. Due to the widespread use of smartphones and the inclusion of various sensors in all contemporary smartphones (e.g., accelerometers and gyroscopes), they are commonly used for extracting and collecting sensor data and even for implementing HAR systems. When using mobile devices, e.g., smartphones, HAR systems need to deal with several constraints regarding battery, computation and memory. These constraints create the need for a system capable of managing its resources while maintaining acceptable levels of classification accuracy. Moreover, several factors can influence activity recognition, such as classification models, sensor availability and the size of the data window for feature extraction, making stable accuracy difficult to achieve. In this paper, we present a semi-supervised classifier and a study of the influence of hyperparameter configuration on classification accuracy, depending on the user and the activities performed by each user. This study focuses on sensing data provided by the PAMAP2 dataset. Experimental results show that it is possible to maintain classification accuracy by adjusting hyperparameters, such as window size and window overlap factor, depending on the user and the activity performed. These experiments motivate the development of a system able to automatically adapt hyperparameter settings to the activity performed by each user.
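To make the two hyperparameters named above concrete, the sketch below splits a sensor stream into fixed-size windows with a configurable overlap factor. It is a minimal illustration, not the authors' pipeline; the function name, the convention that `overlap` is a fraction in [0, 1), and the choice to drop a trailing partial window are assumptions made for this example.

```python
def sliding_windows(samples, window_size, overlap):
    """Split `samples` into windows of `window_size` items.

    `overlap` is the fraction of each window shared with the next one,
    e.g. 0.5 means consecutive windows share half of their samples.
    A trailing partial window is dropped.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    # Distance between the starts of consecutive windows, at least 1 sample.
    step = max(1, round(window_size * (1 - overlap)))
    last_start = len(samples) - window_size
    return [samples[i:i + window_size] for i in range(0, last_start + 1, step)]


# Hypothetical accelerometer readings, one value per time step.
readings = list(range(10))
for window in sliding_windows(readings, window_size=4, overlap=0.5):
    print(window)
```

Features such as the mean or variance would then be computed per window, so a larger overlap yields more training examples from the same stream at the cost of more computation.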
Automated Process Incorporating Machine Learning Segmentation and Correlation of Oral Diseases with Systemic Health
Yauney, Gregory, Rana, Aman, Wong, Lawrence C., Javia, Perikumar, Muftu, Ali, Shah, Pratik
Imaging fluorescent disease biomarkers in tissues and skin is a non-invasive method to screen for health conditions. We report an automated process that combines intraoral fluorescent porphyrin biomarker imaging, clinical examinations and machine learning for correlation of systemic health conditions with periodontal disease. 1215 intraoral fluorescent images, from 284 consenting adults aged 18-90, were analyzed using a machine learning classifier that can segment periodontal inflammation. The classifier achieved an AUC of 0.677 with precision and recall of 0.271 and 0.429, respectively, indicating a learned association between disease signatures in collected images. Periodontal diseases were more prevalent among males (p=0.0012) and older subjects (p=0.0224) in the screened population. Physicians independently examined the collected images, assigning localized modified gingival indices (MGIs). MGIs and periodontal disease were then cross-correlated with responses to a medical history questionnaire, blood pressure and body mass index measurements, and optic nerve, tympanic membrane, neurological, and cardiac rhythm imaging examinations. Gingivitis and early periodontal disease were associated with subjects diagnosed with optic nerve abnormalities (p <0.0001) in their retinal scans. We also report significant co-occurrences of periodontal disease in subjects reporting swollen joints (p=0.0422) and a family history of eye disease (p=0.0337). These results indicate cross-correlation of poor periodontal health with systemic health outcomes and stress the importance of oral health screenings at the primary care level. Our screening process and analysis method, using images and machine learning, can be generalized for automated diagnoses and systemic health screenings for other diseases.
Machine Learning Algorithms for Classification of Microcirculation Images from Septic and Non-Septic Patients
Javia, Perikumar, Rana, Aman, Shapiro, Nathan, Shah, Pratik
Sepsis is a life-threatening disease and one of the major causes of death in hospitals. Imaging of microcirculatory dysfunction is a promising approach for automated diagnosis of sepsis. We report a machine learning classifier capable of distinguishing non-septic and septic images from dark field microcirculation videos of patients. The classifier achieves an accuracy of 89.45%. The area under the receiver operating characteristic (ROC) curve of the classifier was 0.92, the precision was 0.92 and the recall was 0.84. Codes representing the learned feature space of the trained classifier were visualized using t-SNE embedding; they were separable and distinguished between images from critically ill and non-septic patients. Using an unsupervised convolutional autoencoder, independent of the clinical diagnosis, we also report clustering of learned features from a compressed representation associated with healthy images and those with microcirculatory dysfunction. The feature space used by our trained classifier to distinguish between images from septic and non-septic patients has potential diagnostic application.
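The accuracy, precision and recall figures quoted above all derive from the four cells of a binary confusion matrix. The sketch below shows that relationship using made-up counts; the function name and the numbers are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts.

    tp: positives correctly flagged     fp: negatives wrongly flagged
    fn: positives missed                tn: negatives correctly passed
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts for illustration only.
accuracy, precision, recall = binary_metrics(tp=8, fp=2, fn=4, tn=6)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

Precision answers how many flagged images were truly positive, while recall answers how many truly positive images were flagged, so a screening tool can have high accuracy and still miss many cases if positives are rare.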
A Text Classification Application: Poet Detection from Poetry
Sahin, Durmus Ozkan, Kural, Oguz Emre, Kilic, Erdal, Karabina, Armagan
With the widespread use of the internet, the amount of text data increases day by day. Poems are one example of this growing body of text. In this study, we aim to classify poetry according to poet. First, a data set consisting of poetry written in English by three different poets was constructed. Then, text categorization techniques were applied to it. The chi-square technique was used for feature selection. In addition, five different classification algorithms were tried: sequential minimal optimization, Naive Bayes, the C4.5 decision tree, Random Forest and k-nearest neighbors. Although the classifiers showed very different results, a classification success rate of over 70% was achieved by the sequential minimal optimization technique.
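As an illustration of chi-square feature selection for text, the sketch below scores a single term against a single class from a 2x2 table of document counts, using the standard chi-square statistic for such a table. It is a minimal sketch, not the study's implementation; the function name, argument names and the example counts are assumptions made for this example.

```python
def chi_square_term_score(in_class_with, out_class_with,
                          in_class_without, out_class_without):
    """Chi-square statistic for one term and one class.

    Arguments are document counts:
      in_class_with      documents of the class that contain the term
      out_class_with     documents of other classes that contain the term
      in_class_without   documents of the class that lack the term
      out_class_without  documents of other classes that lack the term
    """
    a, b, c, d = in_class_with, out_class_with, in_class_without, out_class_without
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0  # Term or class is constant; it carries no information.
    return n * (a * d - c * b) ** 2 / denominator


# Hypothetical counts: a term used only by one poet vs. a term used evenly.
print(chi_square_term_score(10, 0, 0, 10))
print(chi_square_term_score(5, 5, 5, 5))
```

Terms would be ranked by this score and only the highest-scoring ones kept as features before training a classifier.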