 Accuracy


Ecological Data Analysis Based on Machine Learning Algorithms

arXiv.org Machine Learning

Abstract: Classification is an important supervised machine learning method, and a necessary yet challenging task in ecological research. It offers a way to classify a dataset into subsets that share common patterns. Notably, there are many classification algorithms to choose from, each making certain assumptions about the data and about how the classification should be formed. In this paper, we applied eight machine learning classification algorithms to ecological data: Decision Trees, Random Forest, Artificial Neural Network, Support Vector Machine, Linear Discriminant Analysis, k-nearest neighbors, Logistic Regression and Naive Bayes. The goal of this study is to compare different machine learning classification algorithms on an ecological dataset. In this analysis, we compared the algorithms by their classification accuracy. We conclude that Linear Discriminant Analysis and k-nearest neighbors performed best among the methods tested.


A Method to Facilitate Cancer Detection and Type Classification from Gene Expression Data using a Deep Autoencoder and Neural Network

arXiv.org Machine Learning

With the increased affordability and availability of whole-genome sequencing, large-scale and high-throughput gene expression is widely used to characterize diseases, including cancers. However, establishing specificity in cancer diagnosis using gene expression data continues to pose challenges due to the high dimensionality and complexity of the data. Here we present models of deep learning (DL) and apply them to gene expression data for the diagnosis and categorization of cancer. In this study, we have developed two DL models using messenger ribonucleic acid (mRNA) datasets available from the Genomic Data Commons repository. Our models achieved 98% accuracy in cancer detection, with false negative and false positive rates below 1.7%. In our results, we demonstrated that 18 out of 32 cancer-typing classifications achieved more than 90% accuracy. Due to the limitation of a small sample size (less than 50 observations), certain cancers could not achieve a higher accuracy in typing classification, but still achieved high accuracy for the cancer detection task. To validate our models, we compared them with traditional statistical models. The main advantage of our models over traditional cancer detection is the ability to use data from various cancer types to automatically form features to enhance the detection and diagnosis of a specific cancer type.


Fast and Accurate 3D Medical Image Segmentation with Data-swapping Method

arXiv.org Machine Learning

Deep neural network models used for medical image segmentation are large because they are trained with high-resolution three-dimensional (3D) images. Graphics processing units (GPUs) are widely used to accelerate training. However, the memory on a GPU is not large enough to train these models. A popular approach to tackling this problem is the patch-based method, which divides a large image into small patches and trains the models with these small patches. However, this method degrades the segmentation quality if a target object spans multiple patches. In this paper, we propose a novel approach for 3D medical image segmentation that uses data-swapping, which swaps out intermediate data from GPU memory to CPU memory to enlarge the effective GPU memory size, to train on high-resolution 3D medical images without patching. We carefully tuned parameters in the data-swapping method to obtain the best training performance for 3D U-Net, a widely used deep neural network model for medical image segmentation. We applied our tuning to train 3D U-Net with full-size images of 192 x 192 x 192 voxels in a brain tumor dataset. As a result, communication overhead, which is the most important issue, was reduced by 17.1%. Compared with the patch-based method for patches of 128 x 128 x 128 voxels, our training on full-size images improved the mean Dice score by 4.48% and 5.32% for detecting the whole tumor sub-region and the tumor core sub-region, respectively. The total training time was reduced from 164 hours to 47 hours, resulting in 3.53 times of acceleration.
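For readers unfamiliar with the Dice score reported above, the following is a minimal sketch of the standard Dice coefficient, 2|A ∩ B| / (|A| + |B|), on binary masks. The function name `dice_score`, the flat-list mask representation, and the empty-mask convention are assumptions made for illustration; this is not the paper's evaluation code.

```python
def dice_score(pred, truth):
    """Dice coefficient between two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labeled 1 in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(pred) + sum(truth)
    # Convention (an assumption): two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example: a 6-voxel predicted mask against a 6-voxel reference mask.
predicted = [1, 1, 0, 0, 1, 0]
reference = [1, 0, 0, 0, 1, 1]
print(dice_score(predicted, reference))
```

A 3D volume can be passed by flattening it to a single sequence first, since the score depends only on voxel-wise overlap.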


Sepsis Prediction and Vital Signs Ranking in Intensive Care Unit Patients

arXiv.org Machine Learning

We study multiple rule-based and machine learning (ML) models for sepsis detection. We report the first neural network detection and prediction results on three categories of sepsis. We used the retrospective Medical Information Mart for Intensive Care (MIMIC)-III dataset, restricted to intensive care unit (ICU) patients. Features for prediction were created from only common vital sign measurements. We show a significant improvement in AUC score using a neural network based ensemble model compared to single ML and rule-based models. For the detection of sepsis, severe sepsis, and septic shock, our model achieves an AUC of 0.94, 0.91 and 0.89, respectively. Four hours before the onset, it predicts the same three categories with an AUC of 0.80, 0.81 and 0.84, respectively. Further, we ranked the features and found that using six vital signs consistently provides higher detection and prediction AUC for all the models tested. Our novel ensemble model achieves the highest AUC in detecting and predicting sepsis, severe sepsis, and septic shock in the MIMIC-III ICU patients, and is amenable to deployment in hospital settings.
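As a reminder of what the AUC figures above measure, here is a minimal sketch that computes the area under the ROC curve as the probability that a randomly chosen positive case is scored above a randomly chosen negative case (ties count one half). The function name `roc_auc` and the toy labels and scores are assumptions for illustration; they are unrelated to the MIMIC-III data or the paper's models.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy example: 1 = condition present, 0 = absent.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))
```

The pairwise loop is quadratic in the number of examples, which is fine for a sketch; rank-based formulas give the same value more efficiently on large datasets.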


LoAdaBoost: Loss-Based AdaBoost Federated Machine Learning on Medical Data

arXiv.org Machine Learning

Medical data are valuable for improvement of health care, policy making and many other purposes. Vast amounts of medical data are stored in different locations, on many different devices and in different data silos. Sharing medical data among different sources is a big challenge due to regulatory, operational and security reasons. One potential solution is federated machine learning, which is a method that sends machine learning algorithms simultaneously to all data sources, trains models in each source and aggregates the learned models. This strategy allows utilization of valuable data without moving them. One challenge in applying federated machine learning is the heterogeneity of data from different sources. To tackle this problem, we proposed an adaptive boosting method that increases the efficiency of federated machine learning. Using intensive care unit data from hospitals, we showed that LoAdaBoost federated learning outperformed the baseline method and increased communication efficiency at negligible additional cost.
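The general federated pattern described above (send the model to each source, train locally, aggregate the results) can be sketched with plain weighted averaging of model parameters. This is a generic federated-averaging illustration, not the LoAdaBoost method itself; the linear model, the function names, the learning rate, and the toy client data are all assumptions.

```python
def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few gradient-descent steps of least squares on one client's data."""
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / len(data)
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def federated_average(client_weights, client_sizes):
    """Average client models, weighting each by its number of samples."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


def train(clients, dim, rounds=20):
    """Each round: send the global model out, train locally, aggregate."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        updates = [local_update(global_w, data) for data in clients]
        global_w = federated_average(updates, [len(d) for d in clients])
    return global_w


# Two hypothetical data sources; the raw (x, y) pairs never leave a client.
client_a = [([1.0], 2.0), ([2.0], 4.0)]
client_b = [([3.0], 6.0)]
print(train([client_a, client_b], dim=1))
```

Only model parameters cross the boundary between sources in this sketch, which is the property that makes the approach attractive when data cannot be moved.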


Iterative annotation to ease neural network training: Specialized machine learning in medical image analysis

arXiv.org Machine Learning

Neural networks promise to bring robust, quantitative analysis to medical fields, but adoption is limited by the technicalities of training these networks. To address this translation gap between medical researchers and neural networks in the field of pathology, we have created an intuitive interface which utilizes the commonly used whole slide image (WSI) viewer, Aperio ImageScope (Leica Biosystems Imaging, Inc.), for the annotation and display of neural network predictions on WSIs. Leveraging this, we propose the use of a human-in-the-loop strategy to reduce the burden of WSI annotation. We track network performance improvements as a function of iteration and quantify the use of this pipeline for the segmentation of renal histologic findings on WSIs. More specifically, we present network performance when applied to segmentation of renal micro compartments, and demonstrate multi-class segmentation in human and mouse renal tissue slides. Finally, to show the adaptability of this technique to other medical imaging fields, we demonstrate its ability to iteratively segment human prostate glands from radiology imaging data.

1 Introduction

In the current era of artificial intelligence, robust automated image analysis is attained using supervised machine learning algorithms. This approach is gaining considerable ground in virtually every domain of data analysis, mainly under the advent of neural networks [2-5]. Neural networks are a broad range of algorithms which can take many different forms, but all are considered graphical models, whose nodes can be variably activated by a nonlinear operation on the sum of their inputs [4, 6].


Deep Transfer Learning for Static Malware Classification

arXiv.org Machine Learning

Abstract: We propose to apply deep transfer learning from computer vision to static malware classification. In the transfer learning scheme, we borrow knowledge from natural images or objects and apply it to the target domain of static malware detection. As a result, training time of deep neural networks is accelerated while high classification performance is still maintained. We add an interpretation component to the algorithm and provide interpretable explanations to enhance security practitioners' trust in the model. We further discuss a convex combination scheme of transfer learning and training from scratch for enhanced malware detection, and provide insights into the algorithmic interpretation of vision-based malware classification techniques.

I. INTRODUCTION

Malware is a type of software that possesses malicious characteristics to cause damage to the user, computer or network. Categories of malware include viruses, trojan horses, worms, spyware, ransomware and so on. Static analysis is a quick and straightforward way to detect malware without executing the application or monitoring the run time behavior. One main technique is the so-called signature matching, where the goal is to search whether the strings in the code match any identified malicious patterns in a database. However, when the code is obfuscated or morphed, signature matching becomes far less effective at detecting malicious patterns.


Knowing what you know in brain segmentation using deep neural networks

arXiv.org Machine Learning

In this paper, we describe a deep neural network trained to predict FreeSurfer segmentations of structural MRI volumes, in seconds rather than hours. The network was trained and evaluated on an extremely large dataset (n = 11,148), obtained by combining data from more than a hundred sites. We also show that the prediction uncertainty of the network at each voxel is a good indicator of whether the network has made an error. The resulting uncertainty volume can be used in conjunction with the predicted segmentation to improve downstream uses, such as calculation of measures derived from segmentation regions of interest or the building of prediction models. Finally, we demonstrate that the average prediction uncertainty across voxels in the brain is an excellent indicator of manual quality control ratings, outperforming the best available automated solutions.


Classification using Ensemble Learning under Weighted Misclassification Loss

arXiv.org Machine Learning

Binary classification rules based on covariates typically depend on simple loss functions such as zero-one misclassification. Some cases may require more complex loss functions. For example, individual-level monitoring of HIV-infected individuals on antiretroviral therapy (ART) requires periodic assessment of treatment failure, defined as having a viral load (VL) value above a certain threshold. In some resource-limited settings, VL tests may be limited by cost or technology, and diagnoses are based on other clinical markers. Depending on the scenario, a higher premium may be placed on avoiding false positives, which bring greater cost and reduced treatment options. Here, the optimal rule is determined by minimizing a weighted misclassification loss/risk. We propose a method for finding and cross-validating optimal binary classification rules under weighted misclassification loss. We focus on rules comprising a prediction score and an associated threshold, where the score is derived using an ensemble learner. Simulations and examples show that our method, which derives the score and threshold jointly, more accurately estimates overall risk and has better operating characteristics compared with methods that derive the score first and the cutoff conditionally on the score, especially for finite samples.
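To make the weighted misclassification loss concrete, the sketch below evaluates the loss of a score-plus-threshold rule and picks the threshold that minimizes it for a fixed score. Note that this is the simpler conditional approach (score first, cutoff second) that the abstract compares against, not the joint method the authors propose. The function names, the cost values, and the toy data are assumptions for illustration.

```python
def weighted_loss(labels, scores, threshold, fp_cost, fn_cost):
    """Average cost of the rule 'predict 1 when score >= threshold'."""
    loss = 0.0
    for y, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 0:
            loss += fp_cost   # false positive
        elif pred == 0 and y == 1:
            loss += fn_cost   # false negative
    return loss / len(labels)


def best_threshold(labels, scores, fp_cost, fn_cost):
    """Search the observed scores (plus 'never predict 1') for the lowest loss."""
    candidates = sorted(set(scores)) + [float("inf")]
    return min(
        candidates,
        key=lambda t: weighted_loss(labels, scores, t, fp_cost, fn_cost),
    )


# Toy data: 1 = treatment failure, 0 = no failure.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.6, 0.7]

# Penalizing false positives more heavily pushes the cutoff up;
# penalizing false negatives more heavily pushes it down.
print(best_threshold(labels, scores, fp_cost=5.0, fn_cost=1.0))
print(best_threshold(labels, scores, fp_cost=1.0, fn_cost=5.0))
```

Because the threshold here is tuned on the same data used to compute the loss, the resulting loss estimate is optimistic; cross-validation, as the abstract describes, is one way to correct for that.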


Integrating Artificial Intelligence with Real-time Intracranial EEG Monitoring to Automate Interictal Identification of Seizure Onset Zones in Focal Epilepsy

arXiv.org Artificial Intelligence

An ability to map seizure-generating brain tissue, i.e., the seizure onset zone (SOZ), without recording actual seizures could reduce the duration of invasive EEG monitoring for patients with drug-resistant epilepsy. A widely-adopted practice in the literature is to compare the incidence (events/time) of putative pathological electrophysiological biomarkers associated with epileptic brain tissue with the SOZ determined from spontaneous seizures recorded with intracranial EEG, primarily using a single biomarker. Clinical translation of the previous efforts suffers from their inability to generalize across multiple patients because of (a) the inter-patient variability and (b) the temporal variability in the epileptogenic activity. Here, we report an artificial intelligence-based approach for combining multiple interictal electrophysiological biomarkers and their temporal characteristics as a way of accounting for the above barriers and show that it can reliably identify seizure onset zones in a study cohort of 82 patients who underwent evaluation for drug-resistant epilepsy. Our investigation provides evidence that utilizing the complementary information provided by multiple electrophysiological biomarkers and their temporal characteristics can significantly improve the localization potential compared to previously published single-biomarker incidence-based approaches, resulting in an average area under ROC curve (AUC) value of 0.73 in a cohort of 82 patients. Our results also suggest that recording durations between ninety minutes and two hours are sufficient to localize SOZs with accuracies that may prove clinically relevant. The successful validation of our approach on a large cohort of 82 patients warrants future investigation on the feasibility of utilizing intra-operative EEG monitoring and artificial intelligence to localize epileptogenic brain tissue.