Performance Analysis
Augmenting expert detection of early coronary artery occlusion from 12 lead electrocardiograms using deep learning
Brisk, Rob, Bond, Raymond R, Finlay, Dewar D, McLaughlin, James, Piadlo, Alicja, Leslie, Stephen J, Gossman, David E, Menown, Ian B A, McEneaney, David J
Early diagnosis of acute coronary artery occlusion based on electrocardiogram (ECG) findings is essential for prompt delivery of primary percutaneous coronary intervention. Current ST elevation (STE) criteria are specific but insensitive. Consequently, it is likely that many patients are missing out on potentially life-saving treatment. Experts combining non-specific ECG changes with STE detect ischaemia with higher sensitivity, but at the cost of specificity. We show that a deep learning model can detect ischaemia caused by acute coronary artery occlusion with a better balance of sensitivity and specificity than STE criteria, existing computerised analysers or expert cardiologists.
GANs for Semi-Supervised Opinion Spam Detection
Stanton, Gray, Irissappane, Athirai A.
Online reviews have become a vital source of information in purchasing a service (product). Opinion spammers manipulate reviews, affecting the overall perception of the service. A key challenge in detecting opinion spam is obtaining ground truth. Though there exists a large set of reviews online, only a few of them have been labeled spam or non-spam. In this paper, we propose spamGAN, a generative adversarial network which relies on a limited set of labeled data as well as unlabeled data for opinion spam detection. spamGAN improves on the state-of-the-art GAN-based techniques for text classification. Experiments on the TripAdvisor dataset show that spamGAN outperforms existing spam detection techniques when limited labeled data is used. Apart from detecting spam reviews, spamGAN can also generate reviews with reasonable perplexity.
3D human action analysis and recognition through GLAC descriptor on 2D motion and static posture images
Bulbul, Mohammad Farhad, Islam, Saiful, Ali, Hazrat
Farhad Bulbul is with the Department of Mathematics, Jessore University of Science and Technology, Bangladesh (email: farhad@just.edu.bd). Saiful Islam is with the Department of Mathematics, Bangabandhu Sheikh Mujibur Rahman Science & Technology University, Bangladesh. Dr. Hazrat Ali is with the Department of Electrical and Computer Engineering, COMSATS University Islamabad, Abbottabad Campus, Pakistan (email: hazratali@cuiatd.edu.pk). Abstract-- In this paper, we present an approach for identifying actions in depth action videos. First, we process each action video with the 3D Motion Trail Model (3DMTM) to obtain motion history images (MHIs) and static history images (SHIs). We then characterize the action video by extracting Gradient Local Auto-Correlations (GLAC) features from the SHIs and the MHIs. The two sets of features, i.e., GLAC features from MHIs and GLAC features from SHIs, are concatenated to obtain a representation vector for the action. Finally, we classify all the action samples using the l2-regularized Collaborative Representation Classifier (l2-CRC) to recognize different human actions. We evaluate the proposed method on three action datasets: MSR-Action3D, DHA and UTD-MHAD. The experimental results show that the proposed method outperforms other approaches. I. INTRODUCTION Research in human action recognition (HAR) is considered one of the most interesting domains of computer vision. Action recognition systems are being extensively applied in security systems, medical science, social awareness, and entertainment [1], [2], [3], [4]. Indeed, to develop an applicable action recognition system, researchers still need to overcome the difficulties caused by diversity in human body sizes, appearances, postures, motions, clothing, camera motions, viewing angles, and illumination.
In the early stage, the human action recognition system was developed by researchers based on RGB data [5], [6], [7], [8].
Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening
Wu, Nan, Phang, Jason, Park, Jungkyu, Shen, Yiqiu, Huang, Zhe, Zorin, Masha, Jastrzębski, Stanisław, Févry, Thibault, Katsnelson, Joe, Kim, Eric, Wolfson, Stacey, Parikh, Ujas, Gaddam, Sushma, Lin, Leng Leng Young, Ho, Kara, Weinstein, Joshua D., Reig, Beatriu, Gao, Yiming, Toth, Hildegard, Pysarenko, Kristine, Lewin, Alana, Lee, Jiyon, Airola, Krystal, Mema, Eralda, Chung, Stephanie, Hwang, Esther, Samreen, Naziya, Kim, S. Gene, Heacock, Laura, Moy, Linda, Cho, Kyunghyun, Geras, Krzysztof J.
[...] Among these, only 20-40% yield a diagnosis of cancer (5). This paper makes several contributions. In the reader study, we compared the performance of our best model to that of radiologists and found our model to be as accurate as radiologists both in terms of area under ROC curve (AUC) and area under precision-recall curve (PRAUC). We also found that a hybrid model, taking the average of the probabilities of malignancy predicted by a radiologist and by our neural network, yields more accurate predictions than either of the two separately. This suggests that our network and radiologists learned different aspects of the task and that our model could be effective as a tool providing radiologists a second reader. With this contribution, research groups that are working on improving screening mammography, which may not have access to a large training dataset like ours, will be able to directly use our model in their research or to use our pretrained weights as an initialization to train models with less data. By making our models public, we invite other groups to validate our results and test their robustness to shifts in the data distribution. The dataset includes 229,426 digital screening mammography exams (1,001,093 images) from 141,473 patients. For each breast, we assign two binary labels: the absence/presence of malignant findings in a breast, [...]. With left and right breasts, each exam has a total of four binary labels. [...] from biopsies. We have 5,832 exams with at least one biopsy performed within 120 days of the screening mammogram. Among these, biopsies confirmed malignant findings for 985 (8.4%) breasts and benign findings for 5,556 (47.6%) breasts.
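The hybrid model mentioned in this abstract averages the malignancy probability given by a radiologist with the one given by the network. A minimal Python sketch of that averaging step follows; the function name and the sample scores are hypothetical and chosen only for illustration, and nothing here reproduces the authors' released code or data.

```python
def hybrid_probability(p_radiologist, p_network):
    """Return the plain average of two malignancy probabilities in [0, 1]."""
    for p in (p_radiologist, p_network):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return 0.5 * (p_radiologist + p_network)


# Hypothetical scores for three breasts (illustrative values, not study data).
reader_scores = [0.10, 0.60, 0.30]
model_scores = [0.20, 0.80, 0.10]

# One hybrid score per breast, pairing each reader score with the model score.
hybrid_scores = [
    hybrid_probability(r, m) for r, m in zip(reader_scores, model_scores)
]
```

Equal weighting is the simplest reading of "taking the average"; a weighted combination would be a different design and is not claimed here.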
Sentiment Analysis on IMDB Movie Comments and Twitter Data by Machine Learning and Vector Space Techniques
Tarımer, İlhan, Çoban, Adil, Kocaman, Arif Emre
This study's goal is to create a sentiment analysis model on 2000 rows of IMDB movie comments and 3200 rows of Twitter data by using machine learning and vector space techniques; the aim is to provide preliminary information on whether a text is positive or negative. In the study, a vector space was created in the KNIME Analytics platform, and classification was performed on this vector space with the Decision Tree, Naïve Bayes and Support Vector Machine (SVM) algorithms. The results obtained were compared across the algorithms. The classification results for IMDB movie comments are 94.00%, 73.20%, and 85.50% for the Decision Tree, Naïve Bayes and SVM algorithms, respectively. The classification results for the Twitter data set are 82.76%, 75.44% and 72.50% for the Decision Tree, Naïve Bayes and SVM algorithms, respectively. It is seen that the best classification results on both data sets are those obtained by the SVM algorithm.
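To illustrate what "creating a vector space" from text can mean, the following Python sketch builds a simple bag-of-words representation. It is an assumption-laden illustration only: the study used the KNIME Analytics platform, and the tokenisation (lowercasing and whitespace splitting), the function names and the two sample comments are all hypothetical.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct lowercase token to a column index (sorted order)."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: idx for idx, tok in enumerate(tokens)}


def to_vector(document, vocabulary):
    """Turn one document into a term-count vector over the vocabulary."""
    counts = Counter(document.lower().split())
    vector = [0] * len(vocabulary)
    for tok, count in counts.items():
        if tok in vocabulary:  # tokens outside the vocabulary are ignored
            vector[vocabulary[tok]] = count
    return vector


# Hypothetical comments, not taken from the IMDB or Twitter data sets.
comments = ["great movie great cast", "boring movie"]
vocabulary = build_vocabulary(comments)
vectors = [to_vector(c, vocabulary) for c in comments]
```

Vectors of this kind are the usual input to classifiers such as Decision Trees, Naïve Bayes and SVMs; weighting schemes such as TF-IDF are a common alternative to raw counts.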
Deep Learning Enables Automatic Detection and Segmentation of Brain Metastases on Multi-Sequence MRI
Grøvik, Endre, Yi, Darvin, Iv, Michael, Tong, Elisabeth, Rubin, Daniel L., Zaharchuk, Greg
Detecting and segmenting brain metastases is a tedious and time-consuming task for many radiologists, particularly with the growing use of multi-sequence 3D imaging. This study demonstrates automated detection and segmentation of brain metastases on multi-sequence MRI using a deep learning approach based on a fully convolutional neural network (CNN). In this retrospective study, a total of 156 patients with brain metastases from several primary cancers were included. Pre-therapy MR images (1.5T and 3T) included pre- and post-gadolinium T1-weighted 3D fast spin echo, post-gadolinium T1-weighted 3D axial IR-prepped FSPGR, and 3D fluid attenuated inversion recovery. The ground truth was established by manual delineation by two experienced neuroradiologists. CNN training/development was performed using 100 and 5 patients, respectively, with a 2.5D network based on a GoogLeNet architecture. The results were evaluated in 51 patients, equally separated into those with few (1-3), multiple (4-10), and many (>10) lesions. Network performance was evaluated using precision, recall, Dice/F1 score, and ROC-curve statistics. For an optimal probability threshold, detection and segmentation performance was assessed on a per-metastasis basis. The area under the ROC curve (AUC), averaged across all patients, was 0.98. The AUC in the subgroups was 0.99, 0.97, and 0.97 for patients having 1-3, 4-10, and >10 metastases, respectively. Using an average optimal probability threshold determined by the development set, precision, recall, and Dice score were 0.79, 0.53, and 0.79, respectively. At the same probability threshold, the network showed an average false positive rate of 8.3/patient (no lesion-size limit) and 3.4/patient (10 mm³ lesion size limit). In conclusion, a deep learning approach using multi-sequence MRI can aid in the detection and segmentation of brain metastases.
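For readers unfamiliar with the metrics named in this abstract, the Python sketch below computes precision, recall and the Dice/F1 score from counts of true positives, false positives and false negatives. It uses the standard textbook definitions; whether the counts are voxels or lesions, and how the study averaged its values across patients, are not specified here, so the function name and the example counts are hypothetical and the sketch does not reproduce the reported figures.

```python
def detection_metrics(tp, fp, fn):
    """Return (precision, recall, dice) from true/false positive and
    false negative counts; each metric is 0.0 when its denominator is 0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = 2 * tp + fp + fn
    dice = 2 * tp / denominator if denominator else 0.0
    return precision, recall, dice


# Hypothetical counts for illustration (not taken from the study).
precision, recall, dice = detection_metrics(tp=8, fp=2, fn=4)
```

On a single set of counts, the Dice score equals the F1 score, the harmonic mean of precision and recall; averages taken per patient need not satisfy that identity.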
Galaxy classification: A machine learning analysis of GAMA catalogue data
Nolte, Aleke, Wang, Lingyu, Bilicki, Maciej, Holwerda, Benne, Biehl, Michael
We present a machine learning analysis of five labelled galaxy catalogues from the Galaxy And Mass Assembly (GAMA): The SersicCatVIKING and SersicCatUKIDSS catalogues containing morphological features, the GaussFitSimple catalogue containing spectroscopic features, the MagPhys catalogue including physical parameters for galaxies, and the Lambdar catalogue, which contains photometric measurements. Extending work previously presented at the ESANN 2018 conference - in an analysis based on Generalized Relevance Matrix Learning Vector Quantization and Random Forests - we find that neither the data from the individual catalogues nor a combined dataset based on all 5 catalogues fully supports the visual-inspection-based galaxy classification scheme employed to categorise the galaxies. In particular, only one class, the Little Blue Spheroids, is consistently separable from the other classes. To aid further insight into the nature of the employed visual-based classification scheme with respect to physical and morphological features, we present the galaxy parameters that are discriminative for the achieved class distinctions.
Continual Learning in Practice
Diethe, Tom, Borchert, Tom, Thereska, Eno, Balle, Borja, Lawrence, Neil
This paper describes a reference architecture for self-maintaining systems that can learn continually, as data arrives. In environments where data evolves, we need architectures that manage Machine Learning (ML) models in production, adapt to shifting data distributions, cope with outliers, retrain when necessary, and adapt to new tasks. This represents continual AutoML or Automatically Adaptive Machine Learning. We describe the challenges and propose a reference architecture.
On the Computation and Applications of Large Dense Partial Correlation Networks
Gaussian graphical models [27] are a popular approach to describing networks, and are directly related to variable prediction via linear regression [20]. The focus is often on graphical model edges described by partial correlations which are zero, identifying pairs of nodes which are conditionally independent [2]. For example, the graphical LASSO [10] imposes a sparse regularization penalty on the precision matrix estimate, seeking a network which trades off predictive accuracy for sparsity. This provides a network which is more interpretable and efficient to use; however, it is not clear that sparse solutions actually generalize better to new data than dense solutions do [28]. Meanwhile, a different research direction is based on forming edges via some simple relationship such as affinity or univariate correlation. This limited network is used as a starting point for computing sophisticated dense estimates of relatedness between nodes, providing a deeper analysis of network structure. In such research, sparsity is usually imposed on the simple network; however, the subsequent analysis is often based on methods which inherently presume Gaussian statistics and l penalties in some sense.
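The link between the precision matrix and partial correlations mentioned above can be made concrete with the standard identity rho_ij = -p_ij / sqrt(p_ii * p_jj), where p is the precision (inverse covariance) matrix. The Python sketch below applies it to a small matrix; the function name and the example matrix are hypothetical, and the sketch assumes the precision matrix has already been estimated by some other method.

```python
import math


def partial_correlations(precision):
    """Convert a precision matrix (nested lists, positive diagonal) into a
    partial correlation matrix using rho_ij = -p_ij / sqrt(p_ii * p_jj)."""
    n = len(precision)
    rho = [[1.0] * n for _ in range(n)]  # diagonal is set to 1 by convention
    for i in range(n):
        for j in range(n):
            if i != j:
                scale = math.sqrt(precision[i][i] * precision[j][j])
                rho[i][j] = -precision[i][j] / scale
    return rho


# Hypothetical 3-node precision matrix: nodes 0 and 2 have a zero entry,
# so they are conditionally independent given node 1.
example_precision = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
]
rho = partial_correlations(example_precision)
```

A zero off-diagonal entry in the precision matrix gives a zero partial correlation, which is why sparse precision estimates correspond to networks with fewer edges.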
Building an Employee Churn Model in Python to Develop a Strategic Retention Plan
Employee turnover (also known as "employee churn") is a costly problem for companies. The true cost of replacing an employee can often be quite large. A study by the Center for American Progress found that companies typically pay about one-fifth of an employee's salary to replace that employee, and the cost can significantly increase if executives or the highest-paid employees are to be replaced. In other words, the cost of replacing employees for most employers remains significant. This is due to the amount of time spent to interview and find a replacement, sign-on bonuses, and the loss of productivity for several months while the new employee gets accustomed to the new role. Understanding why and when employees are most likely to leave can lead to actions to improve employee retention as well as possibly planning new hiring in advance.