 Accuracy


Sentiment Analysis on IMDB Movie Comments and Twitter Data by Machine Learning and Vector Space Techniques

arXiv.org Machine Learning

This study aims to build a sentiment analysis model on 2,000 rows of IMDB movie comments and 3,200 rows of Twitter data using machine learning and vector space techniques, so as to provide preliminary information on whether a text is positive or negative. A vector space was created in the KNIME Analytics Platform, and classification was performed on this vector space with the Decision Tree, Naïve Bayes, and Support Vector Machine (SVM) algorithms. The results were compared across the algorithms. For the IMDB movie comments, the classification results were 94.00%, 73.20%, and 85.50% for the Decision Tree, Naïve Bayes, and SVM algorithms, respectively. For the Twitter data set, the results were 82.76%, 75.44%, and 72.50% for the same three algorithms. The study concludes that the best classification results on both data sets are those obtained by the SVM algorithm.
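As a rough illustration of classifying text in a vector space, the sketch below builds bag-of-words counts and applies a multinomial Naïve Bayes rule with add-one smoothing. It is a pure-Python sketch, not the KNIME workflow used in the study; the toy reviews, the function names, and the whitespace tokenizer are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; a real pipeline would also
    # strip punctuation and stop words.
    return text.lower().split()


def train_naive_bayes(docs):
    """docs: list of (text, label) pairs. Returns the counts needed for prediction."""
    class_counts = Counter()   # number of documents per label
    word_counts = {}           # label -> Counter of word frequencies
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in tokenize(text):
            counts[tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict(model, text):
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # Log prior plus log likelihoods with Laplace (add-one) smoothing.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, not taken from the IMDB or Twitter sets.
TOY_REVIEWS = [
    ("a great and moving film", "positive"),
    ("great acting and a great story", "positive"),
    ("boring plot and terrible acting", "negative"),
    ("a terrible boring film", "negative"),
]
model = train_naive_bayes(TOY_REVIEWS)
print(predict(model, "great story"))    # positive
print(predict(model, "terrible plot"))  # negative
```

The same word-count vectors could be fed to a decision tree or an SVM instead; only the classifier step would change.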


Deep Learning Enables Automatic Detection and Segmentation of Brain Metastases on Multi-Sequence MRI

arXiv.org Machine Learning

Detecting and segmenting brain metastases is a tedious and time-consuming task for many radiologists, particularly with the growing use of multi-sequence 3D imaging. This study demonstrates automated detection and segmentation of brain metastases on multi-sequence MRI using a deep learning approach based on a fully convolutional neural network (CNN). In this retrospective study, a total of 156 patients with brain metastases from several primary cancers were included. Pre-therapy MR images (1.5T and 3T) included pre- and post-gadolinium T1-weighted 3D fast spin echo, post-gadolinium T1-weighted 3D axial IR-prepped FSPGR, and 3D fluid attenuated inversion recovery. The ground truth was established by manual delineation by two experienced neuroradiologists. CNN training and development were performed using 100 and 5 patients, respectively, with a 2.5D network based on a GoogLeNet architecture. The results were evaluated in 51 patients, equally separated into those with few (1-3), multiple (4-10), and many (>10) lesions. Network performance was evaluated using precision, recall, Dice/F1 score, and ROC-curve statistics. For an optimal probability threshold, detection and segmentation performance was assessed on a per-metastasis basis. The area under the ROC curve (AUC), averaged across all patients, was 0.98. The AUC in the subgroups was 0.99, 0.97, and 0.97 for patients having 1-3, 4-10, and >10 metastases, respectively. Using an average optimal probability threshold determined by the development set, precision, recall, and Dice score were 0.79, 0.53, and 0.79, respectively. At the same probability threshold, the network showed an average false positive rate of 8.3 per patient (no lesion-size limit) and 3.4 per patient (10 mm³ lesion-size limit). In conclusion, a deep learning approach using multi-sequence MRI can aid in the detection and segmentation of brain metastases.
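The overlap metrics named in the abstract can be computed from a predicted mask and a ground-truth mask as in the sketch below. The flat 0/1 lists stand in for voxel masks and are made-up values; the function name is an assumption, and this is not the evaluation code of the study.

```python
def overlap_metrics(predicted, truth):
    """predicted, truth: equal-length sequences of 0/1 voxel labels."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)       # true positives
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)   # false positives
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)   # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Dice equals the F1 score: 2*TP / (2*TP + FP + FN).
    dice = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 1.0
    return precision, recall, dice


# Hypothetical masks: 4 lesion voxels, 2 of them found, plus 1 false alarm.
truth     = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 1, 0, 0, 0]
precision, recall, dice = overlap_metrics(predicted, truth)
print(round(precision, 3), round(recall, 3), round(dice, 3))  # 0.667 0.5 0.571
```

For a single mask pair, Dice is the harmonic mean of precision and recall. Values averaged over many patients or lesions need not satisfy that identity.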


Galaxy classification: A machine learning analysis of GAMA catalogue data

arXiv.org Machine Learning

We present a machine learning analysis of five labelled galaxy catalogues from the Galaxy And Mass Assembly (GAMA) survey: the SersicCatVIKING and SersicCatUKIDSS catalogues containing morphological features, the GaussFitSimple catalogue containing spectroscopic features, the MagPhys catalogue including physical parameters for galaxies, and the Lambdar catalogue, which contains photometric measurements. Extending work previously presented at the ESANN 2018 conference, in an analysis based on Generalized Relevance Matrix Learning Vector Quantization and Random Forests, we find that neither the data from the individual catalogues nor a combined dataset based on all five catalogues fully supports the visual-inspection-based galaxy classification scheme employed to categorise the galaxies. In particular, only one class, the Little Blue Spheroids, is consistently separable from the other classes. To aid further insight into the nature of the employed visual-based classification scheme with respect to physical and morphological features, we present the galaxy parameters that are discriminative for the achieved class distinctions.


Continual Learning in Practice

arXiv.org Machine Learning

This paper describes a reference architecture for self-maintaining systems that can learn continually as data arrives. In environments where data evolves, we need architectures that manage Machine Learning (ML) models in production, adapt to shifting data distributions, cope with outliers, retrain when necessary, and adapt to new tasks. This represents continual AutoML, or Automatically Adaptive Machine Learning. We describe the challenges and propose a reference architecture.


On the Computation and Applications of Large Dense Partial Correlation Networks

arXiv.org Machine Learning

Gaussian graphical models [27] are a popular approach to describing networks, and are directly related to variable prediction via linear regression [20]. The focus is often on graphical model edges described by partial correlations which are zero, identifying pairs of nodes which are conditionally independent [2]. For example, the graphical LASSO [10] imposes a sparse regularization penalty on the precision matrix estimate, seeking a network which trades off predictive accuracy for sparsity. This provides a network which is more interpretable and efficient to use; however, it is not clear that sparse solutions actually generalize better to new data than dense solutions do [28]. Meanwhile, a different research direction is based on forming edges via some simple relationship such as affinity or univariate correlation. This limited network is used as a starting point for computing sophisticated dense estimates of relatedness between nodes, providing a deeper analysis of network structure. In such research, sparsity is usually imposed on the simple network; however, the subsequent analysis is often based on methods which inherently presume Gaussian statistics and l penalties in some sense.
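The link between the precision matrix and partial correlations can be made concrete with the standard identity rho_ij = -P_ij / sqrt(P_ii * P_jj), where P is the precision (inverse covariance) matrix. The sketch below applies it to a small made-up precision matrix; the matrix values and the function name are assumptions for illustration, not taken from the paper.

```python
import math


def partial_correlations(precision):
    """Convert a precision (inverse covariance) matrix into partial correlations."""
    n = len(precision)
    result = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                result[i][j] = 1.0
            else:
                # rho_ij = -P_ij / sqrt(P_ii * P_jj)
                result[i][j] = -precision[i][j] / math.sqrt(precision[i][i] * precision[j][j])
    return result


# Hypothetical precision matrix for a three-node chain 0 - 1 - 2.
CHAIN_PRECISION = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
]
rho = partial_correlations(CHAIN_PRECISION)
print(rho[0][1])        # 0.5: nodes 0 and 1 share an edge
print(rho[0][2] == 0)   # True: no edge, so 0 and 2 are conditionally independent given 1
```

A zero entry in the precision matrix gives a zero partial correlation, which is why a sparsity penalty on the precision matrix estimate removes edges from the network.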


Building an Employee Churn Model in Python to Develop a Strategic Retention Plan

#artificialintelligence

Employee turnover (also known as "employee churn") is a costly problem for companies. The true cost of replacing an employee can often be quite large. A study by the Center for American Progress found that companies typically pay about one-fifth of an employee's salary to replace that employee, and the cost can increase significantly if executives or the highest-paid employees are to be replaced. In other words, the cost of replacing employees remains significant for most employers. This is due to the time spent interviewing and finding a replacement, sign-on bonuses, and the loss of productivity for several months while the new employee gets accustomed to the new role. Understanding why and when employees are most likely to leave can lead to actions that improve employee retention, and possibly to planning new hiring in advance.


Machine Learning Interview Questions and Answers

#artificialintelligence

Credo systemz is making it a cakewalk for you by providing a list of the most probable Machine learning interview questions. These interview questions and answers are framed by a Machine learning Engineer. This set of Machine learning interview questions and answers is the perfect guide for you to learn all the concepts required to clear a Machine learning interview. To get in-depth knowledge of Machine learning, you can enroll for live Machine learning Certification Training by Credo systemz with 24/7 support and lifetime access. In answering this question, try to show your understanding of the broad applications... What is bucketing in machine learning? Converting a (usually continuous) feature into multiple binary... What are the advantages of Naive Bayes? A Naïve Bayes classifier will converge quicker than discriminative... What is inductive machine learning? The inductive machine learning involves the process of learning... What Are The Three Stages To Build The Model In Machine Learning? (a).
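The bucketing answer above can be illustrated with a short sketch that turns one continuous value into a set of binary (one-hot) features, one per range. The cut points and the function name are hypothetical, chosen only for this example.

```python
from bisect import bisect_right


def bucketize(value, boundaries):
    """One-hot encode `value` by the bucket it falls into.

    `boundaries` must be sorted; they define len(boundaries) + 1 buckets:
    (-inf, b0), [b0, b1), ..., [b_last, +inf).
    """
    index = bisect_right(boundaries, value)
    return [1 if i == index else 0 for i in range(len(boundaries) + 1)]


AGE_BOUNDARIES = [18, 35, 60]  # hypothetical cut points
print(bucketize(10, AGE_BOUNDARIES))  # [1, 0, 0, 0]
print(bucketize(35, AGE_BOUNDARIES))  # [0, 0, 1, 0]
print(bucketize(72, AGE_BOUNDARIES))  # [0, 0, 0, 1]
```

Each value sets exactly one of the binary features, so a linear model can learn a separate weight per range instead of a single slope for the raw feature.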


A Data Mining Approach to Flight Arrival Delay Prediction for American Airlines

arXiv.org Machine Learning

Domestic flights in the USA currently see numerous delays and cancellations. American Airlines, Inc. (AA) has been one of the most trusted airlines in the United States and is the world's largest airline in terms of number of destinations served. When it comes to domestic flights, however, AA has not lived up to expectations in terms of punctuality or on-time performance. Flight delays also cause airlines operating commercial flights to incur huge losses, so they try their best to prevent or avoid delays and cancellations by taking certain measures. This study aims at analyzing flight information of US domestic flights operated by American Airlines, covering the top 5 busiest airports in the US, and predicting possible arrival delay of a flight using data mining and machine learning approaches. A Gradient Boosting Classifier model is trained and its hyper-parameters tuned, achieving a maximum accuracy of 85.73%. Such an intelligent system is essential for foretelling flights' on-time performance.
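To show the idea behind gradient boosting, the sketch below fits a sequence of one-split decision stumps, each to the residuals of the model so far, and thresholds the summed score at 0.5 to classify. It uses squared-error loss on a single feature for brevity, so it is a simplified stand-in and not the study's tuned Gradient Boosting Classifier. The feature (departure delay in minutes), the toy data, and all function names are assumptions.

```python
def fit_stump(xs, residuals):
    """Find the single-feature threshold split that minimises squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue  # a split must leave samples on both sides
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def fit_gradient_boosting(xs, ys, rounds=20, learning_rate=0.5):
    base = sum(ys) / len(ys)            # initial prediction: the label mean
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(rounds):
        # For squared-error loss the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return base, learning_rate, stumps


def predict_delay(model, x):
    base, learning_rate, stumps = model
    score = base + sum(
        learning_rate * (left if x <= threshold else right)
        for threshold, left, right in stumps
    )
    return 1 if score >= 0.5 else 0


# Hypothetical data: departure delay in minutes -> arrival delayed (1) or not (0).
DEPARTURE_DELAYS = [0, 2, 5, 8, 20, 25, 40, 60]
ARRIVAL_DELAYED = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_gradient_boosting(DEPARTURE_DELAYS, ARRIVAL_DELAYED)
print(predict_delay(model, 3))   # 0
print(predict_delay(model, 30))  # 1
```

The number of rounds and the learning rate are the kind of hyper-parameters that tuning would adjust; production classifiers typically use a log-loss objective and deeper trees over many features.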


GEE: A Gradient-based Explainable Variational Autoencoder for Network Anomaly Detection

arXiv.org Machine Learning

This paper looks into the problem of detecting network anomalies by analyzing NetFlow records. While many previous works have used statistical models and machine learning techniques in a supervised way, such solutions have the limitations that they require a large amount of labeled data for training and are unlikely to detect zero-day attacks. Existing anomaly detection solutions also do not provide an easy way to explain or identify attacks in the anomalous traffic. To address these limitations, we develop and present GEE, a framework for detecting and explaining anomalies in network traffic. GEE comprises two components: (i) a Variational Autoencoder (VAE), an unsupervised deep-learning technique for detecting anomalies, and (ii) a gradient-based fingerprinting technique for explaining anomalies. Evaluation of GEE on the recent UGR dataset demonstrates that our approach is effective in detecting different anomalies as well as identifying fingerprints that are good representations of these various attacks.