 Performance Analysis


Machine Learning Interview Questions and Answers

#artificialintelligence

Credo Systemz makes it a cakewalk for you by providing a list of the most probable machine learning interview questions. These interview questions and answers are framed by a machine learning engineer. This set of machine learning interview questions and answers is the perfect guide for you to learn all the concepts required to clear a machine learning interview. To get in-depth knowledge of machine learning, you can enroll for live Machine Learning Certification Training by Credo Systemz with 24/7 support and lifetime access. In answering this question, try to show your understanding of the broad applications... What is bucketing in machine learning? Converting a (usually continuous) feature into multiple binary... What are the advantages of Naive Bayes? A Naive Bayes classifier will converge quicker than discriminative... What is inductive machine learning? Inductive machine learning involves the process of learning... What are the three stages to build the model in machine learning? (a).
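
The bucketing answer above is cut off, but the idea it starts to describe (turning a continuous feature into several binary ones) can be sketched in a few lines. This is a minimal illustration only: the function name `bucketize` and the age boundaries are hypothetical and do not come from the original article.

```python
from bisect import bisect_right


def bucketize(value, boundaries):
    """Return a one-hot list marking the bucket that `value` falls into.

    `boundaries` must be sorted in ascending order; n boundaries define
    n + 1 buckets. A value equal to a boundary goes into the upper bucket.
    """
    index = bisect_right(boundaries, value)
    one_hot = [0] * (len(boundaries) + 1)
    one_hot[index] = 1
    return one_hot


# Hypothetical age buckets: under 18, 18-34, 35-64, 65 and over.
AGE_BOUNDARIES = [18, 35, 65]

print(bucketize(12, AGE_BOUNDARIES))  # first bucket is set
print(bucketize(35, AGE_BOUNDARIES))  # third bucket is set
```

Each resulting binary column can then be fed to a model in place of the raw continuous value.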


A Data Mining Approach to Flight Arrival Delay Prediction for American Airlines

arXiv.org Machine Learning

In the present scenario of domestic flights in the USA, there have been numerous instances of flight delays and cancellations. In the United States, American Airlines, Inc. has been one of the most trusted airlines and the world's largest in terms of number of destinations served. But when it comes to domestic flights, AA has not lived up to expectations in terms of punctuality or on-time performance. Flight delays also cause airlines operating commercial flights to incur huge losses, so they are trying their best to prevent or avoid flight delays and cancellations by taking certain measures. This study aims at analyzing flight information of US domestic flights operated by American Airlines, covering the five busiest US airports, and predicting the possible arrival delay of a flight using data mining and machine learning approaches. A Gradient Boosting Classifier model is trained and tuned over its hyper-parameters, achieving a maximum accuracy of 85.73%. Such an intelligent system is essential for predicting flights' on-time performance.


GEE: A Gradient-based Explainable Variational Autoencoder for Network Anomaly Detection

arXiv.org Machine Learning

This paper looks into the problem of detecting network anomalies by analyzing NetFlow records. While many previous works have used statistical models and machine learning techniques in a supervised way, such solutions have the limitations that they require large amounts of labeled data for training and are unlikely to detect zero-day attacks. Existing anomaly detection solutions also do not provide an easy way to explain or identify attacks in the anomalous traffic. To address these limitations, we develop and present GEE, a framework for detecting and explaining anomalies in network traffic. GEE comprises two components: (i) a Variational Autoencoder (VAE), an unsupervised deep-learning technique for detecting anomalies, and (ii) a gradient-based fingerprinting technique for explaining anomalies. Evaluation of GEE on the recent UGR dataset demonstrates that our approach is effective in detecting different anomalies as well as identifying fingerprints that are good representations of these various attacks.



AI Takes Aim at Lung Cancer Screening

#artificialintelligence

WEDNESDAY, March 13, 2019 (HealthDay News) -- The term artificial intelligence (AI) might bring to mind robots or self-driving cars. But one group of researchers is using a type of AI to improve lung cancer screening. Screening is important for early diagnosis and improved survival odds, but the current lung cancer screening method has a 96 percent false positive rate. But in the new study, investigators were able to reduce false findings of lung cancer without missing any actual cases. A low-dose CT scan is the standard diagnostic test for people at high risk of lung cancer.


Artificial intelligence cuts lung cancer screening false positives

#artificialintelligence

PITTSBURGH, March 12, 2019 - Lung cancer is the leading cause of cancer deaths worldwide. Screening is key for early detection and increased survival, but the current method has a 96 percent false positive rate. Using machine learning, researchers at the University of Pittsburgh and UPMC Hillman Cancer Center have found a way to substantially reduce false positives without missing a single case of cancer. The study was published today in the journal Thorax. This is the first time artificial intelligence has been applied to the question of sorting out benign from cancerous nodules in lung cancer screening.


ALOHA: Auxiliary Loss Optimization for Hypothesis Augmentation

arXiv.org Machine Learning

Malware detection is a popular application of Machine Learning for Information Security (ML-Sec), in which an ML classifier is trained to predict whether a given file is malware or benignware. Parameters of this classifier are typically optimized such that outputs from the model over a set of input samples most closely match the samples' true malicious/benign (1/0) target labels. However, there are often a number of other sources of contextual metadata for each malware sample, beyond an aggregate malicious/benign label, including multiple labeling sources and malware type information (e.g., ransomware, trojan, etc.), which we can feed to the classifier as auxiliary prediction targets. In this work, we fit deep neural networks to multiple additional targets derived from metadata in a threat intelligence feed for Portable Executable (PE) malware and benignware, including a multi-source malicious/benign loss, a count loss on multi-source detections, and a semantic malware attribute tag loss. We find that incorporating multiple auxiliary loss terms yields a marked improvement in performance on the main detection task. We also demonstrate that these gains likely stem from a more informed neural network representation and are not due to a regularization artifact of multi-target learning. Our auxiliary loss architecture yields a significant reduction in detection error rate (false negatives) of 42.6% at a false positive rate (FPR) of $10^{-3}$ when compared to a similar model with only one target, and a decrease of 53.8% at $10^{-5}$ FPR.
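
The abstract above reports detection error (false negatives) at a fixed false positive rate. As a rough sketch of how such a figure can be computed from classifier scores, the helper below picks the lowest threshold that keeps the FPR within the target and then counts missed malicious samples. The function name `fnr_at_fpr` and the toy scores are assumptions made for illustration; this is not the authors' evaluation code.

```python
def fnr_at_fpr(scores, labels, target_fpr):
    """False negative rate at a threshold whose FPR does not exceed target_fpr.

    `scores` are model outputs (higher means more likely malicious) and
    `labels` are 1 for malicious, 0 for benign. A sample is flagged when
    its score is strictly greater than the chosen threshold.
    """
    benign = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    malicious = [s for s, y in zip(scores, labels) if y == 1]
    if not benign or not malicious:
        raise ValueError("need at least one sample of each class")

    # Number of benign samples we are allowed to flag by mistake.
    allowed_fp = int(target_fpr * len(benign))
    if allowed_fp < len(benign):
        # Only benign scores strictly above this value get flagged,
        # so at most `allowed_fp` false positives occur (ties included).
        threshold = benign[allowed_fp]
    else:
        threshold = float("-inf")

    missed = sum(1 for s in malicious if s <= threshold)
    return missed / len(malicious)


# Toy data: four benign and four malicious samples (made-up scores).
scores = [0.1, 0.2, 0.3, 0.9, 0.95, 0.8, 0.4, 0.25]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(fnr_at_fpr(scores, labels, target_fpr=0.25))
```

At very low targets such as $10^{-3}$ or $10^{-5}$, the benign set has to be large for the allowed false positive count to be non-zero, which is why such figures are usually reported on large test sets.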


Adversarial attacks against Fact Extraction and VERification

arXiv.org Artificial Intelligence

This paper describes a baseline for the second iteration of the Fact Extraction and VERification shared task (FEVER2.0) which explores the resilience of systems through adversarial evaluation. We present a collection of simple adversarial attacks against systems that participated in the first FEVER shared task. FEVER modeled the assessment of truthfulness of written claims as a joint information retrieval and natural language inference task using evidence from Wikipedia. A large number of participants made use of deep neural networks in their submissions to the shared task. The extent to which such models understand language has been the subject of a number of recent investigations and discussions in the literature. In this paper, we present a simple method of generating entailment-preserving and entailment-altering perturbations of instances using common patterns within the training data. We find that a number of systems are greatly affected, with absolute losses in classification accuracy of up to $29\%$ on the newly perturbed instances. Using these newly generated instances, we construct a sample submission for the FEVER2.0 shared task. Addressing these types of attacks will aid in building more robust fact-checking models, as well as suggest directions to expand the datasets.


Predicting class-imbalanced business risk using resampling, regularization, and model ensembling algorithms

arXiv.org Machine Learning

We aim to develop and improve imbalanced business risk modeling by jointly using proper evaluation criteria, resampling, cross-validation, classifier regularization, and ensembling techniques. Area Under the Receiver Operating Characteristic Curve (AUC of ROC) is used for model comparison based on 10-fold cross validation. Two undersampling strategies, random undersampling (RUS) and cluster centroid undersampling (CCUS), as well as two oversampling methods, random oversampling (ROS) and Synthetic Minority Oversampling Technique (SMOTE), are applied. Three highly interpretable classifiers, including logistic regression without regularization (LR), L1-regularized LR (L1LR), and decision tree (DT), are implemented. Two ensembling techniques, Bagging and Boosting, are applied to the DT classifier for further model improvement. The results show that Boosting on DT using the oversampled data containing 50% positives via SMOTE is the optimal model, achieving an AUC, recall, and F1 score of 0.8633, 0.9260, and 0.8907, respectively.
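
To make the SMOTE step mentioned above more concrete, here is a minimal sketch of the core idea: each synthetic minority point is an interpolation between a minority sample and one of its nearest minority neighbors. The function name `smote_sketch`, the parameter defaults, and the toy data are assumptions for illustration; the paper's actual implementation and settings are not given in the abstract.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Generate `n_new` synthetic points from a list of minority-class tuples.

    Each new point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbors.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of `base`, excluding `base` itself.
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Toy example: 4 positives against 10 negatives (made-up numbers).
positives = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
n_negatives = 10

# To reach 50% positives, add enough synthetic points to match the negatives.
new_points = smote_sketch(positives, n_new=n_negatives - len(positives))
print(len(positives) + len(new_points), "positives vs", n_negatives, "negatives")
```

In a cross-validated setup, oversampling of this kind is normally applied to the training folds only, so that synthetic points never leak into the held-out data.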


Learning Data Science through Fun Demonstrations! - Blogs by Nidhi

#artificialintelligence

As part of a '1 day in Python' workshop, the capabilities of this versatile language were showcased with cases and demonstrations. Through these demonstrations we grasped the underlying logic of various data science algorithms; in other words, we got an insight into how computers think! Natural Language Processing (NLP) is concerned with programming computers to process and analyze large amounts of natural language data. NLP techniques are used in search engines, social website feeds, speech engines and spam filters. We were given a mixture of words.