 Performance Analysis


ggpairs in R- A Brief Introduction to ggpairs

#artificialintelligence

In this article, we compare the pairs and ggpairs functions in R, covering color, labels, panels, and grouping in pairs plots. We will use the mtcars dataset here. Let's store some of its variables in data. The diagonal boxes show the column variables, and the remaining boxes show scatter plots for each pair of variables.


Bootstrapping the error of Oja's Algorithm

arXiv.org Machine Learning

We consider the problem of quantifying uncertainty for the estimation error of the leading eigenvector from Oja's algorithm for streaming principal component analysis, where the data are generated IID from some unknown distribution. By combining classical tools from the U-statistics literature with recent results on high-dimensional central limit theorems for quadratic forms of random vectors and concentration of matrix products, we establish a $\chi^2$ approximation result for the $\sin^2$ error between the population eigenvector and the output of Oja's algorithm. Since estimating the covariance matrix associated with the approximating distribution requires knowledge of unknown model parameters, we propose a multiplier bootstrap algorithm that may be updated in an online manner. We establish conditions under which the bootstrap distribution is close to the corresponding sampling distribution with high probability, thereby establishing the bootstrap as a consistent inferential method in an appropriate asymptotic regime.
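As a rough illustration of the base procedure the abstract builds on, the following is a minimal sketch of Oja's streaming update and the $\sin^2$ error it is measured by. It does not implement the paper's multiplier bootstrap. The function names, the constant step size, the deterministic starting vector, and the toy data with covariance diag(4, 1) are all assumptions made for this example.

```python
import math
import random


def oja_leading_eigenvector(stream, dim, step_size):
    """One pass of Oja's update: w <- normalize(w + eta * x * (x . w))."""
    # Deterministic unit-norm start (an assumption; random starts are common).
    w = [1.0 / math.sqrt(dim)] * dim
    for x in stream:
        proj = sum(xi * wi for xi, wi in zip(x, w))
        w = [wi + step_size * proj * xi for xi, wi in zip(x, w)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        w = [wi / norm for wi in w]
    return w


def sin2_error(w, v):
    """sin^2 of the angle between two unit vectors w and v."""
    dot = sum(wi * vi for wi, vi in zip(w, v))
    return 1.0 - dot * dot


rng = random.Random(0)
# Toy IID stream with covariance diag(4, 1): the leading eigenvector is (1, 0).
data = [(2.0 * rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(5000)]
w_hat = oja_leading_eigenvector(data, dim=2, step_size=0.01)
print(sin2_error(w_hat, (1.0, 0.0)))
```

The printed value is the quantity whose sampling distribution the paper approximates; here the true eigenvector is known only because the data are synthetic.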


Doing good by fighting fraud: Ethical anti-fraud systems for mobile payments

arXiv.org Artificial Intelligence

App builders commonly use security challenges, a form of step-up authentication, to add security to their apps. However, the ethical implications of this type of architecture have not been studied previously. In this paper, we present a large-scale measurement study of running an existing anti-fraud security challenge, Boxer, in real apps running on mobile devices. We find that although Boxer does work well overall, it is unable to scan effectively on devices that run its machine learning models at less than one frame per second (FPS), blocking users who use inexpensive devices. With the insights from our study, we design Daredevil, a new anti-fraud system for scanning payment cards that works well across the broad range of performance characteristics and hardware configurations found on modern mobile devices. Daredevil reduces the number of devices that run at less than one FPS by an order of magnitude compared to Boxer, providing a more equitable system for fighting fraud. In total, we collect data from 5,085,444 real devices spread across 496 real apps running production software and interacting with real users.


NLP: Twitter Sentiment Analysis

#artificialintelligence

In this hands-on project, we will train a Naive Bayes classifier to predict sentiment from thousands of tweets. This project could be used in practice by any company with a social media presence to automatically predict customers' sentiment (i.e., whether their customers are happy or not). The process could be done automatically, without having humans manually review thousands of tweets and customer reviews. Note: This course works best for learners who are based in the North America region.
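To make the idea concrete, here is a minimal sketch of a multinomial Naive Bayes sentiment classifier written with the standard library only. It is not the course's implementation: the function names, the whitespace tokenizer, the add-one smoothing, and the four made-up example tweets are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_texts):
    """Collect per-class document counts and word counts."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_texts:
        tokens = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token in vocab:  # words never seen in training are ignored
                count = word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up toy data, standing in for a real labeled tweet corpus.
tweets = [
    ("love this phone great battery", "positive"),
    ("great service very happy", "positive"),
    ("terrible support very unhappy", "negative"),
    ("hate the battery terrible phone", "negative"),
]
model = train_naive_bayes(tweets)
print(predict(model, "happy with the great battery"))  # positive
```

A real project would add proper tokenization, stop-word handling, and a held-out test set; the sketch only shows how class priors and smoothed word likelihoods combine into a prediction.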


How to confuse antimalware neural networks. Adversarial attacks and protection

#artificialintelligence

Nowadays, cybersecurity companies implement a variety of methods to discover new, previously unknown malware files. Machine learning (ML) is a powerful and widely used approach for this task. At Kaspersky we have a number of complex ML models based on different file features, including models for static and dynamic detection, for processing sandbox logs and system events, etc. We implement different machine learning techniques, including deep neural networks, one of the most promising technologies that make it possible to work with large amounts of data, incorporate different types of features, and boast a high accuracy rate. But can we rely entirely on machine learning approaches in the battle with the bad guys? Or could powerful AI itself be vulnerable? In this article we attempt to attack our product anti-malware neural network models and check existing defense methods. An adversarial attack is a method of making small modifications to the objects in such a way that the machine learning model begins to misclassify them.
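The definition above can be illustrated with a deliberately simplified sketch. It uses a hypothetical three-feature linear scorer rather than a deep network, and the weights, bias, sample, and epsilon are invented for the example; it is not the attack used in the article. For a linear model the gradient of the score with respect to the input equals the weight vector, so stepping each feature against the sign of its weight is the fast-gradient-sign idea in its simplest form.

```python
def score(weights, bias, features):
    """Linear 'malware score'; a positive value means the sample is flagged."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def sign(value):
    return (value > 0) - (value < 0)


def fgsm_style_perturbation(weights, features, epsilon):
    """Shift each feature by at most epsilon in the direction that lowers the score."""
    return [x - epsilon * sign(w) for w, x in zip(weights, features)]


# Hypothetical toy model and sample, not taken from any real product.
weights = [0.8, -0.5, 0.3]
bias = -0.2
sample = [0.6, 0.4, 0.5]

adversarial = fgsm_style_perturbation(weights, sample, epsilon=0.2)
print(score(weights, bias, sample) > 0)       # True: original sample is flagged
print(score(weights, bias, adversarial) > 0)  # False: perturbed sample is not
```

In practice, features extracted from real files cannot be changed freely without breaking the file, which is one reason attacks on malware detectors are harder than this sketch suggests.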


Practical considerations for Machine Learning Classification - AskSid

#artificialintelligence

There is something very satisfying about building a machine learning classifier on a toy dataset. We can achieve high accuracy and feel good while doing it. But this doesn't really help us or prepare us for real-world datasets and the issues they pose. If you have ever trained a machine learning classification model, you may have come across this issue. People use different words for it: 'imbalanced dataset', 'model is skewed', etc. Let's say we are training a model to detect spam emails.
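A short sketch shows why accuracy alone can mislead on an imbalanced dataset. The labels are made up (5 spam emails out of 100), and the "model" is a hypothetical baseline that always predicts "not spam".

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of truly positive samples that were predicted positive."""
    predictions_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in predictions_for_positives) / len(predictions_for_positives)


# Made-up data: 1 = spam, 0 = legitimate.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100  # baseline that never flags anything

print(accuracy(y_true, y_pred))  # 0.95
print(recall(y_true, y_pred))    # 0.0
```

The baseline scores 95% accuracy while catching no spam at all, which is why metrics such as recall and precision are usually reported alongside accuracy for skewed classes.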


What's Up With the Twist Ending of False Positive, Ilana Glazer's Pregnancy Horror Movie?

Slate

This article contains spoilers for the entirety of False Positive. In False Positive, the phrase "mommy brain" begins almost as a joke. As Lucy (Ilana Glazer, who also co-wrote the film) and Adrian (Justin Theroux) try--and then finally succeed, thanks to Dr. John Hindle (Pierce Brosnan)--to get pregnant, the words, initially used to refer to a strong rush of maternal instinct, take on an increasingly sinister timbre. Is it really just "mommy brain" that's causing Lucy to become suspicious of Hindle, or is the term being used to gaslight her? The final answer seems to be: a little bit of both.


Prediction of Hereditary Cancers Using Neural Networks

arXiv.org Machine Learning

Family history is a major risk factor for many types of cancer. Mendelian risk prediction models translate family histories into cancer risk predictions based on knowledge of cancer susceptibility genes. These models are widely used in clinical practice to help identify high-risk individuals. Mendelian models leverage the entire family history, but they rely on many assumptions about cancer susceptibility genes that are either unrealistic or challenging to validate due to low mutation prevalence. Training more flexible models, such as neural networks, on large databases of pedigrees can potentially lead to accuracy gains. In this paper, we develop a framework to apply neural networks to family history data and investigate their ability to learn inherited susceptibility to cancer. While there is an extensive literature on neural networks and their state-of-the-art performance in many tasks, there is little work applying them to family history data. We propose adaptations of fully-connected neural networks and convolutional neural networks to pedigrees. In data simulated under Mendelian inheritance, we demonstrate that our proposed neural network models are able to achieve nearly optimal prediction performance. Moreover, when the observed family history includes misreported cancer diagnoses, neural networks are able to outperform the Mendelian BRCAPRO model embedding the correct inheritance laws. Using a large dataset of over 200,000 family histories, the Risk Service cohort, we train prediction models for future risk of breast cancer. We validate the models using data from the Cancer Genetics Network.


Federated Learning for Intrusion Detection in IoT Security: A Hybrid Ensemble Approach

arXiv.org Artificial Intelligence

The critical role of the Internet of Things (IoT) in domains such as smart cities, healthcare, supply chains, and transportation has made IoT systems a target of malicious attacks. Past work in this area focused on centralized Intrusion Detection Systems (IDS), assuming the existence of a central entity to perform data analysis and identify threats. However, such an IDS may not always be feasible, mainly because data is spread across multiple sources and gathering it at a central node can be costly. Also, earlier work primarily focused on improving the True Positive Rate (TPR) and ignored the False Positive Rate (FPR), which is also essential to avoid unnecessary downtime of the systems. In this paper, we first present an architecture for IDS based on a hybrid ensemble model, named PHEC, which gives improved performance compared to state-of-the-art architectures. We then adapt this model to a federated learning framework that performs local training and aggregates only the model parameters. Next, we propose Noise-Tolerant PHEC in centralized and federated settings to address the label-noise problem. The proposed idea uses classifiers with weighted convex surrogate loss functions. The natural robustness of the KNN classifier to noisy data is also used in the proposed architecture. Experimental results on four benchmark datasets drawn from various security attacks show that our model achieves high TPR while keeping FPR low on noisy and clean data. Further, they also demonstrate that the hybrid ensemble models achieve performance in federated settings close to that of the centralized settings.
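For readers unfamiliar with the two rates, the sketch below computes TPR and FPR from binary labels using their standard definitions, TPR = TP / (TP + FN) and FPR = FP / (FP + TN). The function name and the ten made-up labels are assumptions for the example; the sketch also assumes both classes are present, so neither denominator is zero.

```python
def tpr_fpr(y_true, y_pred):
    """Return (true positive rate, false positive rate) for 0/1 labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), fp / (fp + tn)


# Made-up data: 1 = attack traffic, 0 = benign traffic.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tpr, fpr = tpr_fpr(y_true, y_pred)
print(tpr)  # 0.75
print(fpr)
```

Here three of four attacks are caught, and one of six benign flows raises a false alarm; in an IDS, each such false alarm can translate into the unnecessary downtime the abstract mentions.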


Evaluating OCR Output Quality with Character Error Rate (CER) and Word Error Rate (WER)

#artificialintelligence

The usual way of evaluating prediction output is with the accuracy metric, where we indicate a match (1) or a no match (0). However, this does not provide enough granularity to effectively assess OCR performance. We should instead use error rates to determine the extent to which the OCR transcribed text and ground truth text (i.e. A common intuition is to see how many characters were misspelled. While this is correct, the actual error rate calculation is more complex than that.
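As a minimal sketch of the commonly used definitions, CER and WER can both be computed as the Levenshtein (edit) distance between the OCR output and the ground truth, divided by the length of the ground truth, counted in characters for CER and in words for WER. The function names and the example strings are assumptions for illustration; normalization choices such as case folding or whitespace handling are left out.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_item != hyp_item)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def cer(reference, hypothesis):
    """Character error rate: character-level edits per reference character."""
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word-level edits per reference word."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


ground_truth = "the quick brown fox"
ocr_output = "the qu1ck brown fx"

print(cer(ground_truth, ocr_output))  # 2 edits over 19 characters
print(wer(ground_truth, ocr_output))  # 0.5
```

Two character errors give a CER of about 0.105 but a WER of 0.5, because each damaged word counts as a full word error; this is why the two metrics are usually reported together.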