 Accuracy


As easy as APC: Leveraging self-supervised learning in the context of time series classification with varying levels of sparsity and severe class imbalance

arXiv.org Artificial Intelligence

High levels of sparsity and strong class imbalance are ubiquitous challenges that are often presented simultaneously in real-world time series data. While most methods tackle each problem separately, our proposed approach handles both in conjunction, while imposing fewer assumptions on the data. In this work, we propose leveraging a self-supervised learning method, specifically Autoregressive Predictive Coding (APC), to learn relevant hidden representations of time series data in the context of both missing data and class imbalance. We apply APC using either a GRU or GRU-D encoder on two real-world datasets, and show that applying one-step-ahead prediction with APC improves the classification results in all settings. In fact, by applying GRU-D - APC, we achieve state-of-the-art AUPRC results on the Physionet benchmark.


Subgroup Generalization and Fairness of Graph Neural Networks

arXiv.org Artificial Intelligence

Despite enormous successful applications of graph neural networks (GNNs) recently, theoretical understandings of their generalization ability, especially for node-level tasks where data are not independent and identically-distributed (IID), have been sparse. The theoretical investigation of the generalization performance is beneficial for understanding fundamental issues (such as fairness) of GNN models and designing better learning methods. In this paper, we present a novel PAC-Bayesian analysis for GNNs under a non-IID semi-supervised learning setup. Moreover, we analyze the generalization performances on different subgroups of unlabeled nodes, which allows us to further study an accuracy-(dis)parity-style (un)fairness of GNNs from a theoretical perspective. Under reasonable assumptions, we demonstrate that the distance between a test subgroup and the training set can be a key factor affecting the GNN performance on that subgroup, which calls special attention to the training node selection for fair learning. Experiments across multiple GNN models and datasets support our theoretical results.


ggpairs in R- A Brief Introduction to ggpairs

#artificialintelligence

In this article, we compare the pairs and ggpairs functions in R, covering color, labels, panels, and grouping in pairs plots. We will use the mtcars dataset. Let's store some of its variables in data. The diagonal boxes show the column variables, and the remaining boxes show scatter plots for each combination of variables.


Bootstrapping the error of Oja's Algorithm

arXiv.org Machine Learning

We consider the problem of quantifying uncertainty for the estimation error of the leading eigenvector from Oja's algorithm for streaming principal component analysis, where the data are generated IID from some unknown distribution. By combining classical tools from the U-statistics literature with recent results on high-dimensional central limit theorems for quadratic forms of random vectors and concentration of matrix products, we establish a $\chi^2$ approximation result for the $\sin^2$ error between the population eigenvector and the output of Oja's algorithm. Since estimating the covariance matrix associated with the approximating distribution requires knowledge of unknown model parameters, we propose a multiplier bootstrap algorithm that may be updated in an online manner. We establish conditions under which the bootstrap distribution is close to the corresponding sampling distribution with high probability, thereby establishing the bootstrap as a consistent inferential method in an appropriate asymptotic regime.
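For readers unfamiliar with the estimator being analyzed, here is a minimal sketch of Oja's streaming update and the $\sin^2$ error mentioned above. It is not the paper's bootstrap procedure. The function names, the step size, and the synthetic 2-D data are all assumptions chosen for illustration.

```python
import math
import random


def oja_leading_eigenvector(samples, step_size=0.01, seed=0):
    """Streaming estimate of the leading eigenvector using Oja's update.

    step_size and seed are illustrative choices, not values from the paper.
    """
    rng = random.Random(seed)
    dim = len(samples[0])
    # Start from a random unit vector.
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    for x in samples:
        # Projection of the current sample onto the current estimate.
        proj = sum(wi * xi for wi, xi in zip(w, x))
        # Oja step: w <- w + step_size * (x^T w) * x, then renormalize.
        w = [wi + step_size * proj * xi for wi, xi in zip(w, x)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return w


def sin2_error(u, v):
    """sin^2 of the angle between two unit vectors."""
    cos = sum(a * b for a, b in zip(u, v))
    return 1.0 - cos * cos


def make_samples(n, seed=1):
    """Synthetic IID data (an assumption): independent coordinates with
    standard deviations 3 and 1, so the leading eigenvector is (1, 0)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 3.0), rng.gauss(0.0, 1.0)] for _ in range(n)]


estimate = oja_leading_eigenvector(make_samples(5000))
error = sin2_error(estimate, [1.0, 0.0])
```

The `error` value is a single draw of the random quantity whose distribution the paper approximates; repeating the run with different data seeds would give a rough empirical picture of that distribution.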


Doing good by fighting fraud: Ethical anti-fraud systems for mobile payments

arXiv.org Artificial Intelligence

App builders commonly use security challenges, a form of step-up authentication, to add security to their apps. However, the ethical implications of this type of architecture have not been studied previously. In this paper, we present a large-scale measurement study of running an existing anti-fraud security challenge, Boxer, in real apps running on mobile devices. We find that although Boxer does work well overall, it is unable to scan effectively on devices that run its machine learning models at less than one frame per second (FPS), blocking users who use inexpensive devices. With the insights from our study, we design Daredevil, a new anti-fraud system for scanning payment cards that works well across the broad range of performance characteristics and hardware configurations found on modern mobile devices. Daredevil reduces the number of devices that run at less than one FPS by an order of magnitude compared to Boxer, providing a more equitable system for fighting fraud. In total, we collect data from 5,085,444 real devices spread across 496 real apps running production software and interacting with real users.


NLP: Twitter Sentiment Analysis

#artificialintelligence

In this hands-on project, we will train a Naive Bayes classifier to predict sentiment from thousands of tweets. This project could be used in practice by any company with a social media presence to automatically predict its customers' sentiment (i.e., whether the customers are happy or not). The process can run automatically, without humans manually reviewing thousands of tweets and customer reviews. Note: This course works best for learners who are based in the North America region.
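To make the idea concrete, here is a minimal sketch of a multinomial Naive Bayes sentiment classifier with add-one (Laplace) smoothing, written from scratch. It is not the course's code. The four example tweets, the label names, and the function names are invented for illustration, and the tokenizer is a plain whitespace split.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(texts, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log-posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing avoids zero probabilities.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, not taken from the course.
TOY_TWEETS = [
    "love this phone great battery",
    "great service very happy",
    "terrible support very slow",
    "hate the update awful bugs",
]
TOY_LABELS = ["positive", "positive", "negative", "negative"]

model = train_naive_bayes(TOY_TWEETS, TOY_LABELS)
```

Calling `predict(model, "great phone love it")` scores each class by its log prior plus the smoothed log likelihood of every known word. A real project would add proper tokenization, stop-word handling, and a held-out test set.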


How to confuse antimalware neural networks. Adversarial attacks and protection

#artificialintelligence

Nowadays, cybersecurity companies implement a variety of methods to discover new, previously unknown malware files. Machine learning (ML) is a powerful and widely used approach for this task. At Kaspersky we have a number of complex ML models based on different file features, including models for static and dynamic detection, for processing sandbox logs and system events, etc. We implement different machine learning techniques, including deep neural networks, one of the most promising technologies that make it possible to work with large amounts of data, incorporate different types of features, and boast a high accuracy rate. But can we rely entirely on machine learning approaches in the battle with the bad guys? Or could powerful AI itself be vulnerable? In this article we attempt to attack our product anti-malware neural network models and check existing defense methods. An adversarial attack is a method of making small modifications to the objects in such a way that the machine learning model begins to misclassify them.
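The definition in the last sentence can be illustrated with a small sketch of a fast-gradient-sign style perturbation against a logistic regression model. This is a generic textbook technique swapped in for illustration, not the attack or the models described in the article. The weights, bias, feature vector, and epsilon below are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Probability of class 1 under a logistic regression model."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def sign(value):
    return (value > 0) - (value < 0)


def fgsm_perturb(weights, bias, features, true_label, epsilon):
    """Move each feature by epsilon in the direction that increases the loss.

    For logistic regression with log-loss, the gradient of the loss with
    respect to the input is (p - y) * w.
    """
    p = predict_proba(weights, bias, features)
    grad = [(p - true_label) * w for w in weights]
    return [x + epsilon * sign(g) for x, g in zip(features, grad)]


# Hypothetical model and sample, chosen only for illustration.
WEIGHTS = [2.0, -1.0, 0.5]
BIAS = -0.5
SAMPLE = [0.6, 0.2, 0.4]  # true label: 1

adversarial = fgsm_perturb(WEIGHTS, BIAS, SAMPLE, true_label=1, epsilon=0.3)
before = predict_proba(WEIGHTS, BIAS, SAMPLE)
after = predict_proba(WEIGHTS, BIAS, adversarial)
```

With these numbers, `before` is above 0.5 and `after` is below it, so the predicted class flips even though no feature moved by more than 0.3. Attacks on real malware detectors face an extra constraint that this sketch ignores: the modified file must still be a valid, working file.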


Practical considerations for Machine Learning Classification - AskSid

#artificialintelligence

There is something very satisfying about building a machine learning classifier using a toy dataset. We can achieve high accuracy and feel good inside while doing it. But this doesn't really help us or prepare us for real-world datasets and the issues they pose. If you have ever trained a machine learning classification model, you may have come across this issue. People use different words for it: 'Imbalanced dataset', 'Model is Skewed', etc. Let's say we are training a model to detect spam emails.
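A short sketch shows why accuracy alone misleads on an imbalanced spam dataset. The counts are hypothetical (50 spam emails out of 1,000), and the "classifier" simply labels everything as not spam.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = spam."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def evaluate(y_true, y_pred):
    """Accuracy, precision, and recall; undefined ratios are reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical data: 50 spam emails (1) and 950 legitimate ones (0).
y_true = [1] * 50 + [0] * 950
# A useless model that never flags anything as spam.
y_pred = [0] * 1000

accuracy, precision, recall = evaluate(y_true, y_pred)
```

Here `accuracy` is 0.95 while `recall` is 0.0: the model looks strong by accuracy yet catches no spam at all, which is why precision and recall are the more informative metrics for skewed classes.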


What's Up With the Twist Ending of False Positive, Ilana Glazer's Pregnancy Horror Movie?

Slate

This article contains spoilers for the entirety of False Positive. In False Positive, the phrase "mommy brain" begins almost as a joke. As Lucy (Ilana Glazer, who also co-wrote the film) and Adrian (Justin Theroux) try--and then finally succeed, thanks to Dr. John Hindle (Pierce Brosnan)--to get pregnant, the words, initially used to refer to a strong rush of maternal instinct, take on an increasingly sinister timbre. Is it really just "mommy brain" that's causing Lucy to become suspicious of Hindle, or is the term being used to gaslight her? The final answer seems to be: a little bit of both.


Prediction of Hereditary Cancers Using Neural Networks

arXiv.org Machine Learning

Family history is a major risk factor for many types of cancer. Mendelian risk prediction models translate family histories into cancer risk predictions based on knowledge of cancer susceptibility genes. These models are widely used in clinical practice to help identify high-risk individuals. Mendelian models leverage the entire family history, but they rely on many assumptions about cancer susceptibility genes that are either unrealistic or challenging to validate due to low mutation prevalence. Training more flexible models, such as neural networks, on large databases of pedigrees can potentially lead to accuracy gains. In this paper, we develop a framework to apply neural networks to family history data and investigate their ability to learn inherited susceptibility to cancer. While there is an extensive literature on neural networks and their state-of-the-art performance in many tasks, there is little work applying them to family history data. We propose adaptations of fully-connected neural networks and convolutional neural networks to pedigrees. In data simulated under Mendelian inheritance, we demonstrate that our proposed neural network models are able to achieve nearly optimal prediction performance. Moreover, when the observed family history includes misreported cancer diagnoses, neural networks are able to outperform the Mendelian BRCAPRO model embedding the correct inheritance laws. Using a large dataset of over 200,000 family histories, the Risk Service cohort, we train prediction models for future risk of breast cancer. We validate the models using data from the Cancer Genetics Network.