AITopics | Accuracy

Accuracy

Confidence Intervals for Algorithmic Leveraging in Linear Regression

arXiv.org Machine Learning, Mar-10-2018

The age of big data has produced data sets that are computationally expensive to analyze and store. Algorithmic leveraging proposes that we sample observations from the original data set to generate a representative data set and then perform the analysis on that representative data set. In this paper, we present efficient algorithms for constructing finite-sample confidence intervals, with asymptotic coverage guarantees, for each regression coefficient estimated by algorithmic leveraging. In simulations, we confirm empirically that these confidence intervals have the desired coverage probabilities, while bootstrap confidence intervals may not.
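As a rough illustration of the sampling step described above, the following sketch draws a leverage-weighted subsample and fits a weighted least-squares line to it. It covers only a leveraging estimator for a single predictor plus intercept; the paper's confidence-interval construction is not reproduced here. The function names, sampling with replacement, and the inverse-probability weights are assumptions made for this example.

```python
import random


def leverage_scores(x):
    """Leverage h_i of each observation in simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]


def weighted_least_squares(x, y, w):
    """Return (intercept, slope) minimizing sum of w_i * (y_i - a - b * x_i)^2."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return my - slope * mx, slope


def leveraging_estimate(x, y, r, rng):
    """Sample r rows with probability proportional to leverage, then fit WLS."""
    h = leverage_scores(x)
    total = sum(h)  # equals 2 here: one intercept plus one slope
    probs = [hi / total for hi in h]
    idx = rng.choices(range(len(x)), weights=probs, k=r)
    xs = [x[i] for i in idx]
    ys = [y[i] for i in idx]
    ws = [1.0 / probs[i] for i in idx]  # assumed inverse-probability weights
    return weighted_least_squares(xs, ys, ws)


if __name__ == "__main__":
    rng = random.Random(0)
    x = [i / 10 for i in range(2000)]
    y = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]
    print(leveraging_estimate(x, y, r=200, rng=rng))
```

Repeating the subsample many times and looking at the spread of the estimates gives a feel for the sampling variability that any interval for these coefficients has to account for.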

artificial intelligence, confidence interval, machine learning, (16 more...)

arXiv.org Machine Learning

1606.01473

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Customer Analytics: Using Deep Learning With Keras To Predict Customer Churn

@machinelearnbot, Mar-9-2018, 23:51:02 GMT

Customer churn is a problem that all companies need to monitor, especially those that depend on subscription-based revenue streams. The simple fact is that most organizations have data that can be used to target customers at risk of churning and to understand the key drivers of churn. We now have Keras for Deep Learning available in R (yes, in R!), and we used it to predict customer churn with 82% accuracy. We're super excited about this article because we use the new keras package to build an Artificial Neural Network (ANN) model on the IBM Watson Telco Customer Churn Data Set! As with most business problems, it's equally important to explain which features drive the model, which is why we'll use the lime package for explainability. In addition, we use three new packages to assist with Machine Learning (ML): recipes for preprocessing, rsample for sampling data, and yardstick for model metrics. These are relatively new additions to CRAN developed by Max Kuhn at RStudio (creator of the caret package). It seems that R is quickly developing ML tools that rival Python. Good news if you're interested in applying Deep Learning in R! We are, so let's get going! Customer churn refers to the situation when a customer ends their relationship with a company, and it's a costly problem. Customers are the fuel that powers a business. Further, it's much more difficult and costly to gain new customers than it is to retain existing customers. As a result, organizations need to focus on reducing customer churn. The good news is that machine learning can help. For many businesses that offer subscription-based services, it's critical both to predict customer churn and to explain which features relate to it.

artificial intelligence, customer, machine learning, (17 more...)

@machinelearnbot

Genre: Workflow (0.69)

Industry:

Education (0.94)
Telecommunications (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)

City-wide Analysis of Electronic Health Records Reveals Gender and Age Biases in the Administration of Known Drug-Drug Interactions

Correia, Rion Brattig, de Araújo, Luciana P., Mattos, Mauro M., Wild, David, Rocha, Luis M.

arXiv.org Machine Learning, Mar-9-2018

From a public-health perspective, the occurrence of drug-drug interactions (DDI) from multiple drug prescriptions is a serious problem, especially in the elderly population. This is true both for individuals and for the system itself, since patients with complications due to DDI will likely re-enter the system at a costlier level. We conducted an 18-month study of DDI occurrence in Blumenau (Brazil; pop. 340,000) using city-wide drug dispensing data from both the primary- and secondary-care levels. Our goal is also to identify possible risk factors in a large population, ultimately characterizing the burden of DDI for patients, doctors and the public system itself. We found 181 distinct DDI being prescribed concomitantly to almost 5% of the city population. We also discovered that women have a 60% higher risk of DDI than men, while having only a 6% higher co-administration risk. Analysis of the DDI co-occurrence network reveals which DDI pairs are most associated with the observed greater DDI risk for females, demonstrating that contraception and hormone therapy are not the main culprits of the gender disparity, which is maximized after the reproductive years. Furthermore, DDI risk increases dramatically with age, with patients aged 70-79 having a 50-fold risk increase in comparison to patients aged 0-19. Interestingly, several null models demonstrate that this risk increase is not due to increased polypharmacy with age. Finally, we demonstrate that while the number of drugs and co-administrations help predict a patient's number of DDI ($R^2=.413$), they are not sufficient to flag these patients accurately, which we achieve by training classifiers with additional data (MCC = .83, F1 = .72). These results demonstrate that accurate warning systems for known DDI can be devised for public and private systems alike, resulting in substantial prevention of DDI-related adverse drug reactions (ADR) and savings.
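For readers unfamiliar with the two classifier scores quoted above, here is a minimal sketch of how the Matthews correlation coefficient (MCC) and the F1 score are computed from confusion-matrix counts. The counts in the usage lines are invented for illustration and are not taken from the study.

```python
import math


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written in terms of counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def matthews_corrcoef(tp, fp, fn, tn):
    """Correlation between predicted and true binary labels, in [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical, heavily imbalanced example: 10 flagged patients out of 1000.
    tp, fp, fn, tn = 5, 5, 5, 985
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    print("accuracy:", accuracy)
    print("F1:", f1_score(tp, fp, fn))
    print("MCC:", matthews_corrcoef(tp, fp, fn, tn))
```

In the hypothetical example, accuracy is close to 1 even though half of the positive cases are missed, which is why scores such as MCC and F1 are often preferred when the flagged class is rare.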

artificial intelligence, interaction, machine learning, (17 more...)

arXiv.org Machine Learning

1803.03571

Country:

South America > Brazil (1.00)
North America > United States (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
(7 more...)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Explaining Black-box Android Malware Detection

Melis, Marco, Maiorca, Davide, Biggio, Battista, Giacinto, Giorgio, Roli, Fabio

arXiv.org Machine Learning, Mar-9-2018

Machine-learning models have recently been used for detecting malicious Android applications, reporting impressive performance on benchmark datasets, even when trained only on features statically extracted from the application, such as system calls and permissions. However, recent findings have highlighted the fragility of such in-vitro evaluations with benchmark datasets, showing that very few changes to the content of Android malware may suffice to evade detection. How can we thus trust that a malware detector performing well on benchmark data will continue to do so when deployed in an operating environment? To mitigate this issue, the most popular Android malware detectors use linear, explainable machine-learning models to easily identify the most influential features contributing to each decision. In this work, we generalize this approach to any black-box machine-learning model, by leveraging a gradient-based approach to identify the most influential local features. This enables using nonlinear models to potentially increase accuracy without sacrificing interpretability of decisions. Our approach also highlights the global characteristics learned by the model to discriminate between benign and malware applications. Finally, as shown by our empirical analysis on a popular Android malware detection task, it also helps identify potential vulnerabilities of linear and nonlinear models against adversarial manipulations.
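One way to picture a gradient-based local explanation is the sketch below, which scores each feature of a single input by the gradient of the model's output times the feature value. This is an assumption-laden illustration, not the authors' procedure: the abstract says only that the approach is gradient-based, so the finite-difference gradient, the gradient-times-input relevance, and the toy scoring function are all choices made for this example. It also assumes the scoring function accepts real-valued inputs and is smooth around the point being explained.

```python
def numerical_gradient(f, x, eps=1e-5):
    """Central finite-difference estimate of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def local_relevance(f, x):
    """Per-feature relevance for one sample: gradient times feature value."""
    grad = numerical_gradient(f, x)
    return [g * xi for g, xi in zip(grad, x)]


def top_features(f, x, names, k=3):
    """The k features with the largest absolute relevance for this sample."""
    rel = local_relevance(f, x)
    ranked = sorted(zip(names, rel), key=lambda pair: abs(pair[1]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical scoring function standing in for a trained detector.
    def score(x):
        return 2 * x[0] - 3 * x[1] + x[0] * x[2]

    names = ["perm_send_sms", "perm_internet", "api_get_device_id"]
    print(top_features(score, [1.0, 1.0, 0.0], names, k=2))
```

Features that are absent from the sample (value 0) receive zero relevance under this choice, so the ranking reflects only features actually present in the application being explained.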

android, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

1803.03544

Country: Europe (0.46)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.47)

Influence of the Event Rate on Discrimination Abilities of Bankruptcy Prediction Models

Zhang, Lili, Priestley, Jennifer, Ni, Xuelei

arXiv.org Machine Learning, Mar-9-2018

In bankruptcy prediction, the proportion of events is very low, so events are often oversampled to reduce the resulting bias. In this paper, we study the influence of the event rate on the discrimination abilities of bankruptcy prediction models. First, the statistical association and significance of public records and firmographics indicators with bankruptcy were explored. Then the event rate was oversampled from 0.12% to 10%, 20%, 30%, 40%, and 50%, respectively. Seven models were developed, including Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, Support Vector Machine, Bayesian Network, and Neural Network. Under different event rates, models were comprehensively evaluated and compared based on the Kolmogorov-Smirnov statistic, accuracy, F1 score, Type I error, Type II error, and ROC curve on the hold-out dataset with their best probability cut-offs. Results show that Bayesian Network is the most insensitive to the event rate, while Support Vector Machine is the most sensitive.
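The oversampling step can be sketched as follows. The abstract does not say how the events were oversampled, so random duplication of event rows with replacement is an assumption here, as are the function names and the toy data.

```python
import random


def event_rate(rows, is_event):
    """Fraction of rows that are events."""
    return sum(1 for r in rows if is_event(r)) / len(rows)


def oversample_events(rows, is_event, target_rate, rng):
    """Duplicate event rows at random until events make up target_rate."""
    if not 0 < target_rate < 1:
        raise ValueError("target_rate must be strictly between 0 and 1")
    events = [r for r in rows if is_event(r)]
    non_events = [r for r in rows if not is_event(r)]
    if not events:
        raise ValueError("no events to oversample")
    # Solve k / (k + n_non) = target_rate for the required event count k.
    needed = round(target_rate * len(non_events) / (1 - target_rate))
    extra = max(0, needed - len(events))
    resampled = events + rng.choices(events, k=extra)
    out = non_events + resampled
    rng.shuffle(out)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: 12 events in 10,000 rows, i.e. an event rate of 0.12%.
    rows = [1] * 12 + [0] * 9988
    for target in (0.10, 0.20, 0.30, 0.40, 0.50):
        sample = oversample_events(rows, lambda r: r == 1, target, rng)
        print(target, len(sample), round(event_rate(sample, lambda r: r == 1), 4))
```

Duplicating the same few event rows many times adds no new information, which is one reason the choice of event rate can affect models differently.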

artificial intelligence, machine learning, prediction, (15 more...)

arXiv.org Machine Learning

doi: 10.5121/ijdms.2018.10101

1803.03756

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.90)

Industry: Banking & Finance (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.72)

Joint Estimation and Inference for Data Integration Problems based on Multiple Multi-layered Gaussian Graphical Models

Majumdar, Subhabrata, Michailidis, George

arXiv.org Machine Learning, Mar-8-2018

Aberrations in complex biological systems develop in the background of diverse genetic and environmental factors and are associated with multiple complex molecular events. These include changes in the genome, transcriptome, proteome and metabolome, as well as epigenetic effects. Advances in high-throughput profiling techniques have enabled a systematic and comprehensive exploration of the genetic and epigenetic basis of various diseases, including cancer (Kaushik et al., 2016; Lee et al., 2016), diabetes (Sas et al., 2018; Yuan et al., 2014), chronic kidney disease (Atzler et al., 2014), etc. Further, such multi-Omics collections have become available for patients belonging to different, but related disease subtypes, with The Cancer Genome Atlas (TCGA: Tomczak et al. (2015)) being a prototypical one. Hence, there is an increasing need for models that can integrate such complex data both vertically across multiple modalities and horizontally across different disease subtypes. Figure 1 provides a schematic representation of the horizontal and vertical structure of such heterogeneous multi-modal Omics data as outlined above.

artificial intelligence, machine learning, matrix, (17 more...)

arXiv.org Machine Learning

1803.03348

Country: North America > United States (0.27)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Integration (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)

A Bayesian and Machine Learning approach to estimating Influence Model parameters for IM-RO

Lawrence, Trisha

arXiv.org Machine Learning, Mar-8-2018

The rise of Online Social Networks (OSNs) has attracted an enormous amount of interest from advertisers and researchers seeking to capitalize on their features. Researchers aim to develop strategies for determining how information is propagated among users within an OSN, which is captured by diffusion or influence models. We consider the influence models for the Influence Maximization-Revenue Optimization (IM-RO) problem, a novel formulation of the Influence Maximization (IM) problem based on implementing Stochastic Dynamic Programming (SDP). In contrast to existing approaches involving influence spread and the theory of submodular functions, the SDP method focuses on optimizing clicks and ultimately revenue to advertisers in OSNs. Existing approaches to influence maximization have been actively researched over the past decade, with applications to multiple fields; however, our approach is a more practical variant of the original IM problem. In this paper, we provide an analysis of the influence models of the IM-RO problem by conducting experiments on synthetic and real-world datasets. We propose a Bayesian and Machine Learning approach for estimating the parameters of the influence models for the IM-RO problem. We present a Bayesian hierarchical model and implement the well-known Naive Bayes classifier (NBC), Decision Trees classifier (DTC) and Random Forest classifier (RFC) on three real-world datasets. Compared to previous approaches to estimating influence model parameters, our strategy has the great advantage of being directly implementable in standard software packages such as WinBUGS/OpenBUGS/JAGS and Apache Spark. We demonstrate the efficiency and usability of our methods in terms of spreading information and generating revenue for advertisers in the context of OSNs.

artificial intelligence, bayesian inference, machine learning, (20 more...)

arXiv.org Machine Learning

1803.03191

Country: North America > United States > California (0.46)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Services (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Penalizing Unfairness in Binary Classification

Bechavod, Yahav, Ligett, Katrina

arXiv.org Machine Learning, Mar-8-2018

We present a new approach for mitigating unfairness in learned classifiers. In particular, we focus on binary classification tasks over individuals from two populations, where, as our criterion for fairness, we wish to achieve similar false positive rates in both populations, and similar false negative rates in both populations. As a proof of concept, we implement our approach and empirically evaluate its ability to achieve both fairness and accuracy, using datasets from the fields of criminal risk assessment, credit, lending, and college admissions.
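The fairness criterion described above compares false positive rates and false negative rates between two populations. The sketch below only measures those gaps for a set of predictions; it does not reproduce the penalty the paper adds during training, and the function names and toy labels are assumptions for illustration.

```python
from collections import defaultdict


def group_error_rates(y_true, y_pred, group):
    """Map each group to its (false positive rate, false negative rate)."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for t, p, g in zip(y_true, y_pred, group):
        if t == 1:
            key = "tp" if p == 1 else "fn"
        else:
            key = "fp" if p == 1 else "tn"
        counts[g][key] += 1
    rates = {}
    for g, c in counts.items():
        negatives = c["fp"] + c["tn"]
        positives = c["fn"] + c["tp"]
        fpr = c["fp"] / negatives if negatives else 0.0
        fnr = c["fn"] / positives if positives else 0.0
        rates[g] = (fpr, fnr)
    return rates


def fairness_gaps(rates, a, b):
    """Absolute differences in FPR and FNR between groups a and b."""
    return abs(rates[a][0] - rates[b][0]), abs(rates[a][1] - rates[b][1])


if __name__ == "__main__":
    # Toy labels and predictions for two hypothetical populations.
    y_true = [0, 0, 1, 1, 0, 0, 1, 1]
    y_pred = [0, 1, 1, 0, 0, 0, 1, 1]
    group = ["A", "A", "A", "A", "B", "B", "B", "B"]
    rates = group_error_rates(y_true, y_pred, group)
    print(rates)
    print(fairness_gaps(rates, "A", "B"))
```

A classifier meets the criterion to the extent that both gaps are close to zero, and that has to be weighed against its overall accuracy.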

artificial intelligence, classifier, machine learning, (15 more...)

arXiv.org Machine Learning

1707.00044

Country:

North America > United States (1.00)
Asia > Middle East > Israel (0.14)

Genre: Research Report (0.84)

Industry:

Law (1.00)
Information Technology > Security & Privacy (0.49)
Education > Educational Setting > Higher Education (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

WWE Fastlane 2018: Predictions, Matches For Last 'SmackDown' PPV Before WrestleMania 34

International Business Times, Mar-7-2018, 18:08:57 GMT

It isn't likely that any championships will change hands, though the matches in Columbus, Ohio will help advance WrestleMania storylines. Below are predictions for every match on the WWE Fastlane card. From the moment Shinsuke Nakamura won the Royal Rumble, it's been clear that he would face AJ Styles at WrestleMania in a rematch from Wrestle Kingdom 10. That means Styles will find a way to retain the title. Another WrestleMania feud could be born from the championship match, though don't be surprised to see Owens and Zayn team up after having their differences over the past few weeks.

artificial intelligence, machine learning, prediction, (10 more...)

International Business Times

Country: North America > United States > Ohio > Franklin County > Columbus (0.26)

Industry: Leisure & Entertainment > Sports > Martial Arts (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.44)

A bag-to-class divergence approach to multiple-instance learning

Møllersen, Kajsa, Hardeberg, Jon Yngve, Godtliebsen, Fred

arXiv.org Machine Learning, Mar-7-2018

In multi-instance (MI) learning, each object (bag) consists of multiple feature vectors (instances), and is most commonly regarded as a set of points in a multidimensional space. A different viewpoint is that the instances are realisations of random vectors with corresponding probability distribution, and that a bag is the distribution, not the realisations. In MI classification, each bag in the training set has a class label, but the instances are unlabelled. By introducing the probability distribution space to bag-level classification problems, dissimilarities between probability distributions (divergences) can be applied. The bag-to-bag Kullback-Leibler information is asymptotically the best classifier, but the typical sparseness of MI training sets is an obstacle. We introduce bag-to-class divergence to MI learning, emphasising the hierarchical nature of the random vectors that makes bags from the same class different. We propose two properties for bag-to-class divergences, and an additional property for sparse training sets.
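To make the idea of comparing a bag with a class distribution concrete, the sketch below treats each bag as a histogram over one-dimensional instances and assigns it to the class whose pooled training instances give the smallest Kullback-Leibler divergence. This is a stand-in for illustration only: the specific divergences proposed in the paper are not given in the abstract, and the histogram estimate, the additive smoothing, and all names are assumptions.

```python
import math


def histogram(values, edges, alpha=1.0):
    """Smoothed relative frequencies over bins [edges[i], edges[i + 1])."""
    counts = [alpha] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break  # values outside all bins are ignored
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions with q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def classify_bag(bag, class_instances, edges):
    """Assign the bag to the class with the smallest bag-to-class KL."""
    p = histogram(bag, edges)
    scores = {
        label: kl_divergence(p, histogram(instances, edges))
        for label, instances in class_instances.items()
    }
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    edges = [0, 1, 2, 3]
    # Pooled instances from the training bags of each class (toy values).
    class_instances = {
        "neg": [0.1, 0.2, 0.5, 0.7, 1.5],
        "pos": [2.1, 2.5, 2.8, 1.2, 2.9],
    }
    print(classify_bag([2.2, 2.6, 1.1], class_instances, edges))
```

Pooling all instances of a class gives each class distribution more data than any single bag has, which is the practical appeal of bag-to-class over bag-to-bag comparisons when training sets are sparse.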

artificial intelligence, divergence, machine learning, (17 more...)

arXiv.org Machine Learning

1803.02782

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)