AITopics | Accuracy

Accuracy

NeuralFDR: Learning Discovery Thresholds from Hypothesis Features

Xia, Fei, Zhang, Martin J., Zou, James, Tse, David

arXiv.org Machine Learning, Nov-18-2017

As datasets grow richer, an important challenge is to leverage all the features in the data to maximize the number of useful discoveries while controlling for false positives. We address this problem in the context of multiple hypothesis testing, where for each hypothesis we observe a p-value along with a set of features specific to that hypothesis. For example, in genetic association studies, each hypothesis tests the correlation between a variant and the trait. We have a rich set of features for each variant (e.g. its location, conservation, epigenetics, etc.) which could inform how likely the variant is to have a true association. However, popular testing approaches, such as the Benjamini-Hochberg procedure (BH) and independent hypothesis weighting (IHW), either ignore these features or assume that the features are categorical or univariate. We propose a new algorithm, NeuralFDR, which automatically learns a discovery threshold as a function of all the hypothesis features. We parametrize the discovery threshold as a neural network, which enables flexible handling of multi-dimensional discrete and continuous features as well as efficient end-to-end optimization. We prove that NeuralFDR has strong false discovery rate (FDR) guarantees, and show that it makes substantially more discoveries in synthetic and real datasets. Moreover, we demonstrate that the learned discovery threshold is directly interpretable.
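
To make the baseline concrete, here is a minimal Python sketch of the Benjamini-Hochberg step-up procedure, together with a helper that counts discoveries under a feature-dependent threshold t(x). The helper's false-discovery estimate, #{i : p_i > 1 - t(x_i)}, is a mirror-style estimate used here as an assumption, and `threshold_fn` is a hypothetical hand-written stand-in for the neural network that NeuralFDR learns. This is an illustration, not the authors' implementation.

```python
import numpy as np


def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean mask of hypotheses rejected by the BH step-up procedure."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Compare the k-th smallest p-value with the BH bound k * alpha / m.
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()  # largest index meeting the bound
        reject[order[: k + 1]] = True   # reject every hypothesis up to it
    return reject


def threshold_discoveries(pvals, features, threshold_fn):
    """Discoveries under a feature-dependent threshold t(x), plus a
    mirror-style estimate of the false discovery proportion (an assumption)."""
    p = np.asarray(pvals, dtype=float)
    t = np.array([threshold_fn(x) for x in features], dtype=float)
    n_disc = int(np.sum(p < t))
    # Null p-values are uniform, so the count in the mirrored region
    # (1 - t(x), 1] estimates how many nulls fall below t(x).
    n_false_est = int(np.sum(p > 1 - t))
    return n_disc, n_false_est / max(n_disc, 1)


pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals).nonzero()[0])  # [0 1]
```

A learned threshold differs from BH in that t(x) can be loose for hypotheses whose features suggest a true effect and strict elsewhere, rather than one cutoff shared by all hypotheses.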

artificial intelligence, hypothesis, machine learning

arXiv:1711.01312

Genre: Research Report > Experimental Study (0.88)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Prediction Scores as a Window into Classifier Behavior

Katehara, Medha, Beauxis-Aussalet, Emma, Alsallakh, Bilal

arXiv.org Machine Learning, Nov-17-2017

Most multi-class classifiers make their prediction for a test sample by scoring the classes and selecting the one with the highest score. Analyzing these prediction scores is useful to understand the classifier behavior and to assess its reliability. We present an interactive visualization that facilitates per-class analysis of these scores. Our system, called Classilist, enables relating these scores to the classification correctness and to the underlying samples and their features. We illustrate how such analysis reveals varying behavior of different classifiers.
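
The kind of per-class analysis described above can be approximated in a few lines. The sketch below is not Classilist's implementation. It assumes a score matrix with one row per sample and one column per class, takes the highest-scoring class as the prediction, and summarizes the winning scores of correct and incorrect predictions separately for each predicted class.

```python
import numpy as np


def per_class_score_summary(scores, y_true):
    """For each predicted class, split the winning scores by correctness."""
    scores = np.asarray(scores, dtype=float)
    y_true = np.asarray(y_true)
    y_pred = scores.argmax(axis=1)   # predicted class = highest score
    top = scores.max(axis=1)         # the score that won
    summary = {}
    for c in range(scores.shape[1]):
        predicted_c = y_pred == c
        correct = top[predicted_c & (y_true == c)]
        wrong = top[predicted_c & (y_true != c)]
        summary[c] = {
            "n_correct": int(correct.size),
            "n_wrong": int(wrong.size),
            "mean_correct": float(correct.mean()) if correct.size else None,
            "mean_wrong": float(wrong.mean()) if wrong.size else None,
        }
    return summary


scores = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.3, 0.7]]
print(per_class_score_summary(scores, [0, 1, 1, 1]))
```

In this toy input, class 0 wins one sample correctly (score 0.9) and one incorrectly (score 0.6), while class 1 wins both of its samples correctly.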

artificial intelligence, classifier, machine learning

arXiv:1711.06795

Country: North America > United States > California (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.35)

Data Analytics for Internal Audit - Data Mining Blog - www.dataminingblog.com

@machinelearnbot, Nov-16-2017, 08:51:30 GMT

This is a guest post from Marcel Baumgartner, Data Analytics Expert at Nestlé S.A. Large publicly listed companies not only have external auditors who check the books, but often also a large community of internal auditors. These collaborators provide the company with a sufficient level of assurance that internal and external rules and guidelines are being followed. This covers financial aspects (spend, invoices, investments, …), human resources (working time, payroll, …), but also production-related aspects (e.g. …). One of the strongest trends observed in internal auditing communities is the increasingly widespread use of Data Analytics. The term refers to the use of data, statistical methods and statistical thinking as a way of working, in addition to traditional auditing methods such as interviews, document and process reviews, etc.

artificial intelligence, data mining, machine learning

Country:

North America > United States > Indiana > Tippecanoe County > West Lafayette (0.05)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.05)
Europe > Switzerland > Vaud > Lausanne (0.05)

Industry: Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (0.35)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.30)

WWE Survivor Series 2017: Predictions, Match Card For Raw vs. SmackDown PPV

International Business Times, Nov-15-2017, 16:41:28 GMT

WWE Survivor Series 2017 has quickly turned into a "must-see" pay-per-view. Below are predictions for every match on the WWE Survivor Series card. Lesnar vs. Styles could end up being a fun match, but make no mistake: Lesnar is going to win at Survivor Series. The man who defeated Braun Strowman clean less than two months ago isn't going to lose to the much smaller Styles. Don't be surprised if the blue brand's top champion doesn't get much offense in at all before being pinned.

artificial intelligence, machine learning, prediction

Country: North America > United States > Nevada > Clark County > Las Vegas (0.06)

Industry: Leisure & Entertainment > Sports > Martial Arts (0.81)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.43)

Predictive Independence Testing, Predictive Conditional Independence Testing, and Predictive Graphical Modelling

Burkart, Samuel, Király, Franz J

arXiv.org Machine Learning, Nov-15-2017

Testing (conditional) independence of multivariate random variables is a task central to statistical inference and modelling in general, though unfortunately one for which no practicable workflow exists to date. State-of-the-art workflows suffer from the need for heuristic or subjective manual choices, high computational complexity, or strong parametric assumptions. We address these problems by establishing a theoretical link between multivariate/conditional independence testing and model comparison in the multivariate predictive modelling task, also known as supervised learning. This link allows advances in the extensively studied supervised learning workflow to be transferred directly to independence testing workflows, including automated tuning of machine learning models, which addresses the need for heuristic choices; the ability to trade off computational demand against accuracy quantitatively; and the modern black-box philosophy for checking and interfacing. As a practical implementation of this link between the two workflows, we present a Python package, 'pcit', which implements our novel multivariate and conditional independence tests and interfaces with the supervised learning API of the scikit-learn package. Theory and package also allow for straightforward learning of graphical model structure based on independence tests. We show empirically that our proposed predictive independence test outperforms or is on par with current practice, and that the derived graphical model structure learning algorithms asymptotically recover the 'true' graph. This paper and the accompanying 'pcit' package thus provide powerful, scalable, generalizable, and easy-to-use methods for multivariate and conditional independence testing, as well as for graphical model structure learning.
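
The link between independence testing and predictive model comparison can be sketched with plain NumPy: if Y is independent of X, no predictor that uses X can beat the best prediction made without it, so a significant improvement in held-out loss is evidence of dependence. This is not the pcit package's API, which the abstract says interfaces with scikit-learn. The sketch assumes a linear least-squares model (so it only detects linear dependence), squared loss, a single train/test split, and a one-sided paired test on per-sample loss differences with a normal approximation.

```python
import math

import numpy as np


def predictive_independence_pvalue(X, y, test_frac=0.5, seed=0):
    """One-sided p-value for H0: predicting y from X is no better than
    predicting it with a constant (the training mean)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    y = np.asarray(y, dtype=float)
    idx = rng.permutation(len(y))
    n_test = int(len(y) * test_frac)
    test, train = idx[:n_test], idx[n_test:]

    # Model that uses X: least-squares linear fit with an intercept column.
    A = np.column_stack([np.ones(len(train)), X[train]])
    coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
    pred = np.column_stack([np.ones(len(test)), X[test]]) @ coef

    # Baseline that ignores X: the training mean.
    base = np.full(len(test), y[train].mean())

    # Positive differences mean the X-based model has lower squared loss.
    d = (y[test] - base) ** 2 - (y[test] - pred) ** 2
    se = d.std(ddof=1) / math.sqrt(len(d))
    if se == 0:
        return 1.0
    z = d.mean() / se
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal p-value


rng = np.random.default_rng(1)
x = rng.normal(size=400)
print(predictive_independence_pvalue(x, 2 * x + rng.normal(size=400)))
print(predictive_independence_pvalue(x, rng.normal(size=400)))
```

The first call should give a p-value near zero, since y depends linearly on x; the second has no dependence to detect. Swapping the linear fit for a more flexible learner is what lets this style of test pick up non-linear dependence.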

artificial intelligence, independence testing, machine learning

arXiv:1711.05869

Country: Europe > United Kingdom (0.27)

Genre:

Research Report > Experimental Study (1.00)
Overview (1.00)

Industry: Banking & Finance > Economy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)

Introducing DeepBalance: Random Deep Belief Network Ensembles to Address Class Imbalance

Xenopoulos, Peter

arXiv.org Machine Learning, Nov-15-2017

When solving practical classification problems, a practitioner may be faced with class imbalance, meaning that one class (the majority class) has a significantly higher prevalence than the others. Examples of imbalanced classification problems in the literature include [1], [2], [3], [4]. Class imbalance problems may be exacerbated in the future as we discover new methods to collect rare data and as the rate of data collection increases. In many class imbalance problems, the minority class is not only the class of interest but also carries the higher misclassification cost, which complicates learning [5]. Machine learning classifiers try to find an optimal decision boundary that fits the training data. Because classifiers generally seek the simplest rule that partitions the training data, the simplest rule in imbalanced settings is often to always predict the majority class [6]. Results for such classifiers can be deceptive, as they may still achieve high accuracy. For example, in a problem where the minority class occurs 0.1% of the time, an uninformed classifier can achieve 99.9% accuracy by simply predicting the majority class for every observation. Thus, the naturally occurring target class distribution is not optimal for learning in highly imbalanced scenarios [7], [8], [9], [10].
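
The 99.9% figure in the example above is easy to reproduce. The sketch below assumes a hypothetical dataset of 1,000 observations containing a single minority-class observation (0.1%) and an uninformed classifier that always predicts the majority class. Alongside accuracy it reports minority-class recall and balanced accuracy (the mean of per-class recalls), two commonly used metrics that expose this failure.

```python
import numpy as np


def imbalance_report(y_true, y_pred, minority=1):
    """Return (accuracy, minority-class recall, balanced accuracy)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    accuracy = float(np.mean(y_true == y_pred))
    recalls = []
    for c in np.unique(y_true):
        is_c = y_true == c
        recalls.append(float(np.mean(y_pred[is_c] == c)))  # recall of class c
    minority_recall = float(np.mean(y_pred[y_true == minority] == minority))
    return accuracy, minority_recall, float(np.mean(recalls))


y_true = np.zeros(1000, dtype=int)
y_true[0] = 1                        # 1 minority observation out of 1,000
y_pred = np.zeros(1000, dtype=int)   # always predict the majority class
print(imbalance_report(y_true, y_pred))  # (0.999, 0.0, 0.5)
```

Accuracy looks excellent, yet the classifier never finds the minority class, and balanced accuracy drops to the level of a coin flip.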

artificial intelligence, deepbalance, machine learning

arXiv:1709.10056

Country: North America > United States > California (0.28)

Genre: Research Report (0.82)

Industry: Banking & Finance (0.71)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.43)

Global Bigdata Conference

#artificialintelligence, Nov-14-2017, 02:59:15 GMT

Cybercrime is on the rise, and organizations across a wide variety of industries -- from financial institutions to insurance, health care providers, and large e-retailers -- are rightfully worried. In the first half of 2017 alone, over 2 billion records were compromised. After stealing PII (personally identifiable information) in these hacks, fraudsters can gain access to customer accounts, create synthetic identities, and even craft phony business profiles to commit various forms of fraud. Naturally, companies are frantically looking to beef up their security teams. A large skills gap is causing hiring difficulties in the cybersecurity industry, so much so that the Information Systems Audit and Control Association found that fewer than one in four candidates who apply for cybersecurity jobs are qualified.

global bigdata conference, science fiction, social media

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Science Fiction (0.45)
Information Technology > Communications > Social Media (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.36)

LIUBoost : Locality Informed Underboosting for Imbalanced Data Classification

Ahmed, Sajid, Rayhan, Farshid, Mahbub, Asif, Jani, Md. Rafsan, Shatabda, Swakkhar, Farid, Dewan Md., Rahman, Chowdhury Mofizur

arXiv.org Machine Learning, Nov-14-2017

The problem of class imbalance, together with class overlap, has become a major issue in supervised learning. Most supervised learning algorithms assume that the classes under consideration have equal cardinality when optimizing the cost function; this assumption does not hold for imbalanced datasets, which results in sub-optimal classification. Therefore, various approaches, such as undersampling, oversampling, cost-sensitive learning and ensemble-based methods, have been proposed for dealing with imbalanced datasets. However, undersampling suffers from information loss, oversampling suffers from increased runtime and potential overfitting, and cost-sensitive methods suffer from inadequately defined cost assignment schemes. In this paper, we propose a novel boosting-based method called LIUBoost. Like RUSBoost, LIUBoost uses undersampling to balance the dataset in every boosting iteration, but it also incorporates a cost term for every instance, based on that instance's hardness, into the weight update formula in order to minimize the information loss introduced by undersampling. LIUBoost has been extensively evaluated on 18 imbalanced datasets, and the results indicate significant improvement over RUSBoost, the best-performing existing method.
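
The RUSBoost-style loop that LIUBoost builds on can be sketched as follows. In each boosting round the majority class is randomly undersampled to the size of the minority class, a weak learner (here a decision stump) is fit on the balanced subset, and AdaBoost-style weights are updated on the full training set. The per-instance `cost` multiplier is a hypothetical placeholder for LIUBoost's hardness-based cost term, whose exact formula the abstract does not give; with its default of all ones the loop is plain random-undersampling boosting. Labels are assumed to be in {-1, +1}.

```python
import numpy as np


def fit_stump(X, y, w):
    """Weighted decision stump: the best single-feature threshold rule."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] >= thr, sign, -sign)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, thr, sign)
    _, j, thr, sign = best
    return lambda Z: np.where(Z[:, j] >= thr, sign, -sign)


def undersampled_boost(X, y, n_rounds=10, cost=None, seed=0):
    """Boosting with random majority undersampling in every round.
    `cost` is a hypothetical per-instance multiplier (default: all ones)."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    n = len(y)
    w = np.full(n, 1.0 / n)
    cost = np.ones(n) if cost is None else np.asarray(cost, dtype=float)
    minority = 1 if np.sum(y == 1) <= np.sum(y == -1) else -1
    min_idx = np.flatnonzero(y == minority)
    maj_idx = np.flatnonzero(y != minority)
    learners, alphas = [], []
    for _ in range(n_rounds):
        # Balance the classes by randomly undersampling the majority.
        keep = np.concatenate(
            [min_idx, rng.choice(maj_idx, size=len(min_idx), replace=False)])
        h = fit_stump(X[keep], y[keep], w[keep] / w[keep].sum())
        pred = h(X)
        err = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        # AdaBoost-style update on the full data, scaled by the cost term.
        w = w * cost * np.exp(-alpha * y * pred)
        w /= w.sum()
        learners.append(h)
        alphas.append(alpha)

    def predict(Z):
        Z = np.asarray(Z, dtype=float)
        score = sum(a * h(Z) for a, h in zip(alphas, learners))
        return np.where(score >= 0, 1, -1)

    return predict
```

Because every round sees a balanced subset, the weak learners are not pulled toward always predicting the majority class, while the weight update on the full data still tracks which instances remain hard.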

class imbalance, dataset, liuboost

arXiv:1711.05365

Country: Asia > Bangladesh (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.97)

pyLEMMINGS: Large Margin Multiple Instance Classification and Ranking for Bioinformatics Applications

Asif, Amina, Abbasi, Wajid Arshad, Munir, Farzeen, Ben-Hur, Asa, Minhas, Fayyaz ul Amir Afsar

arXiv.org Machine Learning, Nov-13-2017

Motivation: A major challenge in developing machine learning based methods in computational biology is that data may not be accurately labeled, owing to the time and resources required to experimentally annotate properties of proteins and DNA sequences. Standard supervised learning algorithms assume accurate instance-level labeling of training data. Multiple instance learning is a paradigm for handling such labeling ambiguities. However, the widely used large-margin classification methods for multiple instance learning are heuristic in nature and have high computational requirements. In this paper, we present large-margin algorithms for multiple instance classification and ranking based on stochastic sub-gradient optimization, and provide them in a software suite called pyLEMMINGS. Results: We have tested pyLEMMINGS on a number of bioinformatics problems as well as benchmark datasets. pyLEMMINGS successfully identified functionally important segments of proteins: binding sites in Calmodulin binding proteins, prion forming regions, and amyloid cores. pyLEMMINGS achieves state-of-the-art performance in all these tasks, demonstrating the value of multiple instance learning. Furthermore, our method is more than 100 times faster than heuristic solutions and achieves improved accuracy on benchmark datasets. Availability and Implementation: The pyLEMMINGS Python package is available for download at: http://faculty.pieas.edu.pk/fayyaz/software.html#pylemmings.
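
To illustrate what a stochastic sub-gradient, large-margin approach to multiple instance learning can look like, here is a minimal sketch. It assumes an MI-SVM-style formulation in which a bag is scored by its highest-scoring instance under a linear model without a bias term, and it uses Pegasos-style step sizes 1/(λt). It is not the pyLEMMINGS API or its exact objective; `train_mil_svm` and `bag_score` are hypothetical names.

```python
import numpy as np


def train_mil_svm(bags, labels, lam=0.01, epochs=50, seed=0):
    """Stochastic sub-gradient training of a linear large-margin MIL model.
    A bag's score is the maximum linear score over its instances."""
    rng = np.random.default_rng(seed)
    w = np.zeros(bags[0].shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(bags)):
            t += 1
            eta = 1.0 / (lam * t)              # Pegasos-style step size
            scores = bags[i] @ w
            k = int(np.argmax(scores))         # "witness" instance of the bag
            w *= 1.0 - eta * lam               # sub-gradient of the L2 penalty
            if labels[i] * scores[k] < 1:      # hinge loss is active
                w += eta * labels[i] * bags[i][k]
    return w


def bag_score(bag, w):
    return float(np.max(bag @ w))


bags = [np.array([[2.0, 2.0], [-1.0, -1.0]]),    # positive bag
        np.array([[3.0, 1.0], [-2.0, -1.0]]),    # positive bag
        np.array([[-1.0, -2.0], [-2.0, -1.0]]),  # negative bag
        np.array([[-1.0, -1.0], [-3.0, -2.0]])]  # negative bag
labels = [1, 1, -1, -1]
w = train_mil_svm(bags, labels)
print([bag_score(b, w) > 0 for b in bags])  # [True, True, False, False]
```

Only bag-level labels are used: each positive bag contains one negative-looking instance, yet the max over instances lets the model score the bag by its most positive member, which is also how such a model can point to the instance responsible for a positive prediction.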

bioinformatics, machine learning, pylemming

arXiv:1711.04913

Country:

Asia (0.46)
North America > United States (0.28)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Biomedical Informatics (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.67)

Calibrated Boosting-Forest

Wu, Haozhen

arXiv.org Machine Learning, Nov-13-2017

Many classification tasks need both excellent ranking power and well-calibrated probability estimates. In this paper, we introduce a technique, Calibrated Boosting-Forest, that captures both. This novel technique is an ensemble of gradient boosting machines that can support both continuous and binary labels. While offering superior ranking power over any individual regression or classification model, Calibrated Boosting-Forest is able to preserve well-calibrated posterior probabilities. Along with these benefits, we provide an alternative to the tedious step of tuning gradient boosting machines. We demonstrate that tuning Calibrated Boosting-Forest can be reduced to a simple hyper-parameter selection. We further establish that increasing this hyper-parameter improves ranking performance, with diminishing returns. We examine the effectiveness of Calibrated Boosting-Forest on ligand-based virtual screening, where both continuous and binary labels are available, and compare its performance with logistic regression, a gradient boosting machine and deep learning. Calibrated Boosting-Forest achieved an approximately 48% improvement compared to a state-of-the-art deep learning model. Moreover, it achieved around a 95% improvement on a probability quality measure compared to the best individual gradient boosting machine. Calibrated Boosting-Forest offers a benchmark demonstration that, in the field of ligand-based virtual screening, deep learning is not the universally dominant machine learning model, and that well-calibrated probabilities can better facilitate the virtual screening process.
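
The abstract treats ranking power and calibration as separate goals. The short sketch below, using made-up scores rather than anything from the paper, shows why they are distinct: a monotone transformation of predicted probabilities leaves the rank-based ROC AUC unchanged but can worsen the Brier score, a common measure of probability quality. The abstract does not say which probability-quality measure the paper reports, so the Brier score here is an assumption.

```python
import numpy as np


def roc_auc(y, p):
    """Probability that a random positive is scored above a random negative
    (ties count one half); a pure ranking measure."""
    y, p = np.asarray(y), np.asarray(p, dtype=float)
    pos, neg = p[y == 1], p[y == 0]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def brier_score(y, p):
    """Mean squared error between predicted probabilities and 0/1 labels."""
    y, p = np.asarray(y, dtype=float), np.asarray(p, dtype=float)
    return float(np.mean((p - y) ** 2))


y = [0, 0, 1, 1]
p = np.array([0.1, 0.4, 0.35, 0.8])
p_distorted = p ** 3   # same ordering, very different probabilities
print(roc_auc(y, p), roc_auc(y, p_distorted))           # 0.75 0.75
print(brier_score(y, p) < brier_score(y, p_distorted))  # True
```

A model can therefore rank compounds well while still reporting probabilities that are too extreme or too timid, which is why the two properties are evaluated separately.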

artificial intelligence, deep learning, machine learning

arXiv:1710.05476

Country: North America > United States > Wisconsin (0.14)

Genre: Research Report > Experimental Study (0.34)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
