AITopics

1707.07821

Country: North America > United States > Florida > Alachua County > Gainesville (0.54)

Genre: Research Report > Experimental Study (0.46)

Industry: Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

arXiv.org Machine Learning, Nov-18-2017

NeuralFDR: Learning Discovery Thresholds from Hypothesis Features

Xia, Fei, Zhang, Martin J., Zou, James, Tse, David

As datasets grow richer, an important challenge is to leverage the full features in the data to maximize the number of useful discoveries while controlling for false positives. We address this problem in the context of multiple hypotheses testing, where for each hypothesis, we observe a p-value along with a set of features specific to that hypothesis. For example, in genetic association studies, each hypothesis tests the correlation between a variant and the trait. We have a rich set of features for each variant (e.g. its location, conservation, epigenetics, etc.) which could inform how likely the variant is to have a true association. However, popular testing approaches, such as Benjamini-Hochberg's procedure (BH) and independent hypothesis weighting (IHW), either ignore these features or assume that the features are categorical or univariate. We propose a new algorithm, NeuralFDR, which automatically learns a discovery threshold as a function of all the hypothesis features. We parametrize the discovery threshold as a neural network, which enables flexible handling of multi-dimensional discrete and continuous features as well as efficient end-to-end optimization. We prove that NeuralFDR has strong false discovery rate (FDR) guarantees, and show that it makes substantially more discoveries in synthetic and real datasets. Moreover, we demonstrate that the learned discovery threshold is directly interpretable.
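For orientation, the BH baseline that the abstract contrasts with can be sketched in a few lines. This is only the classical step-up procedure, which ignores hypothesis features entirely; it is not NeuralFDR, and the function name and example p-values are illustrative assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices of hypotheses rejected by the BH step-up rule.

    Illustrative helper (name and signature are assumptions, not from the paper).
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0  # largest rank whose p-value falls under its BH threshold
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k = rank
    # Step-up: reject every hypothesis ranked at or below k.
    return sorted(order[:k])


# Made-up p-values for illustration only.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p_values, alpha=0.05)
print(rejected)
```

Every hypothesis here shares the same rank-based threshold; a feature-dependent threshold, as the abstract describes, would replace that single rule with a learned function of each hypothesis's features.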

artificial intelligence, hypothesis, machine learning

1711.01312

Genre: Research Report > Experimental Study (0.88)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Katehara, Medha, Beauxis-Aussalet, Emma, Alsallakh, Bilal

Prediction Scores as a Window into Classifier Behavior

arXiv.org Machine Learning, Nov-17-2017

Most multi-class classifiers make their prediction for a test sample by scoring the classes and selecting the one with the highest score. Analyzing these prediction scores is useful to understand the classifier behavior and to assess its reliability. We present an interactive visualization that facilitates per-class analysis of these scores. Our system, called Classilist, enables relating these scores to the classification correctness and to the underlying samples and their features. We illustrate how such analysis reveals varying behavior of different classifiers.

artificial intelligence, classifier, machine learning

1711.06795

Country: North America > United States > California (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.35)

@machinelearnbot, Nov-16-2017, 08:51:30 GMT

Data Analytics for Internal Audit Data Mining Blog - www.dataminingblog.com

This is a guest post from Marcel Baumgartner, Data Analytics Expert at Nestlé S.A. Large publicly listed companies not only have external auditors who check the books, but often also a large community of internal auditors. These collaborators provide the company with a sufficient level of assurance in terms of adherence to internal and external rules and guidelines. This covers financial aspects (spend, invoices, investments, …), human resources (working time, payroll, …) but also production-related aspects (e.g. ...). One of the strongest trends observed in internal auditing communities is the increasingly widespread use of Data Analytics. The term refers to the use of data, statistical methods and statistical thinking as a way of working, in addition to traditional auditing methods like interviews, document and process reviews, etc.

artificial intelligence, data mining, machine learning

@machinelearnbot

Country:

Europe > Switzerland (0.16)
North America > United States > Indiana > Tippecanoe County (0.15)

Industry: Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (0.35)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.30)

International Business Times, Nov-15-2017, 16:41:28 GMT

WWE Survivor Series 2017: Predictions, Match Card For Raw vs. SmackDown PPV

WWE Survivor Series 2017 has quickly turned into a "must-see" pay-per-view. Below are predictions for every match on the WWE Survivor Series card. This could end up being a fun match, but make no mistake, Lesnar is going to win at Survivor Series. The man who defeated Braun Strowman clean less than two months ago isn't going to lose to the much smaller Styles. Don't be surprised if the blue brand's top champion doesn't get much offense in at all before being pinned.

artificial intelligence, machine learning, prediction

International Business Times

Country: North America > United States (0.16)

Industry: Leisure & Entertainment > Sports > Martial Arts (0.81)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.43)

Burkart, Samuel, Király, Franz J

Predictive Independence Testing, Predictive Conditional Independence Testing, and Predictive Graphical Modelling

Testing (conditional) independence of multivariate random variables is a task central to statistical inference and modelling in general, though unfortunately one for which no practicable workflow exists to date. State-of-the-art workflows suffer from the need for heuristic or subjective manual choices, high computational complexity, or strong parametric assumptions. We address these problems by establishing a theoretical link between multivariate/conditional independence testing and model comparison in the multivariate predictive modelling (that is, supervised learning) task. This link allows advances in the extensively studied supervised learning workflow to be directly transferred to independence testing workflows, including automated tuning of the machine learning type, which addresses the need for a heuristic choice, the ability to quantitatively trade off computational demand against accuracy, and the modern black-box philosophy for checking and interfacing. As a practical implementation of this link between the two workflows, we present a Python package 'pcit', which implements our novel multivariate and conditional independence tests, interfacing the supervised learning API of the scikit-learn package. Theory and package also allow for straightforward independence-test-based learning of graphical model structure. We empirically show that our proposed predictive independence tests outperform or are on par with current practice, and that the derived graphical model structure learning algorithms asymptotically recover the 'true' graph. This paper, and the 'pcit' package accompanying it, thus provide powerful, scalable, generalizable, and easy-to-use methods for multivariate and conditional independence testing, as well as for graphical model structure learning.

artificial intelligence, independence testing, machine learning

1711.05869

Country: Europe > United Kingdom (0.27)

Genre:

Research Report > Experimental Study (1.00)
Overview (1.00)

Industry: Banking & Finance > Economy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)

Obuchi, Tomoyuki, Kabashima, Yoshiyuki

Accelerating Cross-Validation in Multinomial Logistic Regression with $\ell_1$-Regularization

We develop an approximate formula for evaluating a cross-validation estimator of predictive likelihood for multinomial logistic regression regularized by an $\ell_1$-norm. This allows us to avoid repeated optimizations required for literally conducting cross-validation; hence, the computational time can be significantly reduced. The formula is derived through a perturbative approach employing the largeness of the data size and the model dimensionality. Its usefulness is demonstrated on simulated data and the ISOLET dataset from the UCI machine learning repository.

approximation, artificial intelligence, machine learning

1711.0542

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)

Genre:

Research Report > New Finding (0.72)
Research Report > Experimental Study (0.62)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Introducing DeepBalance: Random Deep Belief Network Ensembles to Address Class Imbalance

Xenopoulos, Peter

When solving practical classification problems, a practitioner may be faced with class imbalance, meaning that one class (called the majority class) has a significantly higher prevalence than the others. Examples of imbalanced classification problems in the literature include [1], [2], [3], [4]. Class imbalance problems may be exacerbated in the future as we discover new methods to collect rare data and the rate of data collection increases. In many class imbalance problems, the minority class is not only the class of interest, but also carries the higher misclassification cost, which complicates learning [5]. Machine learning classifiers try to find an optimal decision boundary that fits the training data. As classifiers generally seek the simplest rule that partitions the training data, the simplest rule in imbalanced settings is often to always predict the majority class [6]. Results can be deceptive for such classifiers, as they may achieve high accuracy. For example, in a problem where a minority class occurs 0.1% of the time, an uninformed classifier can achieve 99.9% accuracy by simply always predicting the majority class. Thus, the naturally occurring target class distribution is not optimal for learning in highly imbalanced scenarios [7], [8], [9], [10].
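The 0.1% example above can be reproduced directly. The sketch below builds a hypothetical label set with one minority sample in 1,000 and scores a classifier that always predicts the majority class; the helper names and the choice of label 1 for the minority class are assumptions for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true `positive` samples that were predicted as `positive`."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    total = sum(1 for t in y_true if t == positive)
    return hits / total if total else 0.0


# Hypothetical data: 1 minority sample (label 1) among 1,000, i.e. 0.1% prevalence.
y_true = [1] * 1 + [0] * 999
# An uninformed classifier that always predicts the majority class (label 0).
y_pred = [0] * 1000

acc = accuracy(y_true, y_pred)
minority_recall = recall(y_true, y_pred, positive=1)
print(f"accuracy = {acc:.3f}, minority recall = {minority_recall:.3f}")
```

Accuracy alone hides the fact that no minority sample is ever detected, which is why per-class measures such as recall are usually reported alongside it in imbalanced settings.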

artificial intelligence, deepbalance, machine learning

1709.10056

Country: North America > United States > California (0.28)

Genre: Research Report (0.82)

Industry: Banking & Finance (0.71)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.43)

Wald-Kernel: Learning to Aggregate Information for Sequential Inference

Teng, Diyan, Ertin, Emre

Sequential hypothesis testing is a desirable decision-making strategy in any time-sensitive scenario. Compared with fixed sample-size testing, sequential testing can meet the same probability-of-error requirements using fewer samples on average. For a binary detection problem with known density functions, it is well known that accumulating the likelihood ratio statistics is time optimal under a fixed error rate constraint. This paper considers the problem of learning a binary sequential detector from training samples when density functions are unavailable. We formulate the problem as a constrained likelihood ratio estimation which can be solved efficiently through convex optimization by imposing Reproducing Kernel Hilbert Space (RKHS) structure on the log-likelihood ratio function. In addition, we provide a computationally efficient approximate solution for large-scale data sets. The proposed algorithm, namely Wald-Kernel, is tested on a synthetic data set and two real-world data sets, together with previous approaches for likelihood ratio estimation. Our empirical results show that the classifier trained through the proposed technique achieves a smaller average sampling cost than previous approaches proposed in the literature for the same error rate.
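The known-density case mentioned in the abstract is the classical sequential probability ratio test (SPRT), which can be sketched as follows. This is not Wald-Kernel: it assumes the log-likelihood ratio is given rather than learned, uses Wald's standard approximate thresholds, and the Gaussian example and all function names are illustrative assumptions.

```python
import math


def gaussian_llr(mu0, mu1, sigma):
    """Log-likelihood ratio log p1(x)/p0(x) for two Gaussians with a shared sigma."""
    def llr(x):
        return (mu1 - mu0) / sigma ** 2 * (x - (mu0 + mu1) / 2)
    return llr


def sprt(samples, log_likelihood_ratio, alpha=0.05, beta=0.05):
    """Classical SPRT: accumulate the log-likelihood ratio until a threshold is hit.

    alpha is the target false-alarm rate, beta the target miss rate.
    Returns (decision, number_of_samples_used).
    """
    upper = math.log((1 - beta) / alpha)   # crossing it decides for H1
    lower = math.log(beta / (1 - alpha))   # crossing it decides for H0
    total = 0.0
    n = 0
    for n, x in enumerate(samples, start=1):
        total += log_likelihood_ratio(x)
        if total >= upper:
            return "H1", n
        if total <= lower:
            return "H0", n
    return "undecided", n


# Hypothetical setup: H0 is N(0, 1), H1 is N(1, 1).
llr = gaussian_llr(mu0=0.0, mu1=1.0, sigma=1.0)
print(sprt([1.0] * 10, llr))
print(sprt([0.0] * 10, llr))
```

The number of samples consumed depends on how quickly the accumulated evidence crosses a threshold, which is the source of the average sampling savings over fixed sample-size tests.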

classifier, machine learning, reinforcement learning

1508.07964

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)

Ahmed, Sajid, Rayhan, Farshid, Mahbub, Asif, Jani, Md. Rafsan, Shatabda, Swakkhar, Farid, Dewan Md., Rahman, Chowdhury Mofizur

LIUBoost : Locality Informed Underboosting for Imbalanced Data Classification

arXiv.org Machine Learning, Nov-14-2017

The problem of class imbalance along with class overlap has become a major issue in the domain of supervised learning. Most supervised learning algorithms assume equal cardinality of the classes under consideration while optimizing the cost function. This assumption does not hold for imbalanced datasets, which results in sub-optimal classification. Therefore, various approaches, such as undersampling, oversampling, cost-sensitive learning and ensemble-based methods, have been proposed for dealing with imbalanced datasets. However, undersampling suffers from information loss, oversampling suffers from increased runtime and potential overfitting, while cost-sensitive methods suffer from inadequately defined cost assignment schemes. In this paper, we propose a novel boosting-based method called LIUBoost. Like RUSBoost, LIUBoost uses undersampling to balance the datasets in every boosting iteration, while incorporating a cost term for every instance, based on its hardness, into the weight update formula to minimize the information loss introduced by undersampling. LIUBoost has been extensively evaluated on 18 imbalanced datasets and the results indicate significant improvement over the existing best-performing method, RUSBoost.
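The balancing step that the abstract attributes to both RUSBoost and LIUBoost is random undersampling, sketched below. This shows only the resampling idea; the boosting loop and LIUBoost's hardness-based cost term are not reproduced, and the function name, seed handling, and toy data are assumptions for illustration.

```python
import random


def random_undersample(X, y, seed=0):
    """Randomly drop samples so every class has as many rows as the smallest class."""
    rng = random.Random(seed)
    # Group feature rows by their class label.
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    n_min = min(len(rows) for rows in by_class.values())
    X_bal, y_bal = [], []
    for label, rows in by_class.items():
        # Sample without replacement; the discarded rows are the information loss.
        for features in rng.sample(rows, n_min):
            X_bal.append(features)
            y_bal.append(label)
    return X_bal, y_bal


# Hypothetical toy data: 5 minority samples (label 1) and 95 majority samples (label 0).
X = [[i] for i in range(100)]
y = [1] * 5 + [0] * 95
X_bal, y_bal = random_undersample(X, y, seed=0)
print(len(X_bal), y_bal.count(0), y_bal.count(1))
```

Here 90 of the 95 majority rows are discarded in a single draw, which illustrates the information loss the abstract mentions; repeating the draw in each boosting iteration exposes the ensemble to different majority subsets.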

class imbalance, dataset, liuboost

1711.05365

Country: Asia > Bangladesh (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.97)