AITopics | Performance Analysis

Performance Analysis

Probabilistic Data Analysis with Probabilistic Programming

arXiv.org Machine Learning | Aug-18-2016

Probabilistic techniques are central to data analysis, but different approaches can be difficult to apply, combine, and compare. This paper introduces composable generative population models (CGPMs), a computational abstraction that extends directed graphical models and can be used to describe and compose a broad class of probabilistic data analysis techniques. Examples include hierarchical Bayesian models, multivariate kernel methods, discriminative machine learning, clustering algorithms, dimensionality reduction, and arbitrary probabilistic programs. We also demonstrate the integration of CGPMs into BayesDB, a probabilistic programming platform that can express data analysis tasks using a modeling language and a structured query language. The practical value is illustrated in two ways. First, CGPMs are used in an analysis that identifies satellite data records which probably violate Kepler's Third Law, by composing causal probabilistic programs with non-parametric Bayes in under 50 lines of probabilistic code. Second, for several representative data analysis tasks, we report on lines of code and accuracy measurements of various CGPMs, plus comparisons with standard baseline solutions from Python and MATLAB libraries.
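As a rough illustration of the kind of Kepler's Third Law check mentioned in the abstract (this is not the paper's CGPM or BayesDB code), the sketch below computes the orbital period implied by a semi-major axis and flags records whose reported period deviates from it. It assumes Earth-orbiting satellites; the function names and the 5% tolerance are hypothetical choices made for the example.

```python
import math

# Standard gravitational parameter of Earth (G * M), in km^3 / s^2.
MU_EARTH_KM3_S2 = 398600.4418


def kepler_period_minutes(semi_major_axis_km):
    """Period implied by Kepler's Third Law: T = 2*pi*sqrt(a^3 / mu)."""
    seconds = 2 * math.pi * math.sqrt(semi_major_axis_km ** 3 / MU_EARTH_KM3_S2)
    return seconds / 60.0


def violates_kepler(period_minutes, semi_major_axis_km, rel_tol=0.05):
    """Flag a record whose reported period is off by more than rel_tol.

    rel_tol is an assumed threshold, not a value taken from the paper.
    """
    expected = kepler_period_minutes(semi_major_axis_km)
    return abs(period_minutes - expected) / expected > rel_tol
```

For a geostationary orbit (semi-major axis of roughly 42,164 km) the implied period is close to one sidereal day, so a record reporting a 100-minute period at that altitude would be flagged.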

artificial intelligence, cgpm, machine learning, (19 more...)

arXiv.org Machine Learning

1608.05347

Country:

Europe (0.92)
Asia (0.67)
North America > United States > Massachusetts (0.28)

Genre: Research Report (0.63)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Molecular Graph Convolutions: Moving Beyond Fingerprints

Kearnes, Steven, McCloskey, Kevin, Berndl, Marc, Pande, Vijay, Riley, Patrick

arXiv.org Machine Learning | Aug-18-2016

Molecular "fingerprints" encoding structural information are the workhorse of cheminformatics and machine learning in drug discovery applications. However, fingerprint representations necessarily emphasize particular aspects of the molecular structure while ignoring others, rather than allowing the model to make data-driven decisions. We describe molecular "graph convolutions", a machine learning architecture for learning from undirected graphs, specifically small molecules. Graph convolutions use a simple encoding of the molecular graph (atoms, bonds, distances, etc.), which allows the model to take greater advantage of information in the graph structure. Although graph convolutions do not outperform all fingerprint-based methods, they (along with other graph-based methods) represent a new paradigm in ligand-based virtual screening with exciting opportunities for future improvement.

artificial intelligence, machine learning, weave module, (17 more...)

arXiv.org Machine Learning

doi: 10.1007/s10822-016-9938-8

1603.00856

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Application of multiview techniques to NHANES dataset

Omogbai, Aileme

arXiv.org Machine Learning | Aug-16-2016

Research into disease-related health variables typically involves choosing health variables and conditions, and using statistical methods to study the strength of association of the variables with the condition [9]. These are then used to confirm known or suspected relationships between the behavioural/health factors and disease conditions. Information about health status may be gleaned by considering different aspects of an individual's data and investigating possible relationships between the variables. Representations that capture these relationships can be useful in predicting the presence or risk level of medical conditions. The National Health and Nutrition Examination Survey (NHANES) dataset provides data on health measurements, taken from survey participants, comprising different categories including demographics, laboratory tests and physical measurements.

artificial intelligence, dataset, machine learning, (17 more...)

arXiv.org Machine Learning

1608.04783

Country: North America > United States (0.68)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
Health & Medicine > Consumer Health (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.69)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.47)

Random forest explained in simple terms - Listen Data

#artificialintelligence | Aug-15-2016, 21:46:13 GMT

If the response is omitted, randomForest will run in unsupervised mode.

Arguments:
mtry: number of variables selected at each split (default: sqrt(number of variables) for classification)
ntree: number of trees to grow (default: 500)
nodesize: minimum size of terminal nodes (default: 1)

Step III: Find the number of trees at which the out-of-bag (OOB) error rate stabilizes and reaches its minimum.
Step IV: Find the optimal number of variables selected at each split: select the mtry value with the minimum OOB error. It returns the optimal value of mtry (a parameter used in the randomForest package).
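The excerpt refers to the R randomForest package; the Python sketch below does not call that package and only illustrates the arithmetic behind the quantities it mentions. It shows the classification default for mtry, the share of rows a bootstrap sample is expected to leave out (the rows that make an out-of-bag error estimate possible), and the Step IV selection rule. The function names and the error values in the usage note are hypothetical.

```python
import math


def default_mtry(n_variables):
    """Classification default described above: floor(sqrt(p)), at least 1."""
    return max(1, math.floor(math.sqrt(n_variables)))


def expected_oob_fraction(n_rows):
    """Chance that a given row is absent from a bootstrap sample of n_rows draws.

    Each draw misses the row with probability (1 - 1/n); this tends to
    exp(-1), about 0.368, as n grows.
    """
    return (1 - 1 / n_rows) ** n_rows


def select_mtry(oob_error_by_mtry):
    """Step IV: pick the mtry value with the smallest OOB error.

    oob_error_by_mtry maps candidate mtry values to measured OOB error rates.
    """
    return min(oob_error_by_mtry, key=oob_error_by_mtry.get)
```

With 16 predictors the default is mtry = 4, and given made-up OOB errors such as {2: 0.21, 3: 0.18, 4: 0.19} the selection rule picks 3.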

artificial intelligence, machine learning, random forest, (15 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Why do we fall for false positives even though they're common?

New Scientist | Aug-14-2016, 00:39:44 GMT

Last month, the drinking water in a Colorado town was declared unsafe, because it had been contaminated by an ingredient from cannabis. It took two days to discover that this was not the case – a water test had turned up a false positive result. In fact, false positives are widespread in our everyday lives, and we seem to have an innate inability to get to grips with them. The fuss in Hugo, Colorado – a state where cannabis use is now legal – began when a county employee administering a test for drug use decided to use the same kind of test on tap water, rather than saliva, in an attempt to rule out a false positive. When the water tested positive too, it was assumed the test kit was a dud.
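Why false positives are so common can be made concrete with Bayes' rule. The sketch below is a generic worked example, not taken from the article: the prevalence, sensitivity, and specificity figures in the usage note are invented for illustration.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability that a positive test result is a true positive (Bayes' rule).

    prevalence:  fraction of tested samples that truly have the condition
    sensitivity: P(test positive | condition present)
    specificity: P(test negative | condition absent)
    """
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)
```

With an assumed prevalence of 1%, sensitivity of 99%, and specificity of 95%, only about one positive result in six is genuine, even though the test looks accurate on paper.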

artificial intelligence, false positive, machine learning, (13 more...)

New Scientist

Country: North America > United States > Colorado (0.46)

Industry:

Health & Medicine > Therapeutic Area > Oncology (0.78)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Addiction Disorder (0.77)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Does quantification without adjustments work?

Tasche, Dirk

arXiv.org Machine Learning | Aug-12-2016

Classification is the task of predicting the class labels of objects based on the observation of their features. In contrast, quantification has been defined as the task of determining the prevalences of the different sorts of class labels in a target dataset. The simplest approach to quantification is Classify & Count, where a classifier is optimised for classification on a training set and applied to the target dataset for the prediction of class labels. In the case of binary quantification, the number of predicted positive labels is then used as an estimate of the prevalence of the positive class in the target dataset. Since the performance of Classify & Count for quantification is known to be inferior, its results are typically subject to adjustments. However, some researchers have recently suggested that Classify & Count might actually work without adjustments if it is based on a classifier that was specifically trained for quantification. We discuss the theoretical foundation for this claim and explore its potential and limitations with a numerical example based on the binormal model with equal variances. In order to identify an optimal quantifier in the binormal setting, we introduce the concept of local Bayes optimality. As a side remark, we present a complete proof of a theorem by Ye et al. (2012).
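A minimal sketch of Classify & Count in a binormal setting with equal variances follows. It is not the paper's numerical example: the class means, threshold, and sample size are assumptions, and the adjustment shown is the commonly used "adjusted count" correction based on the classifier's true and false positive rates.

```python
import random
from statistics import NormalDist


def simulate_scores(prevalence, n, mu_neg=0.0, mu_pos=2.0, sigma=1.0, seed=0):
    """Draw n scores from a two-class Gaussian mixture with a common sigma."""
    rng = random.Random(seed)
    return [
        rng.gauss(mu_pos if rng.random() < prevalence else mu_neg, sigma)
        for _ in range(n)
    ]


def classify_and_count(scores, threshold):
    """Fraction of items predicted positive: the raw prevalence estimate."""
    return sum(s > threshold for s in scores) / len(scores)


def adjusted_count(cc_estimate, tpr, fpr):
    """Correct the raw estimate using the classifier's error rates.

    Solves cc = p * tpr + (1 - p) * fpr for p, clipped to [0, 1].
    """
    p = (cc_estimate - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, p))


# Threshold halfway between the class means (assumed, as for balanced training data).
threshold = 1.0
tpr = 1 - NormalDist(2.0, 1.0).cdf(threshold)
fpr = 1 - NormalDist(0.0, 1.0).cdf(threshold)

target_scores = simulate_scores(prevalence=0.2, n=20000)
raw = classify_and_count(target_scores, threshold)
adjusted = adjusted_count(raw, tpr, fpr)
```

In this setup the raw count overestimates a true prevalence of 0.2 because false positives from the large negative class outnumber the missed positives, while the adjusted estimate lands close to 0.2.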

artificial intelligence, classifier, machine learning, (14 more...)

arXiv.org Machine Learning

1602.0878

Country:

North America > United States (0.46)
Europe > Austria (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Epidemiology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Kernel Ridge Regression via Partitioning

Tandon, Rashish, Si, Si, Ravikumar, Pradeep, Dhillon, Inderjit

arXiv.org Machine Learning | Aug-5-2016

In this paper, we investigate a divide and conquer approach to Kernel Ridge Regression (KRR). Given n samples, the division step involves separating the points based on some underlying disjoint partition of the input space (possibly via clustering), and then computing a KRR estimate for each partition. The conquering step is simple: for each partition, we only consider its own local estimate for prediction. We establish conditions under which we can give generalization bounds for this estimator, as well as achieve optimal minimax rates. We also show that the approximation error component of the generalization error is smaller than when a single KRR estimate is fit on the data, thus providing both statistical and computational advantages over a single KRR estimate over the entire data (or an averaging over random partitions as in other recent work, [30]). Lastly, we provide experimental validation for our proposed estimator and our assumptions.
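The divide and conquer steps described above can be sketched for one-dimensional inputs as follows. This is an illustrative toy, not the authors' implementation: the Gaussian kernel, the fixed cut points standing in for a clustering step, and all names and parameter values are assumptions. Each cell gets its own KRR fit, and prediction uses only the estimate of the cell that contains the query point.

```python
import bisect
import math


def rbf(x, y, gamma):
    """Gaussian (RBF) kernel on scalars."""
    return math.exp(-gamma * (x - y) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_krr(xs, ys, lam, gamma):
    """Standard KRR: alpha = (K + n * lam * I)^-1 y; returns a predictor."""
    n = len(xs)
    K = [
        [rbf(a, b, gamma) + (lam * n if i == j else 0.0) for j, b in enumerate(xs)]
        for i, a in enumerate(xs)
    ]
    alpha = solve(K, ys)
    return lambda x: sum(a * rbf(x, xi, gamma) for a, xi in zip(alpha, xs))


def fit_partitioned_krr(xs, ys, cuts, lam, gamma):
    """Division step: split points by sorted cut points and fit KRR per cell.

    Conquering step: predict with the local estimate of the query's own cell.
    A query falling in a cell with no training points raises KeyError.
    """
    cells = {}
    for x, y in zip(xs, ys):
        cell_x, cell_y = cells.setdefault(bisect.bisect_right(cuts, x), ([], []))
        cell_x.append(x)
        cell_y.append(y)
    models = {k: fit_krr(cx, cy, lam, gamma) for k, (cx, cy) in cells.items()}

    def predict(x):
        return models[bisect.bisect_right(cuts, x)](x)

    return predict
```

Fitting each cell separately means solving several small linear systems instead of one n-by-n system, which is where the computational saving in this toy version comes from.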

artificial intelligence, machine learning, partition, (13 more...)

arXiv.org Machine Learning

1608.01976

Country: North America > United States > Texas (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)

Classification with Asymmetric Label Noise: Consistency and Maximal Denoising

Blanchard, Gilles, Flaska, Marek, Handy, Gregory, Pozzi, Sara, Scott, Clayton

arXiv.org Machine Learning | Aug-5-2016

In many real-world classification problems, the labels of training examples are randomly corrupted. Most previous theoretical work on classification with label noise assumes that the two classes are separable, that the label noise is independent of the true class label, or that the noise proportions for each class are known. In this work, we give conditions that are necessary and sufficient for the true class-conditional distributions to be identifiable. These conditions are weaker than those analyzed previously, and allow for the classes to be nonseparable and the noise levels to be asymmetric and unknown. The conditions essentially state that a majority of the observed labels are correct and that the true class-conditional distributions are "mutually irreducible," a concept we introduce that limits the similarity of the two distributions. For any label noise problem, there is a unique pair of true class-conditional distributions satisfying the proposed conditions, and we argue that this pair corresponds in a certain sense to maximal denoising of the observed distributions. Our results are facilitated by a connection to "mixture proportion estimation," which is the problem of estimating the maximal proportion of one distribution that is present in another. We establish a novel rate of convergence result for mixture proportion estimation, and apply this to obtain consistency of a discrimination rule based on surrogate loss minimization. Experimental results on benchmark data and a nuclear particle classification problem demonstrate the efficacy of our approach.

artificial intelligence, inductive learning, machine learning, (17 more...)

arXiv.org Machine Learning

1303.1208

Country: North America > United States (0.67)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.86)

Multiple Instance Dictionary Learning using Functions of Multiple Instances

Jiao, Changzhe, Zare, Alina

arXiv.org Machine Learning | Aug-3-2016

A multiple instance dictionary learning method using functions of multiple instances (DL-FUMI) is proposed to address target detection and two-class classification problems with inaccurate training labels. Given inaccurate training labels, DL-FUMI learns a set of target dictionary atoms that describe the most distinctive and representative features of the true positive class as well as a set of nontarget dictionary atoms that account for the shared information found in both the positive and negative instances. Experimental results show that the estimated target dictionary atoms found by DL-FUMI are more representative prototypes and identify better discriminative features of the true positive class than existing methods in the literature. DL-FUMI is shown to have significantly better performance on several target detection and classification problems as compared to other multiple instance learning (MIL) dictionary learning algorithms on a variety of MIL problems.

artificial intelligence, dl-fumi, machine learning, (15 more...)

arXiv.org Machine Learning

1511.02825

Country: North America > United States (0.30)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

Assessing Functional Neural Connectivity as an Indicator of Cognitive Performance

Helfer, Brian S., Williamson, James R., Miller, Benjamin A., Perricone, Joseph, Quatieri, Thomas F.

arXiv.org Machine Learning | Jul-29-2016

Studies in recent years have demonstrated that neural organization and structure impact an individual's ability to perform a given task. Specifically, individuals with greater neural efficiency have been shown to outperform those with less organized functional structure. In this work, we compare the predictive ability of properties of neural connectivity on a working memory task. We provide two novel approaches for characterizing functional network connectivity from electroencephalography (EEG), and compare these features to the average power across frequency bands in EEG channels. Our first novel approach represents functional connectivity structure through the distribution of eigenvalues making up channel coherence matrices in multiple frequency bands. Our second approach creates a connectivity network at each frequency band, and assesses variability in average path lengths of connected components and degree across the network. Failures in digit and sentence recall on single trials are detected using a Gaussian classifier for each feature set, at each frequency band. The classifier results are then fused across frequency bands, with the resulting detection performance summarized using the area under the receiver operating characteristic curve (AUC) statistic.
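The AUC statistic used to summarize detection performance can be computed directly from classifier scores. The sketch below uses the pairwise (Mann-Whitney) formulation; it is a generic illustration rather than the authors' evaluation code, and the scores in the usage note are made up.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example scores
    higher than a randomly chosen negative one, with ties counted as half.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

For example, with positive scores [0.9, 0.8, 0.4] and negative scores [0.5, 0.3], five of the six positive-negative pairs are ordered correctly, giving an AUC of 5/6.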

artificial intelligence, frequency band, machine learning, (13 more...)

arXiv.org Machine Learning

1607.08891

Country: North America > United States > Massachusetts > Middlesex County (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.72)
Government > Regional Government > North America Government > United States Government (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.35)
