AITopics | Performance Analysis


Performance Analysis


Learning to Plan Chemical Syntheses

Segler, Marwin H. S., Preuss, Mike, Waller, Mark P.

arXiv.org Artificial Intelligence | Aug-14-2017

From medicines to materials, small organic molecules are indispensable for human well-being. To plan their syntheses, chemists employ a problem-solving technique called retrosynthesis. In retrosynthesis, target molecules are recursively transformed into increasingly simpler precursor compounds until a set of readily available starting materials is obtained. Computer-aided retrosynthesis would be a highly valuable tool; however, past approaches were slow and provided results of unsatisfactory quality. Here, we employ Monte Carlo Tree Search (MCTS) to efficiently discover retrosynthetic routes. MCTS was combined with an expansion policy network that guides the search, and an "in-scope" filter network to pre-select the most promising retrosynthetic steps. These deep neural networks were trained on 12 million reactions, which represents essentially all reactions ever published in organic chemistry. Our system solves almost twice as many molecules and is 30 times faster than the traditional search method based on extracted rules and hand-coded heuristics. Finally, after a 60-year history of computer-aided synthesis planning, chemists can no longer distinguish between routes generated by a computer system and real routes taken from the scientific literature. We anticipate that our method will accelerate drug and materials discovery by assisting chemists to plan better syntheses faster, and by enabling fully automated robot synthesis.

artificial intelligence, expert system, machine learning, (21 more...)

arXiv.org Artificial Intelligence

doi: 10.1038/nature25978

1708.04202

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.93)
(3 more...)


Collaborative Filtering using Denoising Auto-Encoders for Market Basket Data

Abad, Andres G., Reyes-Castro, Luis I.

arXiv.org Machine Learning | Aug-14-2017

Recommender systems (RS) help users navigate large sets of items in the search for "interesting" ones. One approach to RS is Collaborative Filtering (CF), which is based on the idea that similar users are interested in similar items. Most model-based approaches to CF seek to train a machine-learning/data-mining model based on sparse data; the model is then used to provide recommendations. While most of the proposed approaches are effective for small-size situations, the combinatorial nature of the problem makes them impractical for medium-to-large instances. In this work we present a novel approach to CF that works by training a Denoising Auto-Encoder (DAE) on corrupted baskets, i.e., baskets from which one or more items have been removed. The DAE is then forced to learn to reconstruct the original basket given its corrupted input. Due to recent advancements in optimization and other technologies for training neural-network models (such as DAE), the proposed method results in a scalable and practical approach to CF. The contribution of this work is twofold: (1) to identify missing items in observed baskets, thus directly providing a CF model; and (2) to construct a generative model of baskets which may be used, for instance, in simulation analysis or as part of a more complex analytical method.
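The corruption step described in this abstract can be illustrated with a short sketch. It only shows how a (corrupted input, original target) training pair might be built by removing items from a basket; it does not implement the auto-encoder, and the function name `corrupt_basket`, its parameters, and the sample items are hypothetical, not taken from the paper.

```python
import random
from typing import Set, Tuple


def corrupt_basket(basket: Set[str], n_remove: int,
                   rng: random.Random) -> Tuple[Set[str], Set[str]]:
    """Remove n_remove randomly chosen items from a basket.

    Returns (corrupted_basket, removed_items). This is an illustrative
    assumption about the corruption step, not the authors' code.
    """
    if not 1 <= n_remove < len(basket):
        raise ValueError("n_remove must be at least 1 and leave one item")
    # Sort first so the sample is reproducible for a given seed.
    removed = set(rng.sample(sorted(basket), n_remove))
    return basket - removed, removed


# Build one training pair: the model input is the corrupted basket,
# and the reconstruction target is the original basket.
original = {"bread", "milk", "eggs", "butter"}
corrupted, removed = corrupt_basket(original, 2, random.Random(0))
training_pair = (corrupted, original)
```

At prediction time the same idea runs in reverse: an observed basket is treated as the corrupted input, and items the model reconstructs but the basket lacks become candidate recommendations.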

artificial intelligence, collaborative filtering, machine learning, (12 more...)

arXiv.org Machine Learning

1708.04312

Country: South America > Ecuador (0.14)

Genre: Research Report (1.00)

Industry: Retail (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)


Optimization of Ensemble Supervised Learning Algorithms for Increased Sensitivity, Specificity, and AUC of Population-Based Colorectal Cancer Screenings

Kamath, Anirudh, Singh, Aditya, Ramnani, Raj, Vyas, Ayush, Shenoy, Jay

arXiv.org Machine Learning | Aug-14-2017

Over 150,000 new people in the United States are diagnosed with colorectal cancer each year. Nearly a third die from it (American Cancer Society). The only approved noninvasive diagnosis tools currently involve fecal blood count tests (FOBTs) or stool DNA tests. Fecal blood count tests take only five minutes and are available over the counter for as low as \$15. They are highly specific, yet not nearly as sensitive, yielding a high percentage (25%) of false negatives (Colon Cancer Alliance). Moreover, FOBT results are far too generalized, meaning that a positive result could mean much more than just colorectal cancer, and could just as easily mean hemorrhoids, anal fissure, proctitis, Crohn's disease, diverticulosis, ulcerative colitis, rectal ulcer, rectal prolapse, ischemic colitis, angiodysplasia, rectal trauma, proctitis from radiation therapy, and others. Stool DNA tests, the modern benchmark for CRC screening, have a much higher sensitivity and specificity, but also cost \$600, take two weeks to process, and are not for high-risk individuals or people with a history of polyps. To yield a cheap and effective CRC screening alternative, a unique ensemble-based classification algorithm is put in place that considers the FIT result, BMI, smoking history, and diabetic status of patients. This method is tested under ten-fold cross validation to have a .95 AUC, 92% specificity, 89% sensitivity, .88 F1, and 90% precision. Once clinically validated, this test promises to be cheaper, faster, and potentially more accurate when compared to a stool DNA test.

artificial intelligence, machine learning, specificity, (12 more...)

arXiv.org Machine Learning

1708.03951

Country: North America > United States > California (0.14)

Genre: Research Report (0.86)

Industry:

Health & Medicine > Therapeutic Area > Oncology > Colorectal Cancer (1.00)
Health & Medicine > Therapeutic Area > Gastroenterology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)


Semi-supervised emotion lexicon expansion with label propagation and specialized word embeddings

Giulianelli, Mario

arXiv.org Artificial Intelligence | Aug-13-2017

There exist two main approaches to automatically extract affective orientation: lexicon-based and corpus-based. In this work, we argue that these two methods are compatible and show that combining them can improve the accuracy of emotion classifiers. In particular, we introduce a novel variant of the Label Propagation algorithm that is tailored to distributed word representations, we apply batch gradient descent to accelerate the optimization of label propagation and to make the optimization feasible for large graphs, and we propose a reproducible method for emotion lexicon expansion. We conclude that label propagation can expand an emotion lexicon in a meaningful way and that the expanded emotion lexicon can be leveraged to improve the accuracy of an emotion classifier.

classifier, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

1708.0391

Country: Europe > Germany (0.28)

Genre: Research Report (0.63)

Industry: Information Technology (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(6 more...)


Sentiment Analysis by Joint Learning of Word Embeddings and Classifier

Sarma, Prathusha Kameswara, Sethares, Bill

arXiv.org Machine Learning | Aug-13-2017

Word embeddings are representations of individual words of a text document in a vector space and they are often useful for performing natural language processing tasks. Current state of the art algorithms for learning word embeddings learn vector representations from large corpora of text documents in an unsupervised fashion. This paper introduces SWESA (Supervised Word Embeddings for Sentiment Analysis), an algorithm for sentiment analysis via word embeddings. SWESA leverages document label information to learn vector representations of words from a modest corpus of text documents by solving an optimization problem that minimizes a cost function with respect to both word embeddings as well as classification accuracy. Analysis reveals that SWESA provides an efficient way of estimating the dimension of the word embeddings that are to be learned. Experiments on several real world data sets show that SWESA has superior performance when compared to previously suggested approaches to word embeddings and sentiment analysis tasks.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Machine Learning

1708.03995

Genre: Research Report > Experimental Study (0.94)

Industry:

Health & Medicine > Consumer Health (0.68)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Addiction Disorder (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
(3 more...)


Making Predictive Models Robust: Holdout vs Cross-Validation

@machinelearnbot | Aug-11-2017, 22:45:16 GMT

When evaluating machine learning models, the validation step helps you find the best parameters for your model while also preventing it from becoming overfitted. Two of the most popular strategies to perform the validation step are the hold-out strategy and the k-fold strategy. Pros of the hold-out strategy: Fully independent data; only needs to be run once so has lower computational costs. Cons of the hold-out strategy: Performance evaluation is subject to higher variance given the smaller size of the data. K-fold validation evaluates the data across the entire training set, but it does so by dividing the training set into K folds – or subsections – (where K is a positive integer) and then training the model K times, each time leaving a different fold out of the training data and using it instead as a validation set.
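The two strategies described above can be sketched as index splits. This is a minimal, standard-library-only illustration; the function names `hold_out_indices` and `k_fold_indices` are hypothetical, and the sketch assumes the data has already been shuffled, since it splits indices in order.

```python
from typing import List, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, validation indices)


def hold_out_indices(n_samples: int, validation_fraction: float = 0.2) -> Split:
    """Single split: the last part of the data is held out for validation."""
    n_val = int(n_samples * validation_fraction)
    if not 0 < n_val < n_samples:
        raise ValueError("validation_fraction leaves an empty split")
    indices = list(range(n_samples))
    return indices[:n_samples - n_val], indices[n_samples - n_val:]


def k_fold_indices(n_samples: int, k: int) -> List[Split]:
    """K splits: each fold serves as the validation set exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    folds: List[List[int]] = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    splits: List[Split] = []
    for i, validation in enumerate(folds):
        # Training data is every fold except the one held out this round.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, validation))
    return splits


for train, validation in k_fold_indices(10, 3):
    print(len(train), "training samples,", len(validation), "validation samples")
```

With hold-out the model is trained once; with k-fold it is trained K times and the K validation scores are typically averaged, which is where the extra computational cost comes from.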

artificial intelligence, holdout vs cross-validation, machine learning, (9 more...)

@machinelearnbot

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.40)


K-NN algorithm

#artificialintelligence | Aug-11-2017, 12:05:37 GMT

The k-nearest neighbors (k-NN) machine learning algorithm classifies data using its nearest neighbors. Nearest-neighbor classifiers are defined by their characteristic of classifying unlabeled examples by assigning them the class of similar labeled examples. Despite its simplicity, this method is extremely powerful and has been used for computer vision applications, predictions, and even identifying patterns in genetic data. The k-NN algorithm gets its name from the fact that it uses information about the k nearest neighbors to classify unlabeled examples. The letter k is a variable term stating how many nearest neighbors will be used for the classification.
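A minimal sketch of the idea follows, assuming Euclidean distance and a simple majority vote; the source does not specify either choice, and the function name `knn_classify` and the toy data are hypothetical. It requires Python 3.8 or later for `math.dist`.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

LabeledPoint = Tuple[Sequence[float], str]  # (feature vector, class label)


def knn_classify(train: List[LabeledPoint], query: Sequence[float], k: int) -> str:
    """Assign the query the most common label among its k nearest neighbors."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort the labeled examples by Euclidean distance to the query, keep the k closest.
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    # Majority vote over the labels of those neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data: two well-separated clusters with labels "a" and "b".
training_data = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
    ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b"),
]
print(knn_classify(training_data, (0.5, 0.5), k=3))
print(knn_classify(training_data, (5.5, 5.5), k=3))
```

Choosing an odd k for two-class problems avoids tied votes. Sorting the whole training set on every query is fine for a sketch, but larger data sets usually call for a spatial index or an approximate search.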

artificial intelligence, classification, machine learning, (16 more...)

#artificialintelligence

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)


Why We Need Accountable Algorithms

#artificialintelligence | Aug-10-2017, 22:25:18 GMT

AI and machine learning algorithms are marketed as unbiased, objective tools. They are opaque mechanisms of bureaucracy and decision-making in which old-fashioned racist, sexist, and classist biases are hidden behind sophisticated technology, usually without a system of appeal. As their influence increases in society, we face a choice. Do we ignore their pernicious effects, or do we understand, regulate, and control the biases they exert? If we want them to represent transparent fairness, freedom, and consistency in an efficient, cost-saving manner, we must hold them accountable somehow.

algorithm, artificial intelligence, machine learning, (17 more...)

#artificialintelligence

Country: North America > United States > California (0.14)

Industry:

Health & Medicine (0.70)
Law (0.69)
Law Enforcement & Public Safety (0.48)

Technology:

Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.32)


Subset Selection with Shrinkage: Sparse Linear Modeling when the SNR is low

Mazumder, Rahul, Radchenko, Peter, Dedieu, Antoine

arXiv.org Machine Learning | Aug-10-2017

We study the behavior of a fundamental tool in sparse statistical modeling -- the best-subset selection procedure (aka "best-subsets"). Assuming that the underlying linear model is sparse, it is well known, both in theory and in practice, that the best-subsets procedure works extremely well in terms of several statistical metrics (prediction, estimation and variable selection) when the signal to noise ratio (SNR) is high. However, its performance degrades substantially when the SNR is low -- it is outperformed in predictive accuracy by continuous shrinkage methods, such as ridge regression and the Lasso. We explain why this behavior should not come as a surprise, and contend that the original version of the classical best-subsets procedure was, perhaps, not designed to be used in the low SNR regimes. We propose a close cousin of best-subsets, namely, its $\ell_{q}$-regularized version, for $q \in \{1, 2\}$, which (a) mitigates, to a large extent, the poor predictive performance of best-subsets in the low SNR regimes; (b) performs favorably and generally delivers a substantially sparser model when compared to the best predictive models available via ridge regression and the Lasso. Our estimator can be expressed as a solution to a mixed integer second order conic optimization problem and, hence, is amenable to modern computational tools from mathematical optimization. We explore the theoretical properties of the predictive capabilities of the proposed estimator and complement our findings via several numerical experiments.

artificial intelligence, estimator, machine learning, (17 more...)

arXiv.org Machine Learning

1708.03288

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.75)


Churn Prediction With Apache Spark Machine Learning - DZone AI

#artificialintelligence | Aug-9-2017, 01:50:19 GMT

Churn prediction is big business. It minimizes customer defection by predicting which customers are likely to cancel a subscription to a service. Though originally used within the telecommunications industry, it has become common practice across banks, ISPs, insurance firms, and other verticals. The prediction process is heavily data-driven and often utilizes advanced machine learning techniques. In this post, we'll take a look at what types of customer data are typically used, do some preliminary analysis of the data, and generate churn prediction models -- all with Spark and its machine learning frameworks.

artificial intelligence, customer, machine learning, (13 more...)

#artificialintelligence

Genre: Instructional Material (0.48)

Industry: Telecommunications (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.36)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.33)
