AITopics | Accuracy

Accuracy

How web search data might help diagnose serious illness earlier - Next at Microsoft

@machinelearnbot Jun-10-2016, 18:39:10 GMT

Early diagnosis is key to gaining the upper hand against a wide range of diseases. Now Microsoft researchers are suggesting that records of the topics that people search for on the Internet could one day prove as useful as an X-ray or MRI in detecting some illnesses before it's too late. The potential of using engagement with search engines to predict an eventual diagnosis, and possibly buy critical time for a medical response, is demonstrated in a new study by Microsoft researchers Eric Horvitz and Ryen White, along with former Microsoft intern and Columbia University doctoral candidate John Paparrizos. In a paper published Tuesday in the Journal of Oncology Practice, the trio detailed how they used anonymized Bing search logs to identify people whose queries provided strong evidence that they had recently been diagnosed with pancreatic cancer, a particularly deadly and fast-spreading cancer that is frequently caught too late to cure. Then they retroactively analyzed searches for symptoms of the disease over the many months prior, to identify the patterns of queries most likely to signal an eventual diagnosis.

information retrieval, machine learning, natural language, (18 more...)

@machinelearnbot

Country: North America > United States > Washington > King County > Redmond (0.06)

Genre: Research Report (0.36)

Industry: Health & Medicine > Therapeutic Area > Oncology > Pancreatic Cancer (0.41)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.36)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.31)

US Patent Application for Face Detection Using Machine Learning Patent Application (Application #20160140436 issued May 19, 2016) - Justia Patents Search

#artificialintelligence Jun-7-2016, 18:25:55 GMT

This invention relates generally to image processing and, more particularly, to object detection using machine learning. Face detection systems perform image processing on digital images or video frames to automatically identify people. In one approach, face detection systems classify images into positive images that contain faces and negative images without any faces. Face detection systems may train a neural network for detecting faces and separating the faces from backgrounds. By separating faces from backgrounds, face detection systems may determine whether images contain faces. A good face detection system should have a low rate of false positive detection (i.e., erroneously detecting a negative image as a positive image) and a high rate of true positive detection. Face detection remains challenging because the numbers of positive and negative images available for training are typically not balanced. For example, there may be many more negative images than positive images, and the neural network may be trained in a biased manner with too many negative images. As a result, a neural network trained with an imbalanced number of positive and negative samples may suffer from low accuracy in face detection, with a high false positive detection rate or a low true positive detection rate. Face detection also remains challenging because facial appearance may be irregular with large variance. For example, faces may be deformed because of subjects having varying poses or expressions. In addition, faces may be deformed by external settings such as lighting conditions, occlusions, etc. As a result, a neural network may fail to distinguish faces from backgrounds and cause a high false positive detection rate. Thus, there is a need for good approaches to accurate face detection and detection of other objects.
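The effect of class imbalance on these rates can be illustrated with a small sketch. The function name `detection_rates` and the counts below are assumptions made for illustration and do not come from the patent application; the point is only that raw accuracy can look high while the true positive rate is poor.

```python
def detection_rates(tp, fp, tn, fn):
    """Return (accuracy, true positive rate, false positive rate) from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Guard against empty classes so the rates are always defined.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, tpr, fpr


# Hypothetical imbalanced set: 10 images with faces, 990 without.
# A detector that labels every image "no face" finds none of the faces.
acc, tpr, fpr = detection_rates(tp=0, fp=0, tn=990, fn=10)
print(acc, tpr, fpr)
```

With these counts the accuracy is 99 percent even though no face is ever detected, which is why the two detection rates are reported separately from accuracy.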

artificial intelligence, machine learning, training module 130, (12 more...)

#artificialintelligence

Industry: Law > Intellectual Property & Technology Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Expected Similarity Estimation for Large-Scale Batch and Streaming Anomaly Detection

Schneider, Markus, Ertel, Wolfgang, Ramos, Fabio

arXiv.org Artificial Intelligence Jun-6-2016

We present a novel algorithm for anomaly detection on very large datasets and data streams. The method, named EXPected Similarity Estimation (EXPoSE), is kernel-based and able to efficiently compute the similarity between new data points and the distribution of regular data. The estimator is formulated as an inner product with a reproducing kernel Hilbert space embedding and makes no assumption about the type or shape of the underlying data distribution. We show that offline (batch) learning with EXPoSE can be done in linear time and online (incremental) learning takes constant time per instance and model update. Furthermore, EXPoSE can make predictions in constant time, while it requires only constant memory. In addition, we propose different methodologies for concept drift adaptation on evolving data streams. On several real datasets we demonstrate that our approach can compete with state of the art algorithms for anomaly detection while being an order of magnitude faster than most other approaches.
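The idea of scoring a point by its inner product with a mean kernel embedding can be sketched as follows. This is not the authors' implementation: the abstract does not say how the embedding is approximated, so random Fourier features for a Gaussian kernel are swapped in here as an assumption, and the class name `ExposeSketch` and its parameters are hypothetical.

```python
import numpy as np


class ExposeSketch:
    """Hypothetical helper: mean kernel embedding via random Fourier features."""

    def __init__(self, dim, n_features=500, bandwidth=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Random projection for a Gaussian kernel with the given bandwidth (assumption).
        self.W = rng.normal(scale=1.0 / bandwidth, size=(n_features, dim))
        self.b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
        self.mean = np.zeros(n_features)
        self.count = 0

    def _features(self, X):
        X = np.atleast_2d(X)
        return np.sqrt(2.0 / len(self.b)) * np.cos(X @ self.W.T + self.b)

    def fit(self, X):
        """Batch learning: one pass over the data, linear in the number of rows."""
        Phi = self._features(X)
        self.mean = Phi.mean(axis=0)
        self.count = Phi.shape[0]
        return self

    def update(self, x):
        """Online learning: running-mean update, constant work per instance."""
        phi = self._features(x)[0]
        self.count += 1
        self.mean += (phi - self.mean) / self.count
        return self

    def score(self, Z):
        """Similarity of each row of Z to the regular data; low values suggest anomalies."""
        return self._features(Z) @ self.mean


rng = np.random.default_rng(1)
regular = rng.normal(size=(2000, 2))          # synthetic "regular" data
model = ExposeSketch(dim=2).fit(regular)
scores = model.score(np.array([[0.0, 0.0], [8.0, 8.0]]))
print(scores)
```

In this sketch the model is a single vector of fixed length, so memory and prediction cost do not grow with the number of observations. The point near the bulk of the data should receive a clearly higher score than the distant one.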

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/s10994-016-5567-7

1601.06602

Genre: Research Report > Experimental Study (0.93)

Industry:

Information Technology (0.68)
Health & Medicine (0.46)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Trend Filtering on Graphs

Wang, Yu-Xiang, Sharpnack, James, Smola, Alex, Tibshirani, Ryan J.

arXiv.org Artificial Intelligence Jun-4-2016

We introduce a family of adaptive estimators on graphs, based on penalizing the $\ell_1$ norm of discrete graph differences. This generalizes the idea of trend filtering [Kim et al. (2009), Tibshirani (2014)], used for univariate nonparametric regression, to graphs. Analogous to the univariate case, graph trend filtering exhibits a level of local adaptivity unmatched by the usual $\ell_2$-based graph smoothers. It is also defined by a convex minimization problem that is readily solved (e.g., by fast ADMM or Newton algorithms). We demonstrate the merits of graph trend filtering through examples and theory.
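A minimal sketch of the simplest member of this family may help. It assumes the lowest-order case, where the penalized "graph differences" are just differences across edges, and solves the convex problem with a plain (not fast) ADMM loop. The function names, the `rho` and `n_iter` parameters, and the toy path graph are all illustrative assumptions rather than the authors' code.

```python
import numpy as np


def incidence_matrix(n_nodes, edges):
    """Oriented edge incidence matrix D: one row per edge, +1/-1 at its endpoints."""
    D = np.zeros((len(edges), n_nodes))
    for row, (i, j) in enumerate(edges):
        D[row, i] = 1.0
        D[row, j] = -1.0
    return D


def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of t * |.|."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def graph_trend_filter_k0(y, edges, lam, rho=1.0, n_iter=500):
    """ADMM for min_b 0.5 * ||y - b||^2 + lam * ||D b||_1 (lowest order only)."""
    y = np.asarray(y, dtype=float)
    D = incidence_matrix(len(y), edges)
    A = np.eye(len(y)) + rho * D.T @ D          # fixed matrix for the b-update
    z = np.zeros(D.shape[0])                    # copy of D b that carries the l1 penalty
    u = np.zeros(D.shape[0])                    # scaled dual variable
    b = y.copy()
    for _ in range(n_iter):
        b = np.linalg.solve(A, y + rho * D.T @ (z - u))
        z = soft_threshold(D @ b + u, lam / rho)
        u = u + D @ b - z
    return b


# Toy example: a path graph with a single jump in the signal.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
y = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
flat = graph_trend_filter_k0(y, edges, lam=100.0)   # heavy penalty
raw = graph_trend_filter_k0(y, edges, lam=0.0)      # no penalty
print(flat, raw)
```

A very large penalty forces every edge difference to zero, so the estimate collapses to the mean of `y`, while a zero penalty returns `y` unchanged. Intermediate values of `lam` trade off between the two.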

artificial intelligence, graph, machine learning, (18 more...)

arXiv.org Artificial Intelligence

1410.769

Country: North America > United States > California (0.92)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Communications (0.69)
(2 more...)

A Sharp Bound on the Computation-Accuracy Tradeoff for Majority Voting Ensembles

Lopes, Miles E.

arXiv.org Machine Learning Jun-3-2016

When random forests are used for binary classification, an ensemble of $t=1,2,\dots$ randomized classifiers is generated, and the predictions of the classifiers are aggregated by majority vote. Due to the randomness in the algorithm, there is a natural tradeoff between statistical performance and computational cost. On one hand, as $t$ increases, the (random) prediction error of the ensemble tends to decrease and stabilize. On the other hand, larger ensembles require greater computational cost for training and making new predictions. The present work offers a new approach for quantifying this tradeoff: Given a fixed training set $\mathcal{D}$, let the random variables $\text{Err}_{t,0}$ and $\text{Err}_{t,1}$ denote the class-wise prediction error rates of a randomly generated ensemble of size $t$. As $t\to\infty$, we provide a general bound on the "algorithmic variance", $\text{var}(\text{Err}_{t,l}|\mathcal{D})\leq \frac{f_l(1/2)^2}{4t}+o(\frac{1}{t})$, where $l\in\{0,1\}$, and $f_l$ is a density function that arises from the ensemble method. Conceptually, this result is somewhat surprising, because $\text{var}(\text{Err}_{t,l}|\mathcal{D})$ describes how $\text{Err}_{t,l}$ varies over repeated runs of the algorithm, and yet, the formula leads to a method for bounding $\text{var}(\text{Err}_{t,l}|\mathcal{D})$ with a single ensemble. The bound is also sharp in the sense that it is attained by an explicit family of randomized classifiers. With regard to the task of estimating $f_l(1/2)$, the presence of the ensemble leads to a unique twist on the classical setup of non-parametric density estimation --- wherein the effects of sample size and computational cost are intertwined. In particular, we propose an estimator for $f_l(1/2)$, and derive an upper bound on its MSE that matches "standard optimal non-parametric rates" when $t$ is sufficiently large.

artificial intelligence, ensemble, machine learning, (17 more...)

arXiv.org Machine Learning

1303.0727

Country: North America > United States (0.46)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area (0.98)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.66)

Metrics To Evaluate Machine Learning Algorithms in Python - Machine Learning Mastery

#artificialintelligence Jun-1-2016, 12:16:03 GMT

The metrics that you choose to evaluate your machine learning algorithms are very important. Choice of metrics influences how the performance of machine learning algorithms is measured and compared. They influence how you weight the importance of different characteristics in the results and your ultimate choice of which algorithm to use. In this post you will discover how to select and use different machine learning performance metrics in Python with scikit-learn.
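As a rough illustration of what this looks like in practice, the sketch below computes a few classification metrics with scikit-learn on a tiny hand-made example. The excerpt does not say which metrics the post covers, so the selection here and the toy labels and probabilities are assumptions.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, log_loss, roc_auc_score

# Toy binary problem (illustrative values only).
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]                 # predicted probability of class 1
y_pred = [int(p >= 0.5) for p in y_prob]       # hard labels at a 0.5 cut-off

accuracy = accuracy_score(y_true, y_pred)      # share of correct hard labels
auc = roc_auc_score(y_true, y_prob)            # ranking quality of the probabilities
loss = log_loss(y_true, y_prob)                # penalizes confident mistakes
matrix = confusion_matrix(y_true, y_pred)      # rows: true class, columns: predicted

print(accuracy, auc, loss)
print(matrix)
```

Each metric emphasizes a different characteristic of the same predictions, which is why the choice among them can change which algorithm looks best.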

artificial intelligence, machine learning, prediction, (13 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.51)

Machine Learning Has Transformed Many Aspects Of Everyday Life

#artificialintelligence Jun-1-2016, 03:30:42 GMT

For example, it is important to understand how the business will use the model's results. Typically, scores are combined with a single threshold to convert them into a decision procedure (e.g., fast-tracking applications with scores lower than a certain level, which are assumed to be low risk). To do this, a balance between the true positives (applications the model correctly classifies as high risk), false positives (applications the model scores as high risk but are not) and false negatives (applications the model scores as low risk but were in fact high risk) is essential. I suggest using ROC curves, including the AUC (area under the curve), as a proxy measure for tuning scoring procedures until a good trade-off is found.
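The threshold trade-off described above can be sketched in plain Python. The function names and the six toy applications are hypothetical; the sketch assumes that a higher score means higher risk and that an application is flagged when its score is at or above the threshold.

```python
def roc_points(labels, scores):
    """Return (threshold, TPR, FPR) for each distinct score used as a cut-off."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in sorted(set(scores), reverse=True):
        # Count flagged applications that were high risk (tp) or were not (fp).
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
        points.append((threshold, tp / positives, fp / negatives))
    return points


def auc(labels, scores):
    """AUC as the share of (high-risk, low-risk) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 0, 1, 1]                    # 1 = application was in fact high risk
scores = [0.2, 0.3, 0.45, 0.5, 0.7, 0.9]       # model risk scores (illustrative)

for threshold, tpr, fpr in roc_points(labels, scores):
    print(f"threshold={threshold:.2f}  TPR={tpr:.2f}  FPR={fpr:.2f}")
print("AUC:", auc(labels, scores))
```

Lowering the threshold catches more of the truly high-risk applications but also flags more low-risk ones, so each row of the output is one candidate operating point. The AUC summarizes the whole curve in a single number that does not depend on any one threshold.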

application, artificial intelligence, machine learning, (4 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Singular ridge regression with homoscedastic residuals: generalization error with estimated parameters

Grigoryeva, Lyudmila, Ortega, Juan-Pablo

arXiv.org Machine Learning May-29-2016

This paper characterizes the conditional distribution properties of the finite sample ridge regression estimator and uses that result to evaluate total regression and generalization errors that incorporate the inaccuracies committed at the time of parameter estimation. The paper provides explicit formulas for those errors. Unlike other classical references in this setup, our results take place in a fully singular setup that does not assume the existence of a solution for the non-regularized regression problem. In exchange, we invoke a conditional homoscedasticity hypothesis on the regularized regression residuals that is crucial in our developments.
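To make the role of regularization concrete, here is a minimal sketch of the standard closed-form ridge estimator in one simple singular situation, namely more covariates than samples. This is only an illustration under that assumption: it is not the paper's setup or its error formulas, and the function name `ridge_fit` and the random data are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge estimator (X^T X + lam * I)^{-1} X^T y, for lam > 0."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 10))    # 5 samples, 10 covariates, so X^T X is singular
y = rng.normal(size=5)

gram_rank = np.linalg.matrix_rank(X.T @ X)   # at most 5, below the 10 needed to invert
beta = ridge_fit(X, y, lam=0.1)
print(gram_rank, beta.shape)
```

Because `X.T @ X` is rank deficient here, the non-regularized normal equations have no unique solution, while adding `lam * I` makes the system invertible for any positive `lam`.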

generalization error, ridge regression, singular ridge regression, (15 more...)

arXiv.org Machine Learning

1605.09026

Country:

North America > United States > New Jersey > Mercer County > Princeton (0.04)
Europe > Switzerland > St. Gallen > Sankt Gallen (0.04)
Europe > Germany (0.04)
Europe > France (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

A New Approach to Building the Interindustry Input–Output Table

Hisano, Ryohei

arXiv.org Machine Learning May-29-2016

We present a new approach to estimating the interdependence of industries in an economy by applying data science solutions. By exploiting interfirm buyer–seller network data, we show that the problem of estimating the interdependence of industries is similar to the problem of uncovering the latent block structure in network science literature. To estimate the underlying structure with greater accuracy, we propose an extension of the sparse block model that incorporates node textual information and an unbounded number of industries and interactions among them. The latter task is accomplished by extending the well-known Chinese restaurant process to two dimensions. Inference is based on collapsed Gibbs sampling, and the model is evaluated on both synthetic and real-world datasets. We show that the proposed model improves in predictive accuracy and successfully provides a satisfactory solution to the motivated problem. We also discuss issues that affect the future performance of this approach.
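For readers unfamiliar with the building block mentioned above, the following sketch samples from the standard one-dimensional Chinese restaurant process. The paper's two-dimensional extension is not reproduced here, and the function name, the `alpha` concentration parameter, and the seed are illustrative assumptions.

```python
import random


def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Sample table assignments from a standard (one-dimensional) CRP.

    Each customer joins an existing table with probability proportional to its
    current size, or opens a new table with probability proportional to alpha.
    Requires alpha > 0.
    """
    rng = random.Random(seed)
    assignments = []
    table_sizes = []
    for _ in range(n_customers):
        weights = table_sizes + [alpha]          # last slot stands for a new table
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[table] += 1
        assignments.append(table)
    return assignments, table_sizes


assignments, table_sizes = chinese_restaurant_process(50, alpha=1.0)
print(len(table_sizes), table_sizes)
```

The number of tables is not fixed in advance and grows with the data, which is what lets models built on this process leave the number of groups unbounded.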

data mining, information, machine learning, (20 more...)

arXiv.org Machine Learning

1504.01362

Country:

North America > United States (1.00)
Asia (0.68)

Genre: Research Report (0.40)

Industry:

Information Technology (0.66)
Government > Regional Government (0.46)

Technology:

Information Technology > Data Science > Data Mining (0.93)
Information Technology > Communications > Networks (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
(2 more...)

District Data Labs - Visual Diagnostics for More Informed Machine Learning: Part 3

#artificialintelligence May-28-2016, 13:26:34 GMT

Note: Before starting Part 3, be sure to read Part 1 and Part 2! In this final installment of Visual Diagnostics for More Informed Machine Learning, we'll close the loop on visualization tools for navigating the different phases of the machine learning workflow. Recall that we are framing the workflow in terms of the 'model selection triple' -- this includes analyzing and selecting features, experimenting with different model forms, and evaluating and tuning fitted models. So far, we've covered methods for visual feature analysis in Part 1 and methods for model family and form exploration in Part 2. This post will cover evaluation and tuning, so we'll begin with two questions: You've probably heard other machine learning practitioners talking about their F1 scores or their R-Squared value. Generally speaking, we do tend to rely on numeric scores to tell us when our models are performing well or poorly. There are a number of measures we can use to evaluate our fitted models.
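The two scores named above can be computed by hand in a few lines. This is a sketch of the standard definitions rather than code from the post; the function names `f1` and `r_squared` and the toy values are assumptions.

```python
def f1(y_true, y_pred):
    """F1 score for binary labels: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Classification example: two true positives, one false positive, one false negative.
print(f1([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))

# Regression example: predictions that capture part of the variation.
print(r_squared([1.0, 2.0, 3.0], [1.5, 2.0, 2.5]))
```

F1 applies to classifiers and ignores true negatives, while R-squared applies to regressors and compares the model against always predicting the mean, so the two numbers are not interchangeable.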

artificial intelligence, fitted model, machine learning, (15 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
