AITopics

Country: North America > United States > Wisconsin (0.04)

Industry: Health & Medicine > Therapeutic Area (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.32)

#artificialintelligence Aug-27-2019, 05:13:17 GMT

Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms

Given a data set D containing millions of data points and a data consumer who is willing to pay $X to train a machine learning (ML) model over D, how should we distribute this $X to each data point to reflect its "value"? In this paper, we define the "relative value of data" via the Shapley value, as it uniquely possesses properties with appealing real-world interpretations, such as fairness, rationality and decentralizability. For general, bounded utility functions, the Shapley value is known to be challenging to compute: to get Shapley values for all N data points, it requires O(2^N) model evaluations for exact computation and O(N log N) for (ϵ, δ)-approximation. In this paper, we focus on one popular family of ML models relying on K-nearest neighbors (KNN). The most surprising result is that for unweighted KNN classifiers and regressors, the Shapley value of all N data points can be computed, exactly, in O(N log N) time -- an exponential improvement on computational complexity!
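For a general utility function, a common way to approximate Shapley values is permutation sampling: draw random orderings of the data points and average each point's marginal contribution to the utility of the points that precede it. The sketch below is a generic version of that estimator, not necessarily the exact baseline the authors benchmark against. `utility` is an assumed callable that, in practice, would retrain and score a model on a subset, so each sampled permutation costs N utility evaluations.

```python
import random
from typing import Callable, FrozenSet, List

# Hypothetical utility type: maps a subset of training-point indices to a score
# (in practice, the test accuracy of a model trained on that subset).
Utility = Callable[[FrozenSet[int]], float]


def shapley_permutation_sampling(
    n: int, utility: Utility, num_permutations: int, seed: int = 0
) -> List[float]:
    """Monte Carlo estimate of the Shapley value of each of n data points."""
    rng = random.Random(seed)
    totals = [0.0] * n
    for _ in range(num_permutations):
        order = list(range(n))
        rng.shuffle(order)
        coalition = set()
        previous = utility(frozenset(coalition))
        for i in order:
            # Marginal contribution of point i to the points that precede it.
            coalition.add(i)
            current = utility(frozenset(coalition))
            totals[i] += current - previous
            previous = current
    # Average the marginal contributions over the sampled permutations.
    return [t / num_permutations for t in totals]


if __name__ == "__main__":
    weights = [1.0, 2.0, 3.0]
    additive = lambda s: sum(weights[i] for i in s)
    print(shapley_permutation_sampling(3, additive, num_permutations=100))
```

With an additive utility every permutation yields the exact values, and for any utility the estimates sum to U(all points) minus U(empty set), which makes the estimator easy to sanity-check.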

algorithm, artificial intelligence, machine learning

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)

arXiv.org Machine Learning Aug-22-2019

Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms

Jia, Ruoxi, Dao, David, Wang, Boxin, Hubis, Frances Ann, Gurel, Nezihe Merve, Li, Bo, Zhang, Ce, Spanos, Costas J., Song, Dawn

Given a data set $\mathcal{D}$ containing millions of data points and a data consumer who is willing to pay \$$X$ to train a machine learning (ML) model over $\mathcal{D}$, how should we distribute this \$$X$ to each data point to reflect its "value"? In this paper, we define the "relative value of data" via the Shapley value, as it uniquely possesses properties with appealing real-world interpretations, such as fairness, rationality and decentralizability. For general, bounded utility functions, the Shapley value is known to be challenging to compute: to get Shapley values for all $N$ data points, it requires $O(2^N)$ model evaluations for exact computation and $O(N\log N)$ for $(\epsilon, \delta)$-approximation. In this paper, we focus on one popular family of ML models relying on $K$-nearest neighbors ($K$NN). The most surprising result is that for unweighted $K$NN classifiers and regressors, the Shapley value of all $N$ data points can be computed, exactly, in $O(N\log N)$ time -- an exponential improvement on computational complexity! Moreover, for $(\epsilon, \delta)$-approximation, we are able to develop an algorithm based on Locality Sensitive Hashing (LSH) with only sublinear complexity $O(N^{h(\epsilon,K)}\log N)$ when $\epsilon$ is not too small and $K$ is not too large. We empirically evaluate our algorithms on up to $10$ million data points and even our exact algorithm is up to three orders of magnitude faster than the baseline approximation algorithm. The LSH-based approximation algorithm can accelerate the value calculation process even further. We then extend our algorithms to other scenarios such as (1) weighted $K$NN classifiers, (2) settings where different data points are clustered by different data curators, and (3) settings where data analysts provide computation and also require proper valuation.
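The key to the exact result is that, for an unweighted KNN classifier, the Shapley values for one test point follow from a single sort by distance plus a backward recursion from the farthest training point to the nearest. The sketch below assumes the utility of a subset S is (1/K) times the number of the min(K, |S|) points in S nearest to the test point whose label matches the test label; values for several test points are averaged, which is valid because the Shapley value is linear in the utility. The function names are illustrative, not the authors' code.

```python
import math
from typing import List, Sequence


def knn_shapley_single(
    distances: Sequence[float], labels: Sequence[int], test_label: int, k: int
) -> List[float]:
    """Exact Shapley values of all training points for one test point.

    Assumed utility: U(S) = (1/k) * #{matching labels among the min(k, |S|)
    points of S nearest to the test point}.
    """
    n = len(distances)
    if n == 0:
        return []
    # Sort once: order[0] is the nearest training point, order[-1] the farthest.
    order = sorted(range(n), key=lambda i: distances[i])
    match = [1.0 if labels[i] == test_label else 0.0 for i in order]
    s = [0.0] * n
    # Farthest point: its marginal contribution is match/k when it joins fewer
    # than k points and 0 otherwise, giving match * min(k, n) / (n * k),
    # which reduces to match / n when n >= k.
    s[n - 1] = match[n - 1] * min(k, n) / (n * k)
    # Backward recursion toward the nearest point; rank is the 1-based position.
    for j in range(n - 2, -1, -1):
        rank = j + 1
        s[j] = s[j + 1] + (match[j] - match[j + 1]) / k * min(k, rank) / rank
    # Map values from sorted positions back to the original indices.
    values = [0.0] * n
    for position, i in enumerate(order):
        values[i] = s[position]
    return values


def knn_shapley(
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[int],
    test_x: Sequence[Sequence[float]],
    test_y: Sequence[int],
    k: int,
) -> List[float]:
    """Average the single-test-point values over a test set (Shapley is linear)."""
    totals = [0.0] * len(train_x)
    for query, query_label in zip(test_x, test_y):
        dists = [math.dist(x, query) for x in train_x]
        for i, v in enumerate(knn_shapley_single(dists, train_y, query_label, k)):
            totals[i] += v
    return [t / len(test_x) for t in totals]
```

Sorting dominates, so the cost is O(N log N) per test point; for small N the output can be checked against brute-force enumeration of all subsets.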

artificial intelligence, data mining, machine learning

1908.08619

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (0.92)
Leisure & Entertainment > Games (0.46)
Information Technology > Security & Privacy (0.45)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

arXiv.org Machine Learning Aug-6-2019

Predicted disease compositions of human gliomas estimated from multiparametric MRI can predict endothelial proliferation, tumor grade, and overall survival

Diller, Emily E, Cao, Sha, Ey, Beth, Lober, Robert, Parker, Jason G

Background and Purpose: Biopsy is the main determinant of glioma clinical management, but it requires invasive sampling that can fail to detect relevant features because of tumor heterogeneity. The purpose of this study was to evaluate the accuracy of a voxel-wise, multiparametric MRI radiomic method for predicting these features and to develop a minimally invasive method to objectively assess neoplasms.

Methods: Multiparametric MRI data were registered to T1-weighted gadolinium contrast-enhanced data using a 12-degree-of-freedom affine model. The retrospectively collected MRI data included T1-weighted, T1-weighted gadolinium contrast-enhanced, T2-weighted, fluid-attenuated inversion recovery, and multi-b-value diffusion-weighted images acquired at 1.5T or 3.0T. Clinical experts provided voxel-wise annotations for five disease states on a subset of patients to establish a training feature set of 611,930 observations. Then, a k-nearest-neighbor (k-NN) classifier was trained using a 25% hold-out design. The trained k-NN model was applied to 13,018,171 observations from seventeen histologically confirmed glioma patients. Linear regression tested the relationship of overall survival (OS) to predicted disease compositions (PDC) and diagnostic age (alpha = 0.05). Canonical discriminant analysis tested whether PDC and diagnostic age could differentiate clinical, genetic, and microscopic factors (alpha = 0.05).

Results: The model predicted the voxel annotation class with a Dice similarity coefficient of 94.34% +/- 2.98. Linear combinations of PDCs and diagnostic age predicted OS (p = 0.008), grade (p = 0.014), and endothelial proliferation (p = 0.003), but fell short of predicting gene mutations for TP53BP1 and IDH1.

Conclusions: This voxel-wise, multiparametric MRI radiomic strategy holds potential as a non-invasive decision-making aid for clinicians managing patients with glioma.
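The reported Dice similarity coefficient measures the overlap between predicted and annotated voxel labels: for a class with annotated voxel set A and predicted voxel set B, Dice = 2|A ∩ B| / (|A| + |B|). The abstract does not say how scores were aggregated across the five disease states or across patients, so the per-class computation and unweighted mean below are assumptions for illustration.

```python
from typing import Dict, Sequence


def dice_per_class(truth: Sequence[int], pred: Sequence[int]) -> Dict[int, float]:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for every label present."""
    if len(truth) != len(pred):
        raise ValueError("truth and pred must label the same voxels")
    scores: Dict[int, float] = {}
    for c in sorted(set(truth) | set(pred)):
        in_truth = sum(1 for t in truth if t == c)
        in_pred = sum(1 for p in pred if p == c)
        in_both = sum(1 for t, p in zip(truth, pred) if t == c and p == c)
        # c appears in truth or pred, so the denominator is never zero.
        scores[c] = 2.0 * in_both / (in_truth + in_pred)
    return scores


def mean_dice(truth: Sequence[int], pred: Sequence[int]) -> float:
    """Unweighted mean of the per-class Dice scores (an assumed aggregation)."""
    scores = dice_per_class(truth, pred)
    return sum(scores.values()) / len(scores)
```

A perfect prediction gives 1.0 for every class, and a class that is never predicted where it was annotated gives 0.0.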

artificial intelligence, glioma, machine learning

1908.02334

Country: North America > United States > Indiana (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.54)

Yesilli, Melih C., Khasawneh, Firas A., Otto, Andreas

Chatter Detection in Turning Using Machine Learning and Similarity Measures of Time Series via Dynamic Time Warping

arXiv.org Machine Learning Aug-5-2019

Chatter detection from sensor signals has been an active field of research. While some success has been reported using several featurization tools and machine learning algorithms, existing methods have several drawbacks, such as requiring manual preprocessing and a large data set. In this paper, we present an alternative approach for chatter detection based on the K-Nearest Neighbor (kNN) algorithm for classification and Dynamic Time Warping (DTW) as a time series similarity measure. The time series used are the acceleration signals acquired from the tool holder in a series of turning experiments. Our results show that this approach achieves detection accuracies that in most cases outperform existing methods. We compare our results to the traditional methods based on the Wavelet Packet Transform (WPT) and Ensemble Empirical Mode Decomposition (EEMD), as well as to the more recent approach based on Topological Data Analysis (TDA). We show that in three out of four cutting configurations our DTW-based approach attains the highest average classification rate, in one case reaching as high as 99% accuracy. Our approach does not require feature extraction, can reuse a classifier across different cutting configurations, and uses reasonably sized training sets. Although the high accuracy of our approach comes with a high computational cost, this cost is specific to the DTW implementation that we used. Specifically, we highlight available, very fast DTW implementations that can even run on small consumer electronics. Therefore, further code optimization and the significantly reduced computational effort during the implementation phase make our approach a viable option for in-process chatter detection.
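The classification step pairs a nearest-neighbor vote with DTW as the distance between acceleration signals. The sketch below uses the textbook dynamic-programming DTW with an absolute-difference local cost and no warping window, which runs in O(nm) time for series of lengths n and m. The abstract does not specify the DTW variant, constraints, or preprocessing the authors used, and the labels `"stable"` and `"chatter"` and the toy signals are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Unconstrained dynamic time warping distance with |x - y| local cost."""
    n, m = len(a), len(b)
    # cost[i][j]: cheapest alignment of the prefixes a[:i] and b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the alignment by a diagonal, vertical, or horizontal step.
            cost[i][j] = local + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return cost[n][m]


def knn_dtw_classify(
    train: List[Tuple[Sequence[float], str]], query: Sequence[float], k: int = 1
) -> str:
    """Majority label among the k training series closest to query under DTW."""
    ranked = sorted(train, key=lambda item: dtw_distance(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    train = [
        ([0.0, 0.1, 0.0, -0.1, 0.0], "stable"),
        ([0.0, 2.0, -2.0, 2.0, -2.0], "chatter"),
    ]
    print(knn_dtw_classify(train, [0.0, 0.0, 1.8, -2.1, 1.9, -2.0]))
```

The query is a time-shifted, slightly perturbed copy of the oscillating series; DTW absorbs the shift, so the query is labeled "chatter" even though it has a different length.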

chatter, classifier, time series

1908.01678

Country:

North America > United States > Michigan (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.88)

#artificialintelligence Jul-23-2019, 04:56:13 GMT

K Nearest Neighbor Algorithm In Python - Towards Data Science

K-Nearest Neighbors, or KNN for short, is one of the simplest machine learning algorithms and is used in a wide array of institutions. KNN is a non-parametric, lazy learning algorithm. When we say a technique is non-parametric, we mean that it does not assume a particular functional form for the underlying data distribution. In other words, it makes its selection based on the proximity to other data points, regardless of what feature the numerical values represent. Being a lazy learning algorithm implies that there is little to no training phase: the training data are simply stored, and the work happens at prediction time.
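Because KNN is lazy, "training" amounts to storing the labeled points; at prediction time the query's distance to every stored point is computed and the k nearest points vote. A minimal pure-Python sketch, assuming Euclidean distance and a simple majority vote (this is not the post's own code):

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def knn_predict(
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[Hashable],
    query: Sequence[float],
    k: int = 3,
) -> Hashable:
    """Predict the label of query by majority vote of its k nearest neighbors."""
    # No model is fit beforehand: every prediction scans the whole training set.
    neighbors = sorted(
        zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query)
    )[:k]
    votes = Counter(label for _, label in neighbors)
    # On a tied vote, most_common(1) returns the label counted first,
    # i.e. the one whose first occurrence is nearest to the query.
    return votes.most_common(1)[0][0]
```

Each prediction costs one distance computation per stored point, which is the price of having no training phase; `math.dist` requires Python 3.8 or later.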

artificial intelligence, machine learning, new data

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)

Bertsimas, Dimitris, McCord, Christopher, Sturt, Bradley

Dynamic optimization with side information

arXiv.org Machine Learning Jul-16-2019

We present a data-driven framework for incorporating side information in dynamic optimization under uncertainty. Specifically, our approach uses predictive machine learning methods (such as k-nearest neighbors, kernel regression, and random forests) to weight the relative importance of various data-driven uncertainty sets in a robust optimization formulation. Through a novel measure concentration result for local machine learning methods, we prove that the proposed framework is asymptotically optimal for stochastic dynamic optimization with covariates. We also describe a general-purpose approximation for the proposed framework, based on overlapping linear decision rules, which is computationally tractable and produces high-quality solutions for dynamic problems with many stages. Across a variety of examples in shipment planning, inventory management, and finance, our method achieves improvements of up to 15% over alternatives and requires less than one minute of computation time on problems with twelve stages.
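One common way a k-nearest-neighbors model can turn side information into weights is to give weight 1/k to each of the k historical observations whose covariates are closest to the current covariates, and weight 0 to the rest. The sketch below plugs such weights into a single-stage newsvendor problem solved by weighted sample averages. It illustrates only the weighting idea, not the paper's multi-stage robust formulation with uncertainty sets, and the data and cost parameters are hypothetical.

```python
import math
from typing import List, Sequence


def knn_weights(
    covariates: Sequence[Sequence[float]], x0: Sequence[float], k: int
) -> List[float]:
    """Weight 1/k on the k historical samples nearest to x0, 0 elsewhere."""
    order = sorted(range(len(covariates)), key=lambda i: math.dist(covariates[i], x0))
    weights = [0.0] * len(covariates)
    for i in order[:k]:
        weights[i] = 1.0 / k
    return weights


def weighted_newsvendor(
    demands: Sequence[float],
    weights: Sequence[float],
    holding_cost: float,
    backorder_cost: float,
) -> float:
    """Order quantity minimizing the weighted average of historical costs."""

    def expected_cost(q: float) -> float:
        return sum(
            w * (holding_cost * max(q - d, 0.0) + backorder_cost * max(d - q, 0.0))
            for d, w in zip(demands, weights)
        )

    # With positive costs the objective is convex and piecewise linear with
    # kinks at the observed demands, so some observed demand is optimal.
    return min(demands, key=expected_cost)


if __name__ == "__main__":
    covariates = [[0.0], [1.0], [10.0], [11.0]]
    demands = [5.0, 7.0, 50.0, 52.0]
    w = knn_weights(covariates, [0.5], k=2)
    print(w, weighted_newsvendor(demands, w, holding_cost=1.0, backorder_cost=3.0))
```

Changing the current covariates `x0` changes which historical demands receive weight, which is how the side information shifts the decision.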

artificial intelligence, machine learning, optimization

1907.07307

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Industry: Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.56)

Ortiz-Bejar, Jose, Tellez, Eric S., Graff, Mario

Feature space transformations and model selection to improve the performance of classifiers

arXiv.org Machine Learning Jul-14-2019

Prototype selection and kernel transformations are two established ways to improve the performance of classifiers. Prototype selection has been used to reduce the space complexity of k-Nearest Neighbors classifiers and to improve their accuracy, while kernel transformations have enhanced the performance of linear classifiers by converting a non-linearly separable problem into a linearly separable one in the transformed space. Our proposal combines these transformations, within a model selection scheme, with classic algorithms such as Naïve Bayes and k-Nearest Neighbors to produce a competitive classifier. We analyzed our approach on different classification problems and compared it to state-of-the-art classifiers. The results show that the proposed methodology is competitive, obtaining the lowest (best) rank among the classifiers being compared.
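As an example of prototype selection for k-NN, Hart's condensed nearest neighbor rule keeps only the training points a 1-NN classifier needs to label the whole training set correctly. It is shown here purely as a classic illustration; the abstract does not say which prototype selection method or kernel transformations the authors combine, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def condensed_nearest_neighbor(
    X: Sequence[Sequence[float]], y: Sequence[int]
) -> List[int]:
    """Indices of a prototype subset found with Hart's condensed NN rule."""
    if len(X) == 0:
        return []
    store = [0]  # start from an arbitrary point
    changed = True
    while changed:  # repeat full passes until no point is added
        changed = False
        for i in range(len(X)):
            if i in store:
                continue
            nearest = min(store, key=lambda j: math.dist(X[i], X[j]))
            if y[nearest] != y[i]:
                store.append(i)  # misclassified points become prototypes
                changed = True
    return sorted(store)


def predict_1nn(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    prototypes: Sequence[int],
    query: Sequence[float],
) -> int:
    """Label of the prototype nearest to query."""
    nearest = min(prototypes, key=lambda j: math.dist(X[j], query))
    return y[nearest]
```

When the loop terminates, every training point is classified correctly by the retained prototypes (assuming no duplicate points carry conflicting labels), but which points are retained depends on the order in which they are visited.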

benchmark, classifier, selection

1907.06258

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > Mexico > Michoacán (0.04)
North America > Mexico > Aguascalientes (0.04)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)

#artificialintelligence Jul-11-2019, 08:12:13 GMT

Implement K-Nearest Neighbors classification Algorithm

I have written this post for developers, and it assumes no background in statistics or mathematics. The focus is mainly on how the k-NN algorithm works and how to use it for predictive modeling problems. Classification of objects is an important area of research and application in a variety of fields. When the underlying probabilities are fully known, Bayes decision theory gives optimal error rates. In cases where this information is not available, many algorithms instead use the distance or similarity between samples as the basis for classification.

algorithm, artificial intelligence, machine learning

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)