AITopics

Country:

Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
North America > United States > California > Monterey County > Pacific Grove (0.04)
North America > Canada > Quebec > Montreal (0.04)
North America > Canada > Alberta (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

To Trust Or Not To Trust A Classifier

Jiang, Heinrich, Kim, Been, Guan, Melody, Gupta, Maya

Knowing when a classifier's prediction can be trusted is useful in many applications and critical for safely using AI. While the bulk of the effort in machine learning research has been towards improving classifier performance, understanding when a classifier's predictions should and should not be trusted has received far less attention. The standard approach is to use the classifier's discriminant or confidence score; however, we show there exists an alternative that is more effective in many situations. We propose a new score, called the {\it trust score}, which measures the agreement between the classifier and a modified nearest-neighbor classifier on the testing example. We show empirically that high (low) trust scores produce surprisingly high precision at identifying correctly (incorrectly) classified examples, consistently outperforming the classifier's confidence score as well as many other baselines. Further, under some mild distributional assumptions, we show that if the trust score for an example is high (low), the classifier will likely agree (disagree) with the Bayes-optimal classifier. Our guarantees consist of non-asymptotic rates of statistical consistency under various nonparametric settings and build on recent developments in topological data analysis.

artificial intelligence, classifier, machine learning, (17 more...)

Country: North America (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Belkin, Mikhail, Hsu, Daniel J., Mitra, Partha

Overfitting or perfect fitting? Risk bounds for classification and regression rules that interpolate

Many modern machine learning models are trained to achieve zero or near-zero training error in order to obtain near-optimal (but non-zero) test error. This phenomenon of strong generalization performance for ``overfitted'' / interpolated classifiers appears to be ubiquitous in high-dimensional data, having been observed in deep networks, kernel machines, boosting and random forests. Their performance is consistently robust even when the data contain large amounts of label noise. Very little theory is available to explain these observations. The vast majority of theoretical analyses of generalization allows for interpolation only when there is little or no label noise. This paper takes a step toward a theoretical foundation for interpolated classifiers by analyzing local interpolating schemes, including geometric simplicial interpolation algorithm and singularly weighted $k$-nearest neighbor schemes. Consistency or near-consistency is proved for these schemes in classification and regression problems. Moreover, the nearest neighbor schemes exhibit optimal rates under some standard statistical assumptions. Finally, this paper suggests a way to explain the phenomenon of adversarial examples, which are seemingly ubiquitous in modern machine learning, and also discusses some connections to kernel machines and random forests in the interpolated regime.

artificial intelligence, interpolation, machine learning, (15 more...)

Country:

Europe > Switzerland (0.46)
North America > United States (0.28)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.48)

Fichtenberger, Hendrik, Rohde, Dennis

A Theory-Based Evaluation of Nearest Neighbor Models Put Into Practice

In the $k$-nearest neighborhood model ($k$-NN), we are given a set of points $P$, and we shall answer queries $q$ by returning the $k$ nearest neighbors of $q$ in $P$ according to some metric. This concept is crucial in many areas of data analysis and data processing, e.g., computer vision, document retrieval and machine learning. Many $k$-NN algorithms have been published and implemented, but often the relation between parameters and accuracy of the computed $k$-NN is not explicit. We study property testing of $k$-NN graphs in theory and evaluate it empirically: given a point set $P \subset \mathbb{R}^\delta$ and a directed graph $G=(P,E)$, is $G$ a $k$-NN graph, i.e., every point $p \in P$ has outgoing edges to its $k$ nearest neighbors, or is it $\epsilon$-far from being a $k$-NN graph? Here, $\epsilon$-far means that one has to change more than an $\epsilon$-fraction of the edges in order to make $G$ a $k$-NN graph. We develop a randomized algorithm with one-sided error that decides this question, i.e., a property tester for the $k$-NN property, with complexity $O(\sqrt{n} k^2 / \epsilon^2)$ measured in terms of the number of vertices and edges it inspects, and we prove a lower bound of $\Omega(\sqrt{n / \epsilon k})$. We evaluate our tester empirically on the $k$-NN models computed by various algorithms and show that it can be used to detect $k$-NN models with bad accuracy in significantly less time than the building time of the $k$-NN model.

artificial intelligence, graph, machine learning, (15 more...)

Country:

North America (0.28)
Europe > Germany (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.53)

To Trust Or Not To Trust A Classifier

Jiang, Heinrich, Kim, Been, Guan, Melody, Gupta, Maya

Knowing when a classifier's prediction can be trusted is useful in many applications and critical for safely using AI. While the bulk of the effort in machine learning research has been towards improving classifier performance, understanding when a classifier's predictions should and should not be trusted has received far less attention. The standard approach is to use the classifier's discriminant or confidence score; however, we show there exists an alternative that is more effective in many situations. We propose a new score, called the {\it trust score}, which measures the agreement between the classifier and a modified nearest-neighbor classifier on the testing example. We show empirically that high (low) trust scores produce surprisingly high precision at identifying correctly (incorrectly) classified examples, consistently outperforming the classifier's confidence score as well as many other baselines. Further, under some mild distributional assumptions, we show that if the trust score for an example is high (low), the classifier will likely agree (disagree) with the Bayes-optimal classifier. Our guarantees consist of non-asymptotic rates of statistical consistency under various nonparametric settings and build on recent developments in topological data analysis.

artificial intelligence, classifier, machine learning, (17 more...)

Country: North America (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

The Nearest Neighbor Information Estimator is Adaptively Near Minimax Rate-Optimal

Jiao, Jiantao, Gao, Weihao, Han, Yanjun

We analyze the Kozachenko–Leonenko (KL) fixed k-nearest neighbor estimator for the differential entropy. We obtain the first uniform upper bound on its performance for any fixed k over H\"{o}lder balls on a torus without assuming any conditions on how close the density could be from zero. Accompanying a recent minimax lower bound over the H\"{o}lder ball, we show that the KL estimator for any fixed k is achieving the minimax rates up to logarithmic factors without cognizance of the smoothness parameter s of the H\"{o}lder ball for $s \in (0,2]$ and arbitrary dimension d, rendering it the first estimator that provably satisfies this property.

artificial intelligence, estimator, machine learning, (14 more...)

Country: North America > United States (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.83)
Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.56)

Plötz, Tobias, Roth, Stefan

Neural Nearest Neighbors Networks

Non-local methods exploiting the self-similarity of natural signals have been well studied, for example in image analysis and restoration. Existing approaches, however, rely on k-nearest neighbors (KNN) matching in a fixed feature space. The main hurdle in optimizing this feature space w.r.t. application performance is the non-differentiability of the KNN selection rule. To overcome this, we propose a continuous deterministic relaxation of KNN selection that maintains differentiability w.r.t. pairwise distances, but retains the original KNN as the limit of a temperature parameter approaching zero. To exploit our relaxation, we propose the neural nearest neighbors block (N3 block), a novel non-local processing layer that leverages the principle of self-similarity and can be used as building block in modern neural network architectures. We show its effectiveness for the set reasoning task of correspondence classification as well as for image restoration, including image denoising and single image super-resolution, where we outperform strong convolutional neural network (CNN) baselines and recent non-local models that rely on KNN selection in hand-chosen features spaces.

artificial intelligence, machine learning, neighbor, (18 more...)

Country: North America (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

arXiv.org Machine LearningDec-30-2018

K-nearest Neighbor Search by Random Projection Forests

Yan, Donghui, Wang, Yingjie, Wang, Jin, Wang, Honggang, Li, Zhenpeng

K-nearest neighbor (kNN) search refers to the problem of finding K points closest toa given data point on a distance metric of interest. It is an important task in a wide range of applications, including similarity search in data mining [15,19], fast kernel methods in machine learning [17, 30, 38], nonparametric density estimation [5, 29, 31] and intrinsic dimension estimation [6, 26] in statistics, aswell as anomaly detection algorithms [2, 10, 37]. Numerous algorithms have been proposed for kNN search; the readers are referred to [35, 46] and references therein. Our interest is kNN search in emerging applications. Two 1 salient features of such applications are the expected scalability of the algorithms andtheir ability to handle data of high dimensionality. Additionally, such applications often desire more accurate kNN search. For example, robotic route planning [23] and face-based surveillance systems [34] require a high accuracy forthe robust execution of tasks. However, most existing work on kNN search [1, 4, 12, 15] have focused mainly on the fast computation and accuracy isofalessconcern.

knn search, projection, rptree, (15 more...)

arXiv.org Machine Learning

1812.11689

Country:

North America > United States > Massachusetts > Bristol County > Dartmouth (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > Wisconsin (0.04)
(3 more...)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)

Siraj-Ud-Doula, Md., Alam, Md. Ashad

Ecological Data Analysis Based on Machine Learning Algorithms

arXiv.org Machine LearningDec-21-2018

Abstract: Classification is an important supervised machine learning method, which is necessary and challenging issue for ecological research. It offers a way to classify a dataset into subsets that share common patterns. Notably, there are many classification algorithms to choose from, each making certain assumptions about the data and about how classification should be formed. In this paper, we applied eight machine learning classification algorithms such as Decision Trees, Random Forest, Artificial Neural Network, Support Vector Machine, Linear Discriminant Analysis, k-nearest neighbors, Logistic Regression and Naive Bayes on ecological data. The goal of this study is to compare different machine learning classification algorithms in ecological dataset. In this analysis we have checked the accuracy test among the algorithms. In our study we conclude that Linear Discriminant Analysis and k-nearest neighbors are the best methods among all other methods.

algorithm, classification, classification algorithm, (13 more...)

arXiv.org Machine Learning

1812.09138

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States (0.04)
Europe > United Kingdom > England > West Sussex (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)