AITopics | Nearest Neighbor Methods

Collaborating Authors

Nearest Neighbor Methods

News Overviews Instructional Materials AI-Alerts Classics

Local Distance Metric Learning for Nearest Neighbor Algorithm

Rajabzadeh, Hossein, Jahromi, Mansoor Zolghadri, Zare, Mohammad Sadegh, Fakhrahmad, Mostafa

arXiv.org Machine LearningMar-15-2018

Distance metric learning is a successful way to enhance the performance of the nearest neighbor classifier. In most cases, however, the distribution of data does not obey a regular form and may change in different parts of the feature space. Regarding that, this paper proposes a novel local distance metric learning method, namely Local Mahalanobis Distance Learning (LMDL), in order to enhance the performance of the nearest neighbor classifier. LMDL considers the neighborhood influence and learns multiple distance metrics for a reduced set of input samples. The reduced set is called as prototypes which try to preserve local discriminative information as much as possible. The proposed LMDL can be kernelized very easily, which is significantly desirable in the case of highly nonlinear data. The quality as well as the efficiency of the proposed method assesses through a set of different experiments on various datasets and the obtained results show that LDML as well as the kernelized version is superior to the other related state-of-the-art methods.

artificial intelligence, educational setting, prototype, (17 more...)

arXiv.org Machine Learning

1803.01562

Genre: Research Report > New Finding (0.66)

Industry: Education > Educational Setting > Online (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.90)

Add feedback

Introduction to k-Nearest-Neighbors – Towards Data Science

#artificialintelligenceMar-13-2018, 23:18:57 GMT

Once we have formed our training data-set, which is represented as an M x N matrix where M is the number of data points and N is the number of features, we can now begin classifying. There are two important decisions that must be made before making classifications. One is the value of k that will be used; this can either be decided arbitrarily, or you can try cross-validation to find an optimal value. The next, and the most complex, is the distance metric that will be used. There are many different ways to compute distance, as it is a fairly ambiguous notion, and the proper metric to use is always going to be determined by the data-set and the classification task.

algorithm, artificial intelligence, machine learning, (7 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.85)

Add feedback

Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning

Papernot, Nicolas, McDaniel, Patrick

arXiv.org Machine LearningMar-13-2018

Deep neural networks (DNNs) enable innovative applications of machine learning like image recognition, machine translation, or malware detection. However, deep learning is often criticized for its lack of robustness in adversarial settings (e.g., vulnerability to adversarial inputs) and general inability to rationalize its predictions. In this work, we exploit the structure of deep learning to enable new learning-based inference and decision strategies that achieve desirable properties such as robustness and interpretability. We take a first step in this direction and introduce the Deep k-Nearest Neighbors (DkNN). This hybrid classifier combines the k-nearest neighbors algorithm with representations of the data learned by each layer of the DNN: a test input is compared to its neighboring training points according to the distance that separates them in the representations. We show the labels of these neighboring points afford confidence estimates for inputs outside the model's training manifold, including on malicious inputs like adversarial examples--and therein provides protections against inputs that are outside the models understanding. This is because the nearest neighbors can be used to estimate the nonconformity of, i.e., the lack of support for, a prediction in the training data. The neighbors also constitute human-interpretable explanations of predictions. We evaluate the DkNN algorithm on several datasets, and show the confidence estimates accurately identify inputs outside the model, and that the explanations provided by nearest neighbors are intuitive and useful in understanding model failures.

deep learning, neural network, prediction, (19 more...)

arXiv.org Machine Learning

1803.04765

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)

Add feedback

The K-Nearest Neighbour UCB algorithm for multi-armed bandits with covariates

Reeve, Henry WJ, Mellor, Joe, Brown, Gavin

arXiv.org Machine LearningMar-1-2018

In this paper we propose and explore the k-Nearest Neighbour UCB algorithm for multi-armed bandits with covariates. We focus on a setting where the covariates are supported on a metric space of low intrinsic dimension, such as a manifold embedded within a high dimensional ambient feature space. The algorithm is conceptually simple and straightforward to implement. The k-Nearest Neighbour UCB algorithm does not require prior knowledge of the either the intrinsic dimension of the marginal distribution or the time horizon. We prove a regret bound for the k-Nearest Neighbour UCB algorithm which is minimax optimal up to logarithmic factors. In particular, the algorithm automatically takes advantage of both low intrinsic dimensionality of the marginal distribution over the covariates and low noise in the data, expressed as a margin condition. In addition, focusing on the case of bounded rewards, we give corresponding regret bounds for the k-Nearest Neighbour KL-UCB algorithm, which is an analogue of the KL-UCB algorithm adapted to the setting of multi-armed bandits with covariates. Finally, we present empirical results which demonstrate the ability of both the k-Nearest Neighbour UCB and k-Nearest Neighbour KL-UCB to take advantage of situations where the data is supported on an unknown sub-manifold of a high-dimensional feature space.

algorithm, artificial intelligence, health & medicine, (16 more...)

arXiv.org Machine Learning

1803.00316

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)

Add feedback

Dimensionality reduction for acoustic vehicle classification with spectral embedding

Sunu, Justin, Percus, Allon G.

arXiv.org Machine LearningFeb-17-2018

We propose a method for recognizing moving vehicles, using data from roadside audio sensors. This problem has applications ranging widely, from traffic analysis to surveillance. We extract a frequency signature from the audio signal using a short-time Fourier transform, and treat each time window as an individual data point to be classified. By applying a spectral embedding, we decrease the dimensionality of the data sufficiently for K-nearest neighbors to provide accurate vehicle identification.

artificial intelligence, ground transportation, vehicle, (19 more...)

arXiv.org Machine Learning

1705.09869

Country: North America > United States (0.14)

Genre: Research Report (0.82)

Industry:

Automobiles & Trucks (0.69)
Transportation > Ground > Road (0.47)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.41)

Add feedback

Revisiting the Vector Space Model: Sparse Weighted Nearest-Neighbor Method for Extreme Multi-Label Classification

Aoshima, Tatsuhiro, Kobayashi, Kei, Minami, Mihoko

arXiv.org Machine LearningFeb-12-2018

Machine learning has played an important role in information retrieval (IR) in recent times. In search engines, for example, query keywords are accepted and documents are returned in order of relevance to the given query; this can be cast as a multi-label ranking problem in machine learning. Generally, the number of candidate documents is extremely large (from several thousand to several million); thus, the classifier must handle many labels. This problem is referred to as extreme multi-label classification (XMLC). In this paper, we propose a novel approach to XMLC termed the Sparse Weighted Nearest-Neighbor Method. This technique can be derived as a fast implementation of state-of-the-art (SOTA) one-versus-rest linear classifiers for very sparse datasets. In addition, we show that the classifier can be written as a sparse generalization of a representer theorem with a linear kernel. Furthermore, our method can be viewed as the vector space model used in IR. Finally, we show that the Sparse Weighted Nearest-Neighbor Method can process data points in real time on XMLC datasets with equivalent performance to SOTA models, with a single thread and smaller storage footprint. In particular, our method exhibits superior performance to the SOTA models on a dataset with 3 million labels.

artificial intelligence, dataset, machine learning, (14 more...)

arXiv.org Machine Learning

1802.03938

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.73)

Add feedback

Learning Tree-based Deep Model for Recommender Systems

Zhu, Han, Zhang, Pengye, Li, Guozheng, He, Jie, Li, Han, Gai, Kun

arXiv.org Machine LearningFeb-12-2018

Model-based methods for recommender systems have been studied to provide more precise results. In systems with large corpus, the amount of calculation for learnt model to predict all user-item pairs' preferences is tremendous, which makes the model difficult to be directly employed in recommendation candidate generation stage. To overcome the calculation barrier, models like matrix factorization can resort to inner product form (i.e., use the inner product of user and item's latent factors as the preference) and index like hashing to perform efficient approximate k-nearest neighbor search. However, other more expressive interaction forms between user and item features, e.g., interactions through advanced deep neural networks, are still prevented from large corpus recommendation because of the amount of calculation. In this paper, we focus on the problem how arbitrary advanced models can be introduced to generate recommendations from large corpus. We propose a novel tree-based method which can provide logarithmic complexity prediction w.r.t. corpus size with more expressive deep neural networks. The main idea of tree-based model is to predict user interests coarse-to-fine, by traversing tree nodes top-down and making decisions whether to pick up each node to user. Furthermore, we show that the tree structure can also be jointly learnt towards better compatible with user interests' distribution, to facilitate both training and prediction. Experiments in two large-scale real-world datasets indicate that the proposed model significantly outperforms traditional methods. And online A/B test results in Taobao display advertising platform prove the effectiveness of the tree-based deep model in production.

deep learning, neural network, node, (19 more...)

arXiv.org Machine Learning

1801.02294

Genre: Research Report > New Finding (0.46)

Industry:

Marketing (0.35)
Information Technology > Services (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.54)

Add feedback

ATPboost: Learning Premise Selection in Binary Setting with ATP Feedback

Piotrowski, Bartosz, Urban, Josef

arXiv.org Machine LearningFeb-9-2018

ATPboost is a system for solving sets of large-theory problems by interleaving ATP runs with state-of-the-art machine learning of premise selection from the proofs. Unlike many previous approaches that use multi-label setting, the learning is implemented as binary classification that estimates the pairwise-relevance of (theorem, premise) pairs. ATPboost uses for this the XGBoost gradient boosting algorithm, which is fast and has state-of-the-art performance on many tasks. Learning in the binary setting however requires negative examples, which is nontrivial due to many alternative proofs. We discuss and implement several solutions in the context of the ATP/ML feedback loop, and show that ATPboost with such methods significantly outperforms the k-nearest neighbors multilabel classifier.

artificial intelligence, machine learning, theorem, (16 more...)

arXiv.org Machine Learning

1802.03375

Country:

Oceania > Fiji (0.14)
Europe > Poland (0.14)
Europe > Czechia (0.14)
Africa > Botswana (0.14)

Genre: Research Report (0.51)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.55)

Add feedback

Doubly Approximate Nearest Neighbor Classification

Liu, Weiwei (The University of New South Wales) | Liu, Zhuanghua (University of Technology Sydney) | Tsang, Ivor W. (University of Technology Sydney) | Zhang, Wenjie (The University of New South Wales) | Lin, Xuemin (The University of New South Wales)

AAAI ConferencesFeb-8-2018

Nonparametric classification models, such as K-Nearest Neighbor (KNN), have become particularly powerful tools in machine learning and data mining, due to their simplicity and flexibility. However, the testing time of the KNN classifier becomes unacceptable and the KNN's performance deteriorates significantly when applied to data sets with millions of dimensions. We observe that state-of-the-art approximate nearest neighbor (ANN) methods aim to either reduce the number of distance comparisons based on tree structure or decrease the cost of distance computation by dimension reduction methods. In this paper, we propose a doubly approximate nearest neighbor classification strategy, which marries the two branches which compress the dimensions for decreasing distance computation cost as well as reduce the number of distance comparison instead of full scan. Under this strategy, we build a compressed dimensional tree (CD-Tree) to avoid unnecessary distance calculations. In each decision node, we propose a novel feature selection paradigm by optimizing the feature selection vector as well as the separator (indicator variables for splitting instances) with the maximum margin. An efficient algorithm is then developed to find the globally optimal solution with convergence guarantee. Furthermore, we also provide a data-dependent generalization error bound for our model, which reveals a new insight for the design of ANN classification algorithms. Our empirical studies show that our algorithm consistently obtains competitive or better classification results on all data sets, yet we can also achieve three orders of magnitude faster than state-of-the-art libraries on very high dimensions.

algorithm, artificial intelligence, dimension, (14 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.34)

Add feedback

Bayesian Optimization with Gradients

Wu, Jian, Poloczek, Matthias, Wilson, Andrew Gordon, Frazier, Peter I.

arXiv.org Artificial IntelligenceFeb-5-2018, 18:00:00 GMT

Bayesian optimization has been successful at global optimization of expensive-to-evaluate multimodal objective functions. However, unlike most optimization methods, Bayesian optimization typically does not use derivative information. In this paper we show how Bayesian optimization can exploit derivative information to decrease the number of objective function evaluations required for good performance. In particular, we develop a novel Bayesian optimization algorithm, the derivative-enabled knowledge-gradient (dKG), for which we show one-step Bayes-optimality, asymptotic consistency, and greater one-step value of information than is possible in the derivative-free setting. Our procedure accommodates noisy and incomplete derivative information, comes in both sequential and batch forms, and can optionally reduce the computational cost of inference through automatically selected retention of a single directional derivative. We also compute the d-KG acquisition function and its gradient using a novel fast discretization-free technique. We show d-KG provides state-of-the-art performance compared to a wide range of optimization procedures with and without gradients, on benchmarks including logistic regression, deep learning, kernel learning, and k-nearest neighbors.

deep learning, neural network, optimization, (16 more...)

arXiv.org Artificial Intelligence

1703.04389

Country:

North America > United States > Oregon (0.14)
North America > Canada > Alberta (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback