Nearest Neighbor Methods


Provably Robust Metric Learning

arXiv.org Machine Learning

Metric learning is an important family of machine learning algorithms and has achieved success on several problems, including computer vision [24, 17, 18], text analysis [27], meta learning [38, 35], and others [34, 45, 47]. Given a set of training samples, metric learning aims to learn a good distance measure such that items in the same class are closer to each other in the learned metric space, which is crucial for classification and similarity search. Since this objective directly matches the assumption underlying nearest neighbor classifiers, most metric learning algorithms can be naturally and successfully combined with K-Nearest Neighbor (K-NN) classifiers. Adversarial robustness of machine learning algorithms has been studied extensively in recent years due to the need for robustness guarantees in real-world systems. It has been demonstrated that neural networks can be easily attacked by adversarial perturbations in the input space [37, 16, 2], and such perturbations can be computed efficiently in both white-box [4, 29] and black-box settings [7, 19, 9]. Therefore, many defense algorithms have been proposed to improve the robustness of neural networks [26, 29].
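
The coupling between metric learning and K-NN described above can be illustrated with a short, generic sketch. It uses scikit-learn's Neighborhood Components Analysis as an assumed stand-in for a learned metric, not the method of this particular paper.

```python
# A minimal sketch (not this paper's algorithm): pair a learned linear metric
# (Neighborhood Components Analysis) with a k-NN classifier, the standard way
# metric learning is combined with nearest neighbor classification.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Learn a transformation that pulls same-class points together, then
# classify in the transformed (learned-metric) space with 3-NN.
model = make_pipeline(
    StandardScaler(),
    NeighborhoodComponentsAnalysis(n_components=16, random_state=0),
    KNeighborsClassifier(n_neighbors=3),
)
model.fit(X_train, y_train)
print("k-NN accuracy in the learned metric space:", model.score(X_test, y_test))
```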


Smartphone Transportation Mode Recognition Using a Hierarchical Machine Learning Classifier and Pooled Features From Time and Frequency Domains

arXiv.org Machine Learning

This paper develops a novel two-layer hierarchical classifier that increases the accuracy of traditional transportation mode classification algorithms. It also improves classification accuracy by extracting new frequency-domain features. Many researchers have obtained such features from global positioning system (GPS) data; that data is excluded here, however, because using GPS can deplete the smartphone's battery and its signal may be lost in some areas. Our proposed two-layer framework differs from previous classification attempts in three distinct ways: 1) the outputs of the two layers are combined using Bayes' rule to choose the transportation mode with the largest posterior probability; 2) the framework combines the newly extracted features with traditionally used time-domain features to create a pool of features; and 3) a different subset of extracted features is used in each layer based on the modes being classified. Several machine learning techniques were used, including k-nearest neighbor, classification and regression trees, support vector machines, random forests, and a heterogeneous framework of random forest and support vector machine. Results show that the classification accuracy of the proposed framework outperforms traditional approaches. Transforming the time-domain features to the frequency domain also adds new features in a new space and provides more control over the loss of information. Consequently, combining the time-domain and frequency-domain features in a large pool and then choosing the best subset yields higher accuracy than using either domain alone. The proposed two-layer classifier obtained a maximum classification accuracy of 97.02%.
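
The fusion step in point 1) can be sketched generically: combine the class-probability outputs of two classifiers under an assumed conditional-independence (naive Bayes) model and pick the mode with the largest posterior. The classifiers and synthetic data below are placeholders, not the paper's pipeline or features.

```python
# A minimal sketch of Bayes'-rule fusion of two classifier layers, assuming
# their outputs are conditionally independent given the class. Synthetic data
# stands in for the smartphone time/frequency-domain features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=20, n_classes=3,
                           n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

layer1 = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
layer2 = SVC(probability=True, random_state=0).fit(X_tr, y_tr)

prior = np.bincount(y_tr) / len(y_tr)   # P(c)
p1 = layer1.predict_proba(X_te)         # P(c | layer-1 evidence)
p2 = layer2.predict_proba(X_te)         # P(c | layer-2 evidence)

# Bayes' rule under independence: P(c | both) is proportional to
# P(c|o1) * P(c|o2) / P(c); normalize and take the argmax.
posterior = p1 * p2 / prior
posterior /= posterior.sum(axis=1, keepdims=True)
y_pred = posterior.argmax(axis=1)       # mode with largest posterior
print("fused accuracy:", (y_pred == y_te).mean())
```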


Towards Certified Robustness of Metric Learning

arXiv.org Machine Learning

Metric learning aims to learn a distance metric such that semantically similar instances are pulled together while dissimilar instances are pushed apart. Many existing methods consider maximizing, or at least constraining, a distance "margin" that separates similar and dissimilar pairs of instances to guarantee their performance on a subsequent k-nearest neighbor classifier. However, such a margin in the feature space does not necessarily lead to robustness certification, or even the anticipated generalization advantage, since a small perturbation of a test instance in the instance space could still alter the model prediction. To address this problem, we advocate penalizing small distances between training instances and their nearest adversarial examples, and we show that the resulting new approach to metric learning enjoys a larger certified neighborhood with a theoretical performance guarantee. Moreover, drawing on an intuitive geometric insight, the proposed new loss term admits an analytically elegant closed-form solution and offers great flexibility in combining it with existing metric learning methods. Extensive experiments demonstrate the superiority of the proposed method over state-of-the-art approaches in terms of both discrimination accuracy and robustness to noise.
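
The paper's closed-form loss is not reproduced here, but the quantity it penalizes can be illustrated for the simplest case. For a 1-NN classifier, the distance from a training instance to its nearest adversarial example is at least half its distance to the nearest differently-labeled training point (by the triangle inequality), so a hinge penalty on that radius is a rough, assumed stand-in for the idea.

```python
# A minimal sketch, NOT the paper's method: a certified-radius lower bound for
# 1-NN and a hinge penalty that discourages small radii. Any perturbation of a
# training point smaller than half its distance to the nearest differently-
# labeled point cannot flip the 1-NN prediction.
import numpy as np

def certified_radii(X, y):
    """Lower bound on the 1-NN certified radius of each training point."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    other = y[None, :] != y[:, None]              # differently-labeled pairs
    dist_other = np.where(other, dist, np.inf)
    return 0.5 * dist_other.min(axis=1)

def margin_penalty(X, y, target=1.0):
    """Hinge loss penalizing points whose certified radius falls below `target`."""
    return np.maximum(0.0, target - certified_radii(X, y)).mean()

X = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 0.0], [3.2, 0.1]])
y = np.array([0, 0, 1, 1])
print(certified_radii(X, y))   # roughly [1.5, 1.45, 1.45, 1.55]
print(margin_penalty(X, y))
```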


How to Scale Data With Outliers for Machine Learning

#artificialintelligence

Many machine learning algorithms perform better when numerical input variables are scaled to a standard range. This includes algorithms that use a weighted sum of the inputs, like linear regression, and algorithms that use distance measures, like k-nearest neighbors. Standardizing is a popular scaling technique that subtracts the mean from the values and divides by the standard deviation, rescaling an input variable to zero mean and unit variance (and to a standard Gaussian only if the variable was Gaussian to begin with). Standardization can become skewed or biased if the input variable contains outlier values. To overcome this, the median and interquartile range can be used instead when standardizing numerical input variables, a procedure generally referred to as robust scaling.
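
A minimal sketch of the two scalers side by side, using scikit-learn and a synthetic column with a single injected outlier.

```python
# Standardization vs. robust scaling (median / IQR) on a numeric column that
# contains an outlier. RobustScaler is scikit-learn's implementation of the
# median/interquartile-range approach described above.
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=(100, 1))
x[0, 0] = 1000.0                                   # one extreme outlier

standardized = StandardScaler().fit_transform(x)   # (x - mean) / std
robust = RobustScaler().fit_transform(x)           # (x - median) / IQR

# The outlier inflates the mean and standard deviation, squashing the normal
# values toward zero; the median and IQR are barely affected.
print("standardized, non-outlier spread:", standardized[1:].std())
print("robust-scaled, non-outlier spread:", robust[1:].std())
```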


On-Device Training with Core ML - Make Your Pancakes Healthy Again!

#artificialintelligence

Backing up the model: the model stays on the device, which is great, but the user will lose the new, personalized version of the model unless we take care of that by sending it somewhere and later downloading it again. Adding a new version of the model: if the model stays and retrains on the device, what happens when we want to replace it with a new model, say an improved (but not personalized) one? If we simply swap it in, the user will lose all the personalized parts of the model and will need to start from scratch, so usually we need to keep supporting those earlier versions too.


A Preliminary Study of Spatial Bias in Knn Distance Metrics

AAAI Conferences

A machine learning algorithm for image classification exhibits spatial bias if permuting the order of image pixels significantly alters its classification accuracy. In this paper, we explore the spatial bias of a number of different distance metrics for k-nearest-neighbor image classification. One distance metric is inspired by the convolutional kernels employed in convolutional neural networks. The other metrics are based on BRIEF descriptors, which generate bit vectors corresponding to images based on comparisons of pixel intensity values. We found that the convolutional distance metric exhibited a strong positive spatial bias, as did one of the BRIEF descriptors. Another BRIEF descriptor exhibited a negative spatial bias, and the remainder exhibited little or no spatial bias. These results lay a foundation for future work that would involve larger numbers of convolutional iterations, potentially synergized with BRIEF-style image preprocessing.
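
The definition in the first sentence translates directly into a small measurement sketch. The code below uses plain Euclidean k-NN, which is permutation-invariant and should therefore show essentially no spatial bias; the paper's convolutional and BRIEF-based metrics are the ones expected to shift.

```python
# A minimal sketch of measuring spatial bias as defined above: apply one fixed
# random permutation to the pixel order of every image and compare k-NN
# accuracy before and after. Euclidean k-NN ignores pixel arrangement, so its
# bias should be near zero; convolution- or BRIEF-style metrics need not be.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def knn_accuracy(train, test):
    clf = KNeighborsClassifier(n_neighbors=3).fit(train, y_tr)
    return clf.score(test, y_te)

perm = np.random.default_rng(0).permutation(X.shape[1])   # fixed pixel shuffle
acc_original = knn_accuracy(X_tr, X_te)
acc_permuted = knn_accuracy(X_tr[:, perm], X_te[:, perm])
print("spatial bias (accuracy drop under permutation):", acc_original - acc_permuted)
```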


Case-Based Reasoning for the Analysis of Methylation Data in Oncology

AAAI Conferences

Researchers seek to identify biological markers that accurately differentiate cancer subtypes and their severity from normal controls. One such biomarker, DNA methylation, has recently become more prevalent in genetic research studies in oncology. This paper proposes to apply these findings in a study of the diagnostic accuracy of DNA methylation signatures for classifying metastasis samples. Very high classification performance was obtained from differentially methylated positions and regions, as well as from selected gene signatures. Perfect accuracy was achieved with the top 5 feature-selected genes using three similar cases and the k-nearest neighbor classifier. This work contributes to the path toward identifying biological signatures for oncology samples using case-based reasoning.
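
The reported configuration (top 5 selected genes, three most similar cases) can be mirrored with a generic scikit-learn pipeline. Synthetic data stands in for the methylation profiles, and ANOVA F-scores are an assumed choice of feature selector, not necessarily the study's.

```python
# A minimal sketch mirroring the described setup: select the 5 most
# discriminative features ("genes"), then classify each sample from its
# 3 most similar cases with k-NN. Synthetic data replaces methylation data.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=120, n_features=200, n_informative=10,
                           random_state=0)     # stand-in for metastasis vs. control
model = make_pipeline(
    SelectKBest(f_classif, k=5),               # top 5 feature-selected "genes"
    KNeighborsClassifier(n_neighbors=3),       # three most similar cases
)
print("cross-validated accuracy:", cross_val_score(model, X, y, cv=5).mean())
```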


A Weighted Mutual k-Nearest Neighbour for Classification Mining

arXiv.org Machine Learning

kNN is a very effective instance-based learning method and is easy to implement. Because real-world data is heterogeneous, noise from many possible sources is widespread, especially in large-scale databases. To eliminate noise and counter the effect of pseudo neighbours, in this paper we propose a new learning algorithm that detects anomalies and removes pseudo neighbours from the dataset, so as to provide comparatively better results. The algorithm also tries to minimize the influence of neighbours that are distant. A concept of certainty measure is also introduced for the experimental results. The advantage of combining mutual neighbours with distance-weighted voting is that the dataset is refined after anomaly removal, and the weighting gives more consideration to the neighbours that are closer. Finally, the performance of the proposed algorithm is evaluated.
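
One simple way to realize a mutual-neighbour filter with distance-weighted voting is sketched below. It is a generic variant under stated assumptions, not necessarily the exact algorithm proposed in the paper.

```python
# A minimal sketch of mutual k-NN with distance-weighted voting: a training
# point t only votes for a query x if x would also fall inside t's own
# k-neighbourhood (discarding one-sided "pseudo neighbours"), and the
# remaining votes are weighted by inverse distance.
import numpy as np

def mutual_knn_predict(X_train, y_train, x, k=5, eps=1e-12):
    d = np.linalg.norm(X_train - x, axis=1)
    nbrs = np.argsort(d)[:k]                      # x's k nearest neighbours

    # Radius of each training point's own k-neighbourhood within the training set.
    train_d = np.linalg.norm(X_train[:, None, :] - X_train[None, :, :], axis=-1)
    kth_radius = np.sort(train_d, axis=1)[:, k]   # column 0 is the self-distance

    mutual = [i for i in nbrs if d[i] <= kth_radius[i]]
    voters = mutual if mutual else list(nbrs)     # fall back if no mutual neighbour

    votes = {}
    for i in voters:
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + 1.0 / (d[i] + eps)
    return max(votes, key=votes.get)

X_train = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [5.0, 5.0], [5.1, 4.9]])
y_train = np.array([0, 0, 0, 1, 1])
print(mutual_knn_predict(X_train, y_train, np.array([0.15, 0.15]), k=2))  # -> 0
```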


Nearest Neighbor Classifiers over Incomplete Information: From Certain Answers to Certain Predictions

arXiv.org Machine Learning

Machine learning (ML) applications have been thriving recently, largely owing to the increasing availability of data. However, inconsistency and incomplete information are ubiquitous in real-world datasets, and their impact on ML applications remains elusive. In this paper, we present a formal study of this impact by extending the notion of Certain Answers for Codd tables, which has been explored by the database research community for decades, into the field of machine learning. Specifically, we focus on classification problems and propose the notion of "Certain Predictions" (CP) -- a test data example can be certainly predicted (CP'ed) if all possible classifiers trained on top of all possible worlds induced by the incompleteness of the data would yield the same prediction. We study two fundamental CP queries: (Q1) a checking query that determines whether a data example can be CP'ed; and (Q2) a counting query that computes the number of classifiers supporting a particular prediction (i.e., label). Given that general solutions to CP queries are, not surprisingly, hard without assumptions about the type of classifier, we further present a case study in the context of nearest neighbor (NN) classifiers, where efficient solutions to CP queries can be developed -- we show that it is possible to answer both queries in linear or polynomial time over exponentially many possible worlds. We demonstrate one example use case of CP in the important application of "data cleaning for machine learning (DC for ML)." We show that our proposed CPClean approach, built on CP, can often significantly outperform existing techniques in terms of classification accuracy, with mild manual cleaning effort.
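
The CP definition can be made concrete with a brute-force toy example over a tiny Codd table. The paper's contribution is answering these queries efficiently, which the sketch below deliberately does not attempt; the table, candidate domain, and test point are made up for illustration.

```python
# A minimal brute-force sketch of "Certain Predictions" for a 1-NN classifier:
# enumerate every possible world of a tiny incomplete training set (each missing
# cell drawn from a small candidate domain), run 1-NN in each world, and answer
# Q1 (is the prediction certain?) and Q2 (how many worlds support each label?).
# This is exponential by construction; the paper shows how to avoid that cost.
from itertools import product
from collections import Counter
import numpy as np

# Training rows; None marks a missing (Codd-table) value.
rows = [([1.0, None], 0), ([2.0, 2.0], 0), ([None, 8.0], 1), ([9.0, 9.0], 1)]
domain = [0.0, 5.0, 10.0]            # candidate fills for every missing cell
x_test = np.array([1.5, 1.5])

missing = [(i, j) for i, (feats, _) in enumerate(rows)
           for j, v in enumerate(feats) if v is None]

support = Counter()
for fill in product(domain, repeat=len(missing)):      # one possible world per fill
    X = np.array([feats[:] for feats, _ in rows], dtype=float)
    y = np.array([label for _, label in rows])
    for (i, j), v in zip(missing, fill):
        X[i, j] = v
    pred = y[np.linalg.norm(X - x_test, axis=1).argmin()]   # 1-NN in this world
    support[int(pred)] += 1

print("Q2 (worlds per label):", dict(support))
print("Q1 (certainly predicted?):", len(support) == 1)
```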


Mastering Machine Learning in Python

#artificialintelligence

Machine learning is the process of using features to predict an outcome measure, and it plays an important role in many industries. A few examples include medical diagnosis, stock price prediction, and ad promotion optimization. Machine learning draws on methods from statistics, data mining, engineering, and many other disciplines. In machine learning, we use a training set of data, in which we observe past outcomes and feature measurements, to build a model for prediction.