AITopics | Nearest Neighbor Methods

Collaborating Authors

Nearest Neighbor Methods

News Overviews Instructional Materials AI-Alerts Classics

Efficient Autotuning of Hyperparameters in Approximate Nearest Neighbor Search

Jääsaari, Elias, Hyvönen, Ville, Roos, Teemu

arXiv.org Machine LearningDec-18-2018

Approximate nearest neighbor algorithms are used to speed up nearest neighbor search in a wide array of applications. However, current indexing methods feature several hyperparameters that need to be tuned to reach an acceptable accuracy--speed trade-off. A grid search in the parameter space is often impractically slow due to a time-consuming index-building procedure. Therefore, we propose an algorithm for automatically tuning the hyperparameters of indexing methods based on randomized space-partitioning trees. In particular, we present results using randomized k-d trees, random projection trees and randomized PCA trees. The tuning algorithm adds minimal overhead to the index-building process but is able to find the optimal hyperparameters accurately. We demonstrate that the algorithm is significantly faster than existing approaches, and that the indexing methods used are competitive with the state-of-the-art methods in query time while being faster to build.

information retrieval, machine learning, natural language, (15 more...)

arXiv.org Machine Learning

doi: 10.1007/978-3-030-16145-3_46

1812.07484

Country: Europe > Finland (0.14)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.48)

Add feedback

Rademacher Complexity and Generalization Performance of Multi-category Margin Classifiers

Musayeva, Khadija, Lauer, Fabien, Guermeur, Yann

arXiv.org Machine LearningDec-3-2018

Although the theory of binary pattern classification is well established [1, 2], the theory of multicategory classificationis far from being complete. The research in this case addresses problems such as the sample-complexity analysis of empirical risk minimization algorithms[3], or consistency analysis of multi-class loss functions and of specific families of classifiers [4]. Another open question is the optimal dependency of guaranteed risks of multi-category classifiers on the number C of categories and the sample size m. It is all the more the case for the problems that involve a large number of classes. When the considered classifiers are margin ones that take decision based on a score per category, the dependency on the margin parameter γ also becomes relevant to the characterization of their generalization performance. If this question has been mainly studied for specific families of classifiers, be it k-nearest neighbors [5], kernel methods [6, 7] and decision trees [8], tackling it under minimal learnability assumptions remains a challenging task. This paper focuses on obtaining guaranteed risks under such assumptions.

artificial intelligence, dependency, machine learning, (18 more...)

arXiv.org Machine Learning

1812.00584

Country: North America (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.54)

Add feedback

Prototype-based Neural Network Layers: Incorporating Vector Quantization

Saralajew, Sascha, Holdijk, Lars, Rees, Maike, Villmann, Thomas

arXiv.org Artificial IntelligenceDec-3-2018

Neural networks currently dominate the machine learning community and they do so for good reasons. Their accuracy on complex tasks such as image classification is unrivaled at the moment and with recent improvements they are reasonably easy to train. Nevertheless, neural networks are lacking robustness and interpretability. Prototype-based vector quantization methods on the other hand are known for being robust and interpretable. For this reason, we propose techniques and strategies to merge both approaches. This contribution will particularly highlight the similarities between them and outline how to construct a prototype-based classification layer for multilayer networks. Additionally, we provide an alternative, prototype-based, approach to the classical convolution operation. Numerical results are not part of this report, instead the focus lays on establishing a strong theoretical framework. By publishing our framework and the respective theoretical considerations and justifications before finalizing our numerical experiments we hope to jump-start the incorporation of prototype-based learning in neural networks and vice versa.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

1812.01214

Country:

North America > United States (1.00)
Europe (1.00)

Genre:

Overview (0.93)
Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.46)

Add feedback

Early Prediction of Course Grades: Models and Feature Selection

Li, Hengxuan, Lynch, Collin F., Barnes, Tiffany

arXiv.org Artificial IntelligenceDec-3-2018

In this paper, we compare predictive models for students' final performance in a blended course using a set of generic features collected from the first six weeks of class. These features were extracted from students' online homework submission logs as well as other online actions. We compare the effectiveness of 5 different ML algorithms (SVMs, Support Vector Regression, Decision Tree, Naive Bayes and K-Nearest Neighbor). We found that SVMs outperform other models and improve when compared to the baseline. This study demonstrates feasible implementations for predictive models that rely on common data from blended courses that can be used to monitor students' progress and to tailor instruction.

artificial intelligence, machine learning, student, (16 more...)

arXiv.org Artificial Intelligence

1812.00843

Country: North America > United States (0.15)

Genre:

Instructional Material (1.00)
Research Report (0.83)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (1.00)
Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.56)

Add feedback

Statistical Optimality of Interpolated Nearest Neighbor Algorithms

Xing, Yue, Song, Qifan, Cheng, Guang

arXiv.org Machine LearningNov-30-2018

In the era of deep learning, understanding over-fitting phenomenon becomes increasingly important. It is observed that carefully designed deep neural networks achieve small testing error even when the training error is close to zero. One possible explanation is that for many modern machine learning algorithms, over-fitting can greatly reduce the estimation bias, while not increasing the estimation variance too much. To illustrate the above idea, we prove that the proposed interpolated nearest neighbor algorithm achieves the minimax optimal rate in both regression and classification regimes, and observe that they are empirically better than the traditional $k$ nearest neighbor method in some cases.

artificial intelligence, assumption, machine learning, (15 more...)

arXiv.org Machine Learning

1810.02814

Country: North America > United States > Indiana > Tippecanoe County (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)

Add feedback

heart disease prediction – Good Audience

#artificialintelligenceNov-27-2018, 10:18:45 GMT

The project is about predicting coronary heart disease by using three different ML algorithms. And to know which is the best approach. There are roughly two controls per case of CHD. Many of the CHD positive men have undergone blood pressure reduction treatment and other programs to reduce their risk factors after their occurrence of CHD. In some cases the measurements were made after these treatments.

artificial intelligence, good audience, machine learning, (9 more...)

#artificialintelligence

Country: Africa (0.22)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.43)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.35)

Add feedback

Machine Learning Distinguishes Neurosurgical Skill Levels in a Virtual Reality Tumor Resection Task

Siyar, Samaneh, Azarnoush, Hamed, Rashidi, Saeid, Winkler-Schwartz, Alexandre, Bissonnette, Vincent, Ponnudurai, Nirros, Del Maestro, Rolando F.

arXiv.org Machine LearningNov-20-2018

Disclosure of financial support: This work was supported by the Di Giovanni Foundation, the Montreal English School Board, the B-Strong Foundation, the Colannini Foundation, the Montreal Neurological Institute and Hospital and the McGill Department of Orthopedics. Samaneh Siyar is a Visiting Scholar in the Neurosurgical Simulation Research and Training Centre. Dr. H. Azarnoush previously held the Postdoctoral Neuro-Oncology Fellowship from the Montreal Neurological Institute and Hospital and is a Visiting Professor in the Neurosurgical Simulation Research and Training Centre. Dr. Winkler-Schwartz holds a Robert Maudsley Fellowship from the Royal College of Physicians and Surgeons of Canada and Nirros Ponnudurai is supported by a Heffez Family Bursary. Dr. Del Maestro is the William Feindel Emeritus Professor in Neuro-Oncology at McGill University. 2 Acknowledgments We thank all the medical students, residents, and neurosurgeons from the Montreal Neurological Institute and Hospital and at other institutions who participated in this study. We would also like to thank Robert DiRaddo, Group Leader, Simulation, Life Sciences Division, National Research Council Canada at Boucherville and his team, including Denis Laroche, Valérie Pazos, Nusrat Choudhury and Linda Pecora for their support in the development of the scenarios used in these studies and all the members of the Simulation, Life Sciences Division, National Research Council Canada.

artificial intelligence, classifier, machine learning, (13 more...)

arXiv.org Machine Learning

1811.08159

Country: North America > Canada > Quebec > Montreal (1.00)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.55)

Add feedback

Efficient and Scalable Multi-task Regression on Massive Number of Tasks

He, Xiao, Alesiani, Francesco, Shaker, Ammar

arXiv.org Machine LearningNov-14-2018

Many real-world large-scale regression problems can be formulated as Multi-task Learning (MTL) problems with a massive number of tasks, as in retail and transportation domains. However, existing MTL methods still fail to offer both the generalization performance and the scalability for such problems. Scaling up MTL methods to problems with a tremendous number of tasks is a big challenge. Here, we propose a novel algorithm, named Convex Clustering Multi-Task regression Learning (CCMTL), which integrates with convex clustering on the k-nearest neighbor graph of the prediction models. Further, CCMTL efficiently solves the underlying convex problem with a newly proposed optimization method. CCMTL is accurate, efficient to train, and empirically scales linearly in the number of tasks. On both synthetic and real-world datasets, the proposed CCMTL outperforms seven state-of-the-art (SoA) multi-task learning methods in terms of prediction accuracy as well as computational efficiency. On a real-world retail dataset with 23, 812 tasks, CCMTL requires only around 30 seconds to train on a single thread, while the SoA methods need up to hours or even days.

artificial intelligence, dataset, machine learning, (16 more...)

arXiv.org Machine Learning

1811.05695

Country: Europe > Germany (0.14)

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.56)

Add feedback

Dynamic Feature Scaling for K-Nearest Neighbor Algorithm

Bhardwaj, Chandrasekaran Anirudh, Mishra, Megha, Desikan, Kalyani

arXiv.org Machine LearningNov-12-2018

Nearest Neighbors Algorithm is a Lazy Learning Algorithm, in which the algorithm tries to approximate the predictions with the help of similar existing vectors in the training dataset. The predictions made by the K-Nearest Neighbors algorithm is based on averaging the target values of the spatial neighbors. The selection process for neighbors in the Hermitian space is done with the help of distance metrics such as Euclidean distance, Minkowski distance, Mahalanobis distance etc. A majority of the metrics such as Euclidean distance are scale variant, meaning that the results could vary for different range of values used for the features. Standard techniques used for the normalization of scaling factors are feature scaling method such as Z-score normalization technique, Min-Max scaling etc. Scaling methods uniformly assign equal weights to all the features, which might result in a non-ideal situation. This paper proposes a novel method to assign weights to individual feature with the help of out of bag errors obtained from constructing multiple decision tree models.

algorithm, artificial intelligence, machine learning, (13 more...)

arXiv.org Machine Learning

1811.05062

Country: North America > United States > California > Orange County > Irvine (0.16)

Genre: Research Report (0.70)

Industry: Health & Medicine > Therapeutic Area (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)

Add feedback

Machine Learning Hello World with Scikit Learn : Chapter 2 KNN K Nearest Neighbor Algorithm

#artificialintelligenceNov-11-2018, 03:46:06 GMT

Hello World of Machine Learning is a video series to acquaint, enable and empower you to understand What, How and When of Machine Learning This is chapter - 2 of the series and in this chapter, we'll be exploring our first Machine Learning Algorithm called KNN (K Nearest Neighbor). This chapter will explain 1. What is KNN and how it works? Hope it helps you in learning something new.. enjoy! Please take a moment to Like! Subscribe!

artificial intelligence, machine learning, social media, (4 more...)

#artificialintelligence

Industry: Information Technology > Services (0.34)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)

Add feedback