Goto

Collaborating Authors

 Nearest Neighbor Methods


Deterministic Iteratively Built KD-Tree with KNN Search for Exact Applications

arXiv.org Artificial Intelligence

K-Nearest Neighbors (KNN) search is a fundamental algorithm in artificial intelligence software with applications in robotics, and autonomous vehicles. These wide-ranging applications utilize KNN either directly for simple classification or combine KNN results as input to other algorithms such as Locally Weighted Learning (LWL). Similar to binary trees, kd-trees become unbalanced as new data is added in online applications which can lead to rapid degradation in search performance unless the tree is rebuilt. Although approximate methods are suitable for graphics applications, which prioritize query speed over query accuracy, they are unsuitable for certain applications in autonomous systems, aeronautics, and robotic manipulation where exact solutions are desired. In this paper, we will attempt to assess the performance of non-recursive deterministic kd-tree functions and KNN functions. We will also present a "forest of interval kd-trees" which reduces the number of tree rebuilds, without compromising the exactness of query results.


Hierarchical Subspace Learning for Dimensionality Reduction to Improve Classification Accuracy in Large Data Sets

arXiv.org Machine Learning

Manifold learning is used for dimensionality reduction, with the goal of finding a projection subspace to increase and decrease the inter- and intraclass variances, respectively. However, a bottleneck for subspace learning methods often arises from the high dimensionality of datasets. In this paper, a hierarchical approach is proposed to scale subspace learning methods, with the goal of improving classification in large datasets by a range of 3% to 10%. Different combinations of methods are studied. We assess the proposed method on five publicly available large datasets, for different eigen-value based subspace learning methods such as linear discriminant analysis, principal component analysis, generalized discriminant analysis, and reconstruction independent component analysis. To further examine the effect of the proposed method on various classification methods, we fed the generated result to linear discriminant analysis, quadratic linear analysis, k-nearest neighbor, and random forest classifiers. The resulting classification accuracies are compared to show the effectiveness of the hierarchical approach, reporting results of an average of 5% increase in classification accuracy.


Faiss: A library for efficient similarity search - Facebook Engineering

#artificialintelligence

This month, we released Facebook AI Similarity Search (Faiss), a library that allows us to quickly search for multimedia documents that are similar to each other -- a challenge where traditional query search engines fall short. We've built nearest-neighbor search implementations for billion-scale data sets that are some 8.5x faster than the previous reported state-of-the-art, along with the fastest k-selection algorithm on the GPU known in the literature. This lets us break some records, including the first k-nearest-neighbor graph constructed on 1 billion high-dimensional vectors. Traditional databases are made up of structured tables containing symbolic information. For example, an image collection would be represented as a table with one row per indexed photo.


K-Nearest Neighbors (KNN) Algorithm For Machine Learning

#artificialintelligence

K-Nearest Neighbors (KNN) is a classification machine learning algorithm. This algorithm is used when the data is discrete in nature. It is a supervised machine learning algorithm. This means we need a set of reference data in order to determine the category of the future data point. This algorithm classifies the given dataset into different groups or categories.


Distributed Adaptive Nearest Neighbor Classifier: Algorithm and Theory

arXiv.org Machine Learning

When data is of an extraordinarily large size or physically stored in different locations, the distributed nearest neighbor (NN) classifier is an attractive tool for classification. We propose a novel distributed adaptive NN classifier for which the number of nearest neighbors is a tuning parameter stochastically chosen by a data-driven criterion. An early stopping rule is proposed when searching for the optimal tuning parameter, which not only speeds up the computation but also improves the finite sample performance of the proposed Algorithm. Convergence rate of excess risk of the distributed adaptive NN classifier is investigated under various sub-sample size compositions. In particular, we show that when the sub-sample sizes are sufficiently large, the proposed classifier achieves the nearly optimal convergence rate. Effectiveness of the proposed approach is demonstrated through simulation studies as well as an empirical application to a real-world dataset.


Application of Three Different Machine Learning Methods on Strategy Creation for Profitable Trades on Cryptocurrency Markets

arXiv.org Artificial Intelligence

AI and data driven solutions have been applied to different fields with outperforming and promising results. In this research work we apply k-Nearest Neighbours, eXtreme Gradient Boosting and Random Forest classifiers to direction detection problem of three cryptocurrency markets. Our input data includes price data and technical indicators. We use these classifiers to design a strategy to trade in those markets. Our test results on unseen data shows a great potential for this approach in helping investors with an expert system to exploit the market and gain profit. Our highest gain for an unseen 66 day span is 860 dollars per 1800 dollars investment. We also discuss limitations of these approaches and their potential impact to Efficient Market Hypothesis.


Why Is The K-Nearest Neighbors (KNN) Called A "Lazy Algorithm"?

#artificialintelligence

Understanding why K-Nearest Neighbors is called a "lazy learner" K-Nearest Neighbors or KNN is one of the simplest machine learning algorithms. This algorithm is very easy to implement and equally easy to understand. It is a supervised machine learning algorithm. This means we need a reference dataset to predict the category/group of the future data point. Once the KNN model is trained on the reference dataset, it classifies the new data points based on the points (neighbors) that are most similar to it.


Performance Comparison of Different Machine Learning Algorithms on the Prediction of Wind Turbine Power Generation

arXiv.org Artificial Intelligence

Over the past decade, wind energy has gained more attention in the world. However, owing to its indirectness and volatility properties, wind power penetration has increased the difficulty and complexity in dispatching and planning of electric power systems. Therefore, it is needed to make the high-precision wind power prediction in order to balance the electrical power. For this purpose, in this study, the prediction performance of linear regression, k-nearest neighbor regression and decision tree regression algorithms is compared in detail. k-nearest neighbor regression algorithm provides lower coefficient of determination values, while decision tree regression algorithm produces lower mean absolute error values. In addition, the meteorological parameters of wind speed, wind direction, barometric pressure and air temperature are evaluated in terms of their importance on the wind power parameter. The biggest importance factor is achieved by wind speed parameter. In consequence, many useful assessments are made for wind power predictions.


Stop using SMOTE to handle all your Imbalanced Data

#artificialintelligence

In classification tasks, one may encounter a situation where the target class label is not equally distributed. Such a dataset can be termed Imbalanced data. Imbalance in data can be a blocker to train a data science model. In case of imbalance class problems, the model is trained mainly on the majority class and the model becomes biased towards the majority class prediction. Hence handling of imbalance class is essential before proceeding to the modeling pipeline.


TITAN: T Cell Receptor Specificity Prediction with Bimodal Attention Networks

arXiv.org Artificial Intelligence

Motivation: The activity of the adaptive immune system is governed by T-cells and their specific T-cell receptors (TCR), which selectively recognize foreign antigens. Recent advances in experimental techniques have enabled sequencing of TCRs and their antigenic targets (epitopes), allowing to research the missing link between TCR sequence and epitope binding specificity. Scarcity of data and a large sequence space make this task challenging, and to date only models limited to a small set of epitopes have achieved good performance. Here, we establish a k-nearest-neighbor (K-NN) classifier as a strong baseline and then propose TITAN (Tcr epITope bimodal Attention Networks), a bimodal neural network that explicitly encodes both TCR sequences and epitopes to enable the independent study of generalization capabilities to unseen TCRs and/or epitopes. Results: By encoding epitopes at the atomic level with SMILES sequences, we leverage transfer learning and data augmentation to enrich the input data space and boost performance. TITAN achieves high performance in the prediction of specificity of unseen TCRs (ROC-AUC 0.87 in 10-fold CV) and surpasses the results of the current state-of-the-art (ImRex) by a large margin. Notably, our Levenshtein-distance-based K-NN classifier also exhibits competitive performance on unseen TCRs. While the generalization to unseen epitopes remains challenging, we report two major breakthroughs. First, by dissecting the attention heatmaps, we demonstrate that the sparsity of available epitope data favors an implicit treatment of epitopes as classes. This may be a general problem that limits unseen epitope performance for sufficiently complex models. Second, we show that TITAN nevertheless exhibits significantly improved performance on unseen epitopes and is capable of focusing attention on chemically meaningful molecular structures.