Nearest Neighbor Methods


Nearest-neighbor missing visuals revealed

#artificialintelligence

The unsupervised K-Nearest Neighbour (KNN) algorithm is perhaps the most straightforward machine learning algorithm. However, a simple algorithm does not mean that its results are equally simple to analyze. In my research, I found few documented approaches to analyzing the results of the KNN algorithm. In this article, I will show you how to analyze and understand the results of the unsupervised KNN algorithm, using a dataset on cars.
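The core query behind unsupervised nearest neighbor analysis is "which rows are closest to this row?". The sketch below assumes Euclidean distance on standardized features and uses a small hypothetical table of cars with made-up columns (horsepower, weight, fuel economy); it is not the article's dataset or code, only an illustration of the query.

```python
import math

def standardize(rows):
    """Scale each column to zero mean and unit variance so no feature dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def nearest_neighbors(rows, query_index, k):
    """Return indices of the k rows closest to rows[query_index], excluding the row itself."""
    q = rows[query_index]
    dists = sorted((math.dist(q, r), i) for i, r in enumerate(rows) if i != query_index)
    return [i for _, i in dists[:k]]

# Hypothetical toy cars: (horsepower, weight_kg, fuel_economy_mpg)
cars = [
    (130, 1500, 18),
    (165, 1600, 15),
    (95, 1100, 30),
    (88, 1050, 32),
    (150, 1550, 16),
]
scaled = standardize(cars)
print(nearest_neighbors(scaled, 0, 2))  # → [4, 1]
```

Standardizing first matters because weight in kilograms would otherwise swamp the other columns in the distance.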


Human Emotion Classification based on EEG Signals Using Recurrent Neural Network And KNN

arXiv.org Artificial Intelligence

In human interaction, emotion is crucial. Attributes such as words, voice intonation, facial expressions, and kinesics can all be used to convey one's feelings. However, brain-computer interface (BCI) devices have not yet reached the level required for emotion interpretation. With the rapid development of machine learning algorithms, dry electrode techniques, and real-world applications of brain-computer interfaces for healthy individuals, emotion classification from EEG data has recently received considerable attention. Electroencephalogram (EEG) signals are a critical resource for these systems. The primary benefit of EEG signals is that they reflect genuine emotion and are readily processed by computer systems. In this work, EEG signals associated with positive, neutral, and negative emotions were identified using channel selection preprocessing. Until now, however, researchers have had only a limited grasp of the specific links between EEG signals and different emotional states. To classify EEG signals, we used the discrete wavelet transform and machine learning techniques, namely a recurrent neural network (RNN) and the k-nearest neighbor (kNN) algorithm. The classifiers were first used for channel selection. Final feature vectors were then created by combining the features of EEG segments from the selected channels. Using the RNN and kNN algorithms, the final feature vectors associated with positive, neutral, and negative emotions were classified independently, and the classification performance of the two techniques was computed and compared. The average overall accuracies were 94.844% for RNN and 93.438% for kNN.
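A wavelet-plus-kNN pipeline can be sketched as: decompose each signal segment with a discrete wavelet transform, summarize each sub-band (here by its energy), and classify the resulting feature vector by majority vote among its nearest neighbors. The abstract does not state the wavelet, decomposition depth, or features used, so the one-level Haar wavelet, the energy features, and the class names below are all assumptions for illustration only.

```python
import math
from collections import Counter

def haar_dwt(signal):
    """One level of the orthonormal Haar DWT (assumes even length): (approximation, detail)."""
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def wavelet_features(signal, levels=2):
    """Energy of the detail coefficients at each level, then the final approximation energy."""
    feats, current = [], list(signal)
    for _ in range(levels):
        current, detail = haar_dwt(current)
        feats.append(sum(d * d for d in detail))
    feats.append(sum(a * a for a in current))
    return feats

def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k training feature vectors nearest to x (Euclidean)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]

# Hypothetical segments: rapidly alternating vs. slowly varying signals
segments = [[a, -a] * 4 for a in (1, 2, 1.5)] + [[a] * 4 + [-a] * 4 for a in (1, 2, 1.5)]
labels = ["high_freq"] * 3 + ["low_freq"] * 3
train_x = [wavelet_features(s) for s in segments]
print(knn_predict(train_x, labels, wavelet_features([1.2, -1.2] * 4)))  # → high_freq
```

Energy per sub-band is only one common summary; other statistics of the wavelet coefficients could replace it without changing the kNN step.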


BABD: A Bitcoin Address Behavior Dataset for Pattern Analysis

arXiv.org Artificial Intelligence

Cryptocurrencies are no longer just the preferred option for cybercriminal activities on darknets, owing to their increasing adoption in mainstream applications. This is partly due to the transparency of the underlying ledgers, where anyone can access the record of a transaction on the public ledger. In this paper, we build a dataset comprising Bitcoin transactions between 12 July 2019 and 26 May 2021. This dataset (hereafter referred to as BABD-13) contains 13 types of Bitcoin addresses, 5 categories of indicators with 148 features, and 544,462 labeled samples, which makes it the largest publicly available labeled Bitcoin address behavior dataset to our knowledge. We then evaluate common machine learning models on the dataset, namely k-nearest neighbors, decision tree, random forest, multilayer perceptron, and XGBoost. Their accuracy on the multi-class classification task ranges from 93.24% to 97.13%. We also analyze the proposed features and their relationships based on the experiments, and propose a k-hop subgraph generation algorithm that extracts a k-hop subgraph, starting from a specific Bitcoin address node (e.g., a known transaction associated with a criminal investigation), from the entire Bitcoin transaction graph represented as a directed heterogeneous multigraph. Finally, we conduct a preliminary analysis of the behavior patterns of different types of Bitcoin addresses based on the extracted features.
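A k-hop subgraph can be extracted with a breadth-first search that stops expanding at depth k, followed by keeping every edge whose endpoints both lie within reach. This is a generic sketch, not the paper's algorithm: it assumes edges are `(src, dst, edge_id)` tuples, allows parallel edges (multigraph), and counts hops while ignoring edge direction. The address and transaction names are hypothetical.

```python
from collections import defaultdict, deque

def k_hop_subgraph(edges, start, k):
    """Return (nodes, edges) of the subgraph within k hops of `start`.

    `edges` is a list of (src, dst, edge_id) tuples; parallel edges are allowed.
    Hops are counted ignoring direction, which is an assumption of this sketch.
    """
    adj = defaultdict(set)
    for src, dst, _ in edges:
        adj[src].add(dst)
        adj[dst].add(src)
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == k:
            continue  # do not expand beyond k hops
        for nbr in adj[node]:
            if nbr not in depth:
                depth[nbr] = depth[node] + 1
                queue.append(nbr)
    nodes = set(depth)
    # Keep every (possibly parallel) edge whose endpoints are both inside the k-hop set
    sub_edges = [e for e in edges if e[0] in nodes and e[1] in nodes]
    return nodes, sub_edges

# Hypothetical transaction graph between addresses A..E
edges = [("A", "B", "t1"), ("A", "B", "t2"), ("B", "C", "t3"), ("C", "D", "t4"), ("E", "A", "t5")]
print(k_hop_subgraph(edges, "A", 1))
```

Restricting traversal to outgoing edges only would be a one-line change (drop the reverse `adj[dst].add(src)`).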


Does predict function work in parallel when predicting k-nearest neighbour?

#artificialintelligence

I have a k-nearest neighbour classifier that I trained with fitcknn. When predicting labels with predict, does it work in parallel? I have tested calling predict in a for loop and in a parfor loop. The simple for loop performs slightly faster, which makes me think the predict function takes advantage of some optimisation and built-in parallelisation. However, the documentation makes no reference to this, and I thought MATLAB always runs in a single thread unless you explicitly use a parallel pool?


Machine Learning-Based GPS Multipath Detection Method Using Dual Antennas

arXiv.org Artificial Intelligence

In urban areas, global navigation satellite system (GNSS) signals are often reflected or blocked by buildings, resulting in large positioning errors. In this study, we propose a machine learning approach for global positioning system (GPS) multipath detection that uses dual antennas. A machine learning model that classifies GPS signal reception conditions was trained using several GPS measurements selected as features. We used five features, including one obtained from the dual antennas, and evaluated the model's classification performance with four machine learning algorithms: gradient boosting decision tree (GBDT), random forest, decision tree, and K-nearest neighbor (KNN). When the test data set was collected at the same locations as the training data set, classification accuracy ranged from 82% to 96%. However, when the test data set was collected at locations different from those of the training data, accuracy ranged from only 44% to 77%.
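The gap between same-location and different-location accuracy comes from how test data is chosen. A location-held-out evaluation can be sketched by grouping samples by site so that no test sample shares a location with the training data. The dictionary keys (`location`, `features`, `label`) and the site names are hypothetical, not taken from the study.

```python
def leave_one_location_out(samples):
    """Yield (location, train, test) folds, each holding out every sample from one location."""
    locations = sorted({s["location"] for s in samples})
    for loc in locations:
        train = [s for s in samples if s["location"] != loc]
        test = [s for s in samples if s["location"] == loc]
        yield loc, train, test

# Hypothetical samples from three measurement sites
samples = [
    {"location": "site_a", "features": [40.0, 35.0], "label": "multipath"},
    {"location": "site_a", "features": [45.0, 60.0], "label": "clear"},
    {"location": "site_b", "features": [38.0, 20.0], "label": "multipath"},
    {"location": "site_c", "features": [47.0, 70.0], "label": "clear"},
]
for loc, train, test in leave_one_location_out(samples):
    print(loc, len(train), len(test))
```

A random split would instead scatter samples from every site into both sets, which tends to report the more optimistic same-location accuracy.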


Enhanced Nearest Neighbor Classification for Crowdsourcing

arXiv.org Machine Learning

In machine learning, crowdsourcing is an economical way to label a large amount of data. However, noise in the resulting labels may degrade the accuracy of any classification method applied to the labelled data. We propose an enhanced nearest neighbor classifier (ENN) to overcome this issue. Two algorithms are developed to estimate worker quality (which is often unknown in practice): one constructs the estimate from denoised worker labels obtained by applying the $k$NN classifier to the expert data; the other is an iterative algorithm that works even without access to expert data. In addition to strong numerical evidence, we prove that the proposed methods achieve the same regret as their oracle versions based on high-quality expert data. As a technical by-product, we derive a lower bound on the sample size that must be assigned to each worker to reach the optimal convergence rate of regret.
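The idea of using expert data to judge a worker can be sketched as: fit a kNN classifier on the expert-labelled points, predict a "denoised" label for each item the worker labelled, and measure how often the worker agrees. This is a deliberately simplified agreement-rate estimate under assumed binary labels and Euclidean distance, not the estimator from the paper; the data are hypothetical.

```python
import math
from collections import Counter

def knn_label(expert_x, expert_y, x, k):
    """Majority label among the k expert points nearest to x."""
    order = sorted(range(len(expert_x)), key=lambda i: math.dist(expert_x[i], x))
    return Counter(expert_y[i] for i in order[:k]).most_common(1)[0][0]

def estimate_worker_accuracy(expert_x, expert_y, worker_x, worker_y, k=3):
    """Fraction of a worker's labels that agree with kNN labels fit on the expert data."""
    agree = sum(knn_label(expert_x, expert_y, x, k) == y for x, y in zip(worker_x, worker_y))
    return agree / len(worker_x)

# Hypothetical 1-D example: expert labels split cleanly at x ≈ 6
expert_x = [(0,), (1,), (2,), (10,), (11,), (12,)]
expert_y = [0, 0, 0, 1, 1, 1]
worker_x = [(0.5,), (1.5,), (10.5,), (11.5,)]
worker_y = [0, 1, 1, 1]  # one label disagrees with its neighbors
print(estimate_worker_accuracy(expert_x, expert_y, worker_x, worker_y))  # → 0.75
```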


Benefit of Interpolation in Nearest Neighbor Algorithms

arXiv.org Machine Learning

In some studies of deep learning (e.g., Zhang et al., 2016), over-parametrized deep neural networks are observed to achieve a small test error even when the training error is almost zero. Despite numerous works towards understanding this so-called "double descent" phenomenon (e.g., Belkin et al., 2018; Belkin et al., 2019), in this paper we turn to another way of enforcing zero training error without over-parametrization: a data interpolation mechanism. Specifically, we consider a class of interpolated weighting schemes in nearest neighbor (NN) algorithms. By carefully characterizing the multiplicative constant in the statistical risk, we reveal a U-shaped performance curve for the level of data interpolation in both classification and regression setups. This sharpens the existing result (Belkin et al., 2018) that zero training error does not necessarily jeopardize predictive performance, and shows the counter-intuitive result that a mild degree of data interpolation actually strictly improves prediction performance and statistical stability over those of the (un-interpolated) $k$-NN algorithm. Finally, we discuss the universality of our results, for example under a change of distance measure and with corrupted test data.
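One common way to make a nearest neighbor estimator interpolate is to weight each of the k neighbors by a power of its inverse distance, so the weight becomes infinite at a training point and the training label is reproduced exactly. The sketch below assumes weights of the form d^(-gamma); the paper studies a broader class of interpolated weighting schemes, so this is only a representative instance, with hypothetical data.

```python
import math

def interpolated_knn_regress(train_x, train_y, x, k=3, gamma=1.0):
    """Weighted kNN regression with weights d**(-gamma) over the k nearest points.

    gamma = 0 gives the plain (un-interpolated) k-NN average; gamma > 0 makes the
    estimator interpolate, returning a training point's label exactly when x hits it.
    """
    nearest = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], x))[:k]
    if gamma > 0:
        for xi, yi in nearest:
            if math.dist(xi, x) == 0:
                return yi  # singular weight: zero training error at this point
    weights = [math.dist(xi, x) ** (-gamma) for xi, _ in nearest]
    return sum(w * yi for w, (_, yi) in zip(weights, nearest)) / sum(weights)

# Hypothetical 1-D data y = x^2
train_x = [(0,), (1,), (2,), (3,)]
train_y = [0, 1, 4, 9]
print(interpolated_knn_regress(train_x, train_y, (1,), k=3, gamma=1.0))  # → 1
print(interpolated_knn_regress(train_x, train_y, (1,), k=3, gamma=0.0))  # plain k-NN average, 5/3
```

The exponent gamma plays the role of the "level of interpolation": at gamma = 0 the training point's own label is averaged with its neighbors, while larger gamma pulls predictions ever closer to nearby labels.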


What is K-Nearest Neighbor (KNN)?

#artificialintelligence

The K-Nearest Neighbor (KNN) algorithm is a popular supervised learning model that can be used to solve both classification and regression problems. In this article, I will give a detailed explanation of how this model works. KNN is one of the simplest machine learning algorithms based on the supervised learning technique. It assumes that a new data point is similar to the available data points closest to it, and assigns the new case to the category most similar to it among the available categories. The choice of K, the number of neighbors consulted, is very important.
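The mechanics described above fit in a few lines: find the K training points closest to the query, then take a majority vote for classification or an average for regression. This minimal sketch assumes Euclidean distance and uses hypothetical toy points; it also shows why K matters, since the same query changes class when K goes from 1 to 3.

```python
import math
from collections import Counter

def k_nearest(train_x, x, k):
    """Indices of the k training points closest to x (Euclidean distance)."""
    return sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]

def knn_classify(train_x, train_y, x, k):
    """Classification: majority vote among the k nearest labels."""
    return Counter(train_y[i] for i in k_nearest(train_x, x, k)).most_common(1)[0][0]

def knn_regress(train_x, train_y, x, k):
    """Regression: mean of the k nearest target values."""
    return sum(train_y[i] for i in k_nearest(train_x, x, k)) / k

# Hypothetical toy data
points = [(0, 0), (1, 0), (0, 1), (5, 5)]
colors = ["red", "blue", "blue", "red"]
values = [1.0, 2.0, 3.0, 10.0]
query = (0.1, 0.1)

print(knn_classify(points, colors, query, k=1))  # → red
print(knn_classify(points, colors, query, k=3))  # → blue
print(knn_regress(points, values, query, k=3))   # → 2.0
```

A small K follows individual points closely (and noise with them), while a larger K smooths the decision over more neighbors.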


Khedher

AAAI Conferences

In this paper, we aim to predict students' learning performance by combining two sensing modalities: eye tracking, which monitors learners' eye movements, and electroencephalography (EEG), which measures learners' cerebral activity. Our long-term goal is to use both data sources to provide appropriate adaptive assistance that enhances students' learning experience and optimizes their performance. An experimental study was conducted to collect gaze data and brainwave signals from fifteen students during an interaction with a virtual learning environment. Different classification algorithms were used to discriminate between two groups of learners: students who successfully solved the problem-solving tasks and students who did not. Experimental results demonstrated that the K-Nearest Neighbor classifier achieved better accuracy when combining eye movement and EEG features than when using eye movement or EEG features alone.


MVP-Net: Multiple View Pointwise Semantic Segmentation of Large-Scale Point Clouds

arXiv.org Artificial Intelligence

Semantic segmentation of 3D point clouds is an essential task in environment perception for autonomous driving. The pipeline of most pointwise point cloud semantic segmentation methods includes point sampling, neighbor searching, feature aggregation, and classification. Neighbor-searching methods such as the K-nearest neighbors (KNN) algorithm are widely applied, but the complexity of KNN is always an efficiency bottleneck. In this paper, we propose an end-to-end neural architecture, Multiple View Pointwise Net (MVP-Net), that efficiently and directly performs inference on large-scale outdoor point clouds without KNN or any complex pre- or postprocessing. Instead, assumption-based sorting and multi-rotation of the point cloud are introduced for point feature aggregation and receptive-field expansion. Numerical experiments show that MVP-Net is 11 times faster than RandLA-Net, the most efficient existing pointwise semantic segmentation method, while achieving the same accuracy on the large-scale SemanticKITTI benchmark.
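To see why neighbor search becomes a bottleneck, consider a brute-force KNN over a point cloud: every point computes its distance to every other point, so the work grows quadratically with the number of points. This sketch is a naive baseline for illustration only; practical pipelines typically use spatial indexes, and it is not the method used by MVP-Net or RandLA-Net. The points are hypothetical.

```python
import heapq
import math

def knn_brute_force(points, k):
    """For every point, return the indices of its k nearest other points.

    Each point scans all N points and keeps the k smallest distances with a heap,
    so the total cost is O(N^2 log k), which is prohibitive for large outdoor scans.
    """
    neighbors = []
    for i, p in enumerate(points):
        candidates = ((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append([j for _, j in heapq.nsmallest(k, candidates)])
    return neighbors

# Hypothetical 3-D points (x, y, z)
cloud = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (10, 0, 0)]
print(knn_brute_force(cloud, 1))  # → [[1], [0], [0], [1]]
```

Doubling the number of points roughly quadruples the distance computations, which is the scaling that motivates KNN-free designs.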