 Nearest Neighbor Methods


Towards the Best Solution for Complex System Reliability: Can Statistics Outperform Machine Learning?

arXiv.org Artificial Intelligence

Studying the reliability of complex systems using machine learning techniques involves facing a series of technical and practical challenges, ranging from the intrinsic nature of the system and data to the difficulties in modeling and effectively deploying models in real-world scenarios. This study compares the effectiveness of classical statistical techniques and machine learning methods for improving complex system analysis in reliability assessments. We aim to demonstrate that classical statistical algorithms often yield more precise and interpretable results than black-box machine learning approaches in many practical applications. The evaluation is conducted using both real-world data and simulated scenarios. We report the results obtained from statistical modeling algorithms, as well as from machine learning methods including neural networks, K-nearest neighbors, and random forests.
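
As a rough illustration of such a comparison (not the study's actual pipeline or data), the sketch below pits a classical statistical model against the machine learning methods named above on simulated failure data; all features, sample sizes, and hyperparameters are placeholders.

```python
# Hypothetical benchmark sketch: a classical statistical model vs. black-box
# ML methods on simulated reliability data (not the authors' pipeline).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                           # simulated stress covariates
logits = 1.5 * X[:, 0] - 1.0 * X[:, 1]                  # simple linear failure mechanism
y = (logits + rng.logistic(size=500) > 0).astype(int)   # 1 = component failed

models = {
    "logistic (statistical)": LogisticRegression(),
    "k-NN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15)),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "neural net": make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000, random_state=0)),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5).mean()     # 5-fold accuracy
    print(f"{name}: {acc:.3f}")
```

On data with a simple underlying mechanism like this, the statistical model is often at least as accurate as the black-box methods while remaining directly interpretable, which is the paper's central claim.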


Enhancing Carbon Emission Reduction Strategies using OCO and ICOS data

arXiv.org Artificial Intelligence

We propose a methodology to enhance local CO2 monitoring by integrating satellite data from the Orbiting Carbon Observatories (OCO-2 and OCO-3) with ground-level observations from the Integrated Carbon Observation System (ICOS) and weather data from the ECMWF Reanalysis v5 (ERA5). Unlike traditional methods that downsample national data, our approach uses multimodal data fusion for high-resolution CO2 estimations. We employ weighted K-nearest neighbor (KNN) interpolation with machine learning models to predict ground-level CO2 from satellite measurements, achieving a Root Mean Squared Error of 3.92 ppm. Our results show the effectiveness of integrating diverse data sources in capturing local emission patterns, highlighting the value of high-resolution atmospheric transport models. The developed model improves the granularity of CO2 monitoring, providing precise insights for targeted carbon mitigation strategies, and represents a novel application of neural networks and KNN in environmental monitoring, adaptable to various regions and temporal scales.
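
A minimal sketch of distance-weighted KNN interpolation, using scikit-learn's KNeighborsRegressor as a stand-in for the paper's interpolation step; the coordinates, XCO2 values, and k below are invented for illustration.

```python
# Illustrative distance-weighted k-NN interpolation from scattered satellite
# soundings to a ground-station location; all values are placeholders.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# (lat, lon) of hypothetical OCO-2/3 soundings and their XCO2 retrievals (ppm)
sat_coords = np.array([[48.1, 11.5], [48.3, 11.7], [47.9, 11.4], [48.2, 11.9]])
sat_xco2 = np.array([417.2, 418.0, 416.5, 417.8])

# weights="distance" applies inverse-distance weighting among the k neighbors
knn = KNeighborsRegressor(n_neighbors=3, weights="distance")
knn.fit(sat_coords, sat_xco2)

icos_station = np.array([[48.15, 11.6]])    # hypothetical ICOS site location
print(knn.predict(icos_station))            # interpolated CO2 estimate (ppm)
```

In practice a great-circle (haversine) metric would be more appropriate than Euclidean distance over latitude/longitude; the sketch keeps the default for brevity.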


GABIC: Graph-based Attention Block for Image Compression

arXiv.org Artificial Intelligence

While standardized codecs like JPEG and HEVC-intra remain the industry standard in image compression, neural Learned Image Compression (LIC) codecs are a promising alternative. In particular, integrating attention mechanisms from Vision Transformers into LIC models has improved compression efficiency. However, this extra efficiency often comes at the cost of aggregating redundant features. This work proposes a Graph-based Attention Block for Image Compression (GABIC), a method to reduce feature redundancy based on a k-nearest-neighbors-enhanced attention mechanism. Our experiments show that GABIC outperforms comparable methods, particularly at high bit rates, enhancing compression performance.
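
To make the idea concrete, here is a minimal k-NN-restricted attention sketch in PyTorch, not the paper's exact block: each query token attends only to its k nearest keys in feature space, rather than to all tokens.

```python
# Sketch of k-NN-restricted attention: attention weights are masked so each
# query sees only its num_neighbors nearest keys (by feature-space distance).
import torch
import torch.nn.functional as F

def knn_attention(q, k, v, num_neighbors=8):
    # q, k, v: (batch, tokens, dim)
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5           # (B, T, T) logits
    dist = torch.cdist(q, k)                              # pairwise distances
    knn_idx = dist.topk(num_neighbors, largest=False).indices
    mask = torch.full_like(scores, float("-inf"))
    mask.scatter_(-1, knn_idx, 0.0)                       # unmask k neighbors only
    attn = F.softmax(scores + mask, dim=-1)               # sparse attention weights
    return attn @ v

q = k = v = torch.randn(1, 64, 32)                        # 64 tokens, dim 32
print(knn_attention(q, k, v).shape)                       # torch.Size([1, 64, 32])
```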


Statistical Guarantees of Distributed Nearest Neighbor Classification

Neural Information Processing Systems

Nearest neighbor is a popular nonparametric method for classification and regression with many appealing properties. In the big data era, the sheer volume and spatial/temporal disparity of big data may prohibit centrally processing and storing the data. This imposes a considerable hurdle for nearest neighbor predictions, since the entire training data must be memorized. One effective way to overcome this issue is the distributed learning framework. Through majority voting, the distributed nearest neighbor classifier achieves the same rate of convergence as its oracle version in terms of the regret, up to a multiplicative constant that depends solely on the data dimension. The multiplicative difference can be eliminated by replacing majority voting with the weighted voting scheme. In addition, we provide sharp theoretical upper bounds on the number of subsamples needed for the distributed nearest neighbor classifier to reach the optimal convergence rate. It is interesting to note that the weighted voting scheme allows a larger number of subsamples than the majority voting one. Our findings are supported by numerical studies.
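
A small sketch of the distributed scheme with majority voting, using scikit-learn and toy data (the paper's contribution is the regret analysis, not this particular setup): the training set is split into s subsamples, a 1-NN classifier is fit on each, and the s local predictions are aggregated by vote.

```python
# Distributed 1-NN with majority voting over s disjoint subsamples; the data,
# s, and k are illustrative rather than the paper's experimental settings.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=3000, n_features=5, random_state=0)
s = 30                                                  # number of subsamples ("machines")
parts = np.array_split(np.random.default_rng(0).permutation(len(X)), s)

# one local 1-NN classifier per subsample
local = [KNeighborsClassifier(n_neighbors=1).fit(X[idx], y[idx]) for idx in parts]

def predict_majority(x):
    votes = np.array([clf.predict(x) for clf in local])  # (s, n_test) local labels
    return (votes.mean(axis=0) > 0.5).astype(int)        # majority vote (binary case)

print(predict_majority(X[:5]))
```

The weighted voting scheme the paper analyzes would replace the uniform vote above with neighbor-rank-dependent weights.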


High-Resolution Flood Probability Mapping Using Generative Machine Learning with Large-Scale Synthetic Precipitation and Inundation Data

arXiv.org Artificial Intelligence

High-resolution flood probability maps are essential for addressing the limitations of existing flood risk assessment approaches, but they are often limited by the availability of historical event data. Moreover, producing the simulated data needed to create probabilistic flood maps with physics-based models demands significant computation and time, limiting its feasibility. To address this gap, this study introduces Flood-Precip GAN (Flood-Precipitation Generative Adversarial Network), a novel methodology that leverages generative machine learning to simulate large-scale synthetic inundation data for producing probabilistic flood maps. With a focus on Harris County, Texas, Flood-Precip GAN begins by training a cell-wise depth estimator on a limited number of physics-based model-generated precipitation-flood events. This model, which emphasizes precipitation-based features, outperforms universal models. Subsequently, a Generative Adversarial Network (GAN) with constraints is employed to conditionally generate synthetic precipitation records. Strategic thresholds are established to filter these records, ensuring close alignment with true precipitation patterns. For each cell, synthetic events are smoothed using a K-nearest neighbors algorithm and processed through the depth estimator to derive synthetic depth distributions. After iterating this procedure to generate 10,000 synthetic precipitation-flood events, we construct flood probability maps in various formats, considering different inundation depths. Validation through similarity and correlation metrics confirms the fidelity of the synthetic depth distributions relative to true data. Flood-Precip GAN provides a scalable solution for generating the synthetic flood depth data needed to create high-resolution flood probability maps, significantly enhancing flood preparedness and mitigation efforts.
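
The abstract leaves the per-cell smoothing step underspecified; one plausible reading, sketched below with invented data, replaces each synthetic event's value with the mean over its k nearest events in feature space.

```python
# One possible interpretation of the per-cell k-NN smoothing step (the
# abstract does not fully specify it); all data here is synthetic filler.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 4))        # synthetic precipitation features
values = rng.gamma(2.0, 1.0, size=200)      # raw synthetic values for one cell

nn = NearestNeighbors(n_neighbors=5).fit(features)
_, idx = nn.kneighbors(features)            # each row: an event's 5 nearest events
smoothed = values[idx].mean(axis=1)         # k-NN-smoothed values per event
print(smoothed[:5])
```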


Intuitive Human-Robot Interface: A 3-Dimensional Action Recognition and UAV Collaboration Framework

arXiv.org Artificial Intelligence

Harnessing human movements to command an Unmanned Aerial Vehicle (UAV) holds the potential to revolutionize their deployment, rendering it more intuitive and user-centric. In this research, we introduce a novel methodology for classifying three-dimensional human actions and leveraging them to coordinate with a UAV in the field. Using a stereo camera, we derive both RGB and depth data and extract three-dimensional human poses from the continuous video feed. This data is then processed through our proposed k-nearest neighbour classifier, whose output dictates the behaviour of the UAV. The framework also includes mechanisms that keep the human within the robot's field of view at all times, continuously tracking user movements. We subjected our approach to rigorous testing with real robots. The results, coupled with comprehensive analysis, underscore the efficacy and inherent advantages of our proposed methodology.
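
A toy sketch of the classify-then-command loop, with invented pose features, action labels, and UAV commands (the actual system consumes stereo-derived 3-D poses):

```python
# Toy version of the pipeline: k-NN over 3-D pose feature vectors, then a
# lookup from the predicted action to a hypothetical UAV command.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# each row: flattened 3-D joint coordinates for one recorded pose (placeholder)
X_train = np.random.default_rng(0).normal(size=(120, 45))
y_train = np.repeat(["takeoff", "land", "follow"], 40)   # invented action labels

clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# hypothetical mapping from recognized action to UAV behaviour
COMMANDS = {"takeoff": "uav.takeoff()", "land": "uav.land()", "follow": "uav.track_user()"}

pose = np.random.default_rng(1).normal(size=(1, 45))     # pose from the stereo pipeline
action = clf.predict(pose)[0]
print(action, "->", COMMANDS[action])
```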


dzNLP at NADI 2024 Shared Task: Multi-Classifier Ensemble with Weighted Voting and TF-IDF Features

arXiv.org Artificial Intelligence

This paper presents the contribution of our dzNLP team to the NADI 2024 shared task, specifically in Subtask 1 - Multi-label Country-level Dialect Identification (MLDID) (Closed Track). We explored various configurations to address the challenge: in Experiment 1, we utilized a union of n-gram analyzers (word, character, character with word boundaries) with different n-gram values; in Experiment 2, we combined a weighted union of Term Frequency-Inverse Document Frequency (TF-IDF) features with various weights; and in Experiment 3, we implemented a weighted majority voting scheme using three classifiers: Linear Support Vector Classifier (LSVC), Random Forest (RF), and K-Nearest Neighbors (KNN). Our approach, despite its simplicity and reliance on traditional machine learning techniques, demonstrated competitive performance in terms of F1-score and precision. Notably, we achieved the highest precision score of 63.22% among the participating teams. However, our overall F1 score was approximately 21%, significantly impacted by a low recall rate of 12.87%. This indicates that while our models were highly precise, they struggled to recall a broad range of dialect labels, highlighting a critical area for improvement in handling diverse dialectal variations.
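
A compact scikit-learn rendering of the Experiment 2 and 3 ingredients combined, with illustrative feature weights, n-gram ranges, voting weights, and labels rather than the team's tuned values:

```python
# Sketch: weighted union of word- and character-level TF-IDF features feeding
# a weighted hard-voting ensemble of LSVC, RF, and KNN (illustrative settings).
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.neighbors import KNeighborsClassifier

features = FeatureUnion(
    [("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
     ("char", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5)))],
    transformer_weights={"word": 1.0, "char": 2.0},      # placeholder weights
)
ensemble = VotingClassifier(
    [("lsvc", LinearSVC()),
     ("rf", RandomForestClassifier()),
     ("knn", KNeighborsClassifier(n_neighbors=1))],
    voting="hard", weights=[3, 2, 1],                    # placeholder vote weights
)
model = Pipeline([("tfidf", features), ("clf", ensemble)])

# tiny placeholder corpus with invented country-code labels
texts = ["example dialect sentence", "another training sentence",
         "a third sample text", "one more sample text"]
labels = ["DZ", "DZ", "MA", "MA"]
model.fit(texts, labels)
print(model.predict(["a new test sentence"]))
```

Hard voting is used here because LinearSVC does not expose class probabilities; the paper's exact voting rule may differ.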


Evaluating the performance-deviation of itemKNN in RecBole and LensKit

arXiv.org Artificial Intelligence

This study examines the performance of item-based k-Nearest Neighbors (ItemKNN) algorithms in the RecBole and LensKit recommender system libraries. Using four data sets (Anime, ModCloth, ML-100K, and ML-1M), we assess each library's efficiency, accuracy, and scalability, focusing primarily on normalized discounted cumulative gain (nDCG). Our results show that RecBole outperforms LensKit on two of three metrics on the ML-100K data set: it achieved an 18% higher nDCG, 14% higher precision, and 35% lower recall. To ensure a fair comparison, we adjusted LensKit's nDCG calculation to match RecBole's method. This alignment made the performance more comparable, with LensKit achieving an nDCG of 0.2540 and RecBole 0.2674. Differences in similarity matrix calculations were identified as the main cause of the performance deviations. After modifying LensKit to retain only the top K similar items, both libraries showed nearly identical nDCG values across all data sets. For instance, both achieved an nDCG of 0.2586 on the ML-1M data set with the same random seed. Initially, LensKit's original implementation surpassed RecBole only on the ModCloth data set.
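
For reference, here is a NumPy sketch of the step the study pinpoints: computing an item-item cosine similarity matrix and retaining only the top-K most similar items per item, as RecBole does and as LensKit was modified to do. The rating matrix is toy data.

```python
# Item-item cosine similarity with top-K truncation, then ItemKNN scoring:
# score(u, i) = sum over j in topK(i) of sim(i, j) * r(u, j). Toy data only.
import numpy as np

rng = np.random.default_rng(0)
ratings = rng.integers(0, 2, size=(50, 8)).astype(float)   # users x items

norms = np.linalg.norm(ratings, axis=0, keepdims=True) + 1e-12
sim = (ratings / norms).T @ (ratings / norms)               # cosine similarity (items x items)
np.fill_diagonal(sim, 0.0)                                  # ignore self-similarity

K = 3
keep = np.argsort(sim, axis=1)[:, -K:]                      # top-K neighbors per item
truncated = np.zeros_like(sim)
rows = np.arange(sim.shape[0])[:, None]
truncated[rows, keep] = sim[rows, keep]                     # zero out all but top-K

scores = ratings @ truncated.T                              # user-item recommendation scores
print(scores.shape)                                         # (50, 8)
```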


On high-dimensional modifications of the nearest neighbor classifier

arXiv.org Machine Learning

In supervised classification, we use a training set of labeled observations from different competing classes to form a decision rule for classifying unlabeled test set observations as accurately as possible. Starting from Fisher (1936), Rao (1948), and Fix and Hodges (1951), several parametric as well as nonparametric classifiers have been developed for this purpose (see, e.g., Duda et al., 2007; Hastie et al., 2009). Among them, the nearest neighbor classifier (see, e.g., Cover and Hart, 1967) is perhaps the most popular. The k-nearest neighbor (k-NN) classifier assigns an observation x to the class with the maximum number of representatives among the k nearest neighbors of x. This classifier works well if the training sample size is large compared to the dimension of the data. For a suitable choice of k (which increases with the training sample size at an appropriate rate), under some mild regularity conditions, the misclassification rate of the k-NN classifier converges to the Bayes risk (i.e., the misclassification rate of the Bayes classifier) as the training sample size grows to infinity (see, e.g.
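
A from-scratch rendering of the rule just stated, assigning x to the class with the most representatives among its k nearest training points:

```python
# Plain k-NN rule: majority class among the k nearest training observations.
import numpy as np
from collections import Counter

def knn_classify(x, X_train, y_train, k=5):
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances to x
    nearest = np.argsort(dists)[:k]               # indices of the k nearest points
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.1], [0.9, 1.0]])
y_train = np.array([0, 0, 1, 1])
print(knn_classify(np.array([0.95, 1.0]), X_train, y_train, k=3))  # -> 1
```

In high dimensions, the Euclidean distances above concentrate and neighbors become less informative, which is precisely the regime the paper's modifications target.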


Neurocache: Efficient Vector Retrieval for Long-range Language Modeling

arXiv.org Artificial Intelligence

This paper introduces Neurocache, an approach to extend the effective context size of large language models (LLMs) using an external vector cache to store its past states. Like recent vector retrieval approaches, Neurocache uses an efficient k-nearest-neighbor (kNN) algorithm to retrieve relevant past states and incorporate them into the attention process. Neurocache improves upon previous methods by (1) storing compressed states, which reduces cache size; (2) performing a single retrieval operation per token, which increases inference speed; and (3) extending the retrieval window to neighboring states, which improves both language modeling and downstream task accuracy. Our experiments show the effectiveness of Neurocache both for models trained from scratch and for pre-trained models such as Llama2-7B and Mistral-7B when enhanced with the cache mechanism. We also compare Neurocache with text retrieval methods and show improvements in single-document question-answering and few-shot learning tasks.

[Figure 1: Performance and scalability of Neurocache vs. Memorizing Transformers (Wu et al., 2022) on PG-19. The graph illustrates Neurocache's consistently lower token perplexity and faster inference times across various cache sizes on the Project Gutenberg-19 dataset.]
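
A simplified sketch of the retrieval step described above (not the released Neurocache code): the current hidden state queries a cache of compressed past states, the k nearest entries are fetched, and each is extended to a small window of neighboring states before being handed to attention.

```python
# Toy kNN retrieval from a vector cache of compressed past states, with the
# neighbor-window extension the paper describes; sizes are placeholders.
import torch

cache = torch.randn(1024, 64)                 # compressed past states (T, d)
query = torch.randn(64)                       # current token's hidden state

scores = cache @ query                        # dot-product similarity to all entries
topk = scores.topk(k=4).indices               # indices of the k nearest past states

# extend each hit to a window of neighboring states (here +/- 1 position)
window = torch.stack([topk - 1, topk, topk + 1], dim=1).clamp(0, cache.size(0) - 1)
retrieved = cache[window.flatten()]           # states passed on to the attention step
print(retrieved.shape)                        # torch.Size([12, 64])
```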