Goto

Collaborating Authors

 Nearest Neighbor Methods


HCOA*: Hierarchical Class-ordered A* for Navigation in Semantic Environments

arXiv.org Artificial Intelligence

--This paper addresses the problem of robot navigation in mixed geometric/semantic 3D environments. Given a hierarchical representation of the environment, the objective is to navigate from a start position to a goal, while satisfying task-specific safety constraints and minimizing computational cost. We introduce Hierarchical Class-ordered A* (HCOA*), an algorithm that leverages the environment's hierarchy for efficient and safe path-planning in mixed geometric/semantic graphs. We use a total order over the semantic classes and prove theoretical performance guarantees for the algorithm. We propose three approaches for higher-layer node classification based on the semantics of the lowest layer: a Graph Neural Network method, a k-Nearest Neighbors method, and a Majority-Class method. We evaluate HCOA* in simulations on two 3D Scene Graphs, comparing it to the state-of-the-art and assessing the performance of each classification approach. Results show that HCOA* reduces the computational time of navigation by up to 50%, while maintaining near-optimal performance across a wide range of scenarios. S robotic sensing technologies advance, enabling robots to perceive vast and diverse information, two fundamental questions arise: What information from this extensive data stream is most important for a given task? Hierarchical semantic environment representations, such as 3D Scene Graphs (3DSGs) [1]-[3], provide rich and structured abstractions that mirror human-like reasoning, thus facilitating the selection and organization of information.





Classifying Cool Dwarfs: Comprehensive Spectral Typing of Field and Peculiar Dwarfs Using Machine Learning

arXiv.org Artificial Intelligence

Low-mass stars and brown dwarfs -- spectral types (SpTs) M0 and later -- play a significant role in studying stellar and substellar processes and demographics, reaching down to planetary-mass objects. Currently, the classification of these sources remains heavily reliant on visual inspection of spectral features, equivalent width measurements, or narrow-/wide-band spectral indices. Recent advances in machine learning (ML) methods offer automated approaches for spectral typing, which are becoming increasingly important as large spectroscopic surveys such as Gaia, SDSS, and SPHEREx generate datasets containing millions of spectra. We investigate the application of ML in spectral type classification on low-resolution (R $\sim$ 120) near-infrared spectra of M0--T9 dwarfs obtained with the SpeX instrument on the NASA Infrared Telescope Facility. We specifically aim to classify the gravity- and metallicity-dependent subclasses for late-type dwarfs. We used binned fluxes as input features and compared the efficacy of spectral type estimators built using Random Forest (RF), Support Vector Machine (SVM), and K-Nearest Neighbor (KNN) models. We tested the influence of different normalizations and analyzed the relative importance of different spectral regions for surface gravity and metallicity subclass classification. Our best-performing model (using KNN) classifies 95.5 $\pm$ 0.6% of sources to within $\pm$1 SpT, and assigns surface gravity and metallicity subclasses with 89.5 $\pm$ 0.9% accuracy. We test the dependence of signal-to-noise ratio on classification accuracy and find sources with SNR $\gtrsim$ 60 have $\gtrsim$ 95% accuracy. We also find that zy-band plays the most prominent role in the RF model, with FeH and TiO having the highest feature importance.


Evaluating Imputation Techniques for Short-Term Gaps in Heart Rate Data

arXiv.org Artificial Intelligence

Recent advances in wearable technology have enabled the continuous monitoring of vital physiological signals, essential for predictive modeling and early detection of extreme physiological events. Among these physiological signals, heart rate (HR) plays a central role, as it is widely used in monitoring and managing cardiovascular conditions and detecting extreme physiological events such as hypoglycemia. However, data from wearable devices often suffer from missing values. To address this issue, recent studies have employed various imputation techniques. Traditionally, the effectiveness of these methods has been evaluated using predictive accuracy metrics such as RMSE, MAPE, and MAE, which assess numerical proximity to the original data. While informative, these metrics fail to capture the complex statistical structure inherent in physiological signals. This study bridges this gap by presenting a comprehensive evaluation of four statistical imputation methods, linear interpolation, K Nearest Neighbors (KNN), Piecewise Cubic Hermite Interpolating Polynomial (PCHIP), and B splines, for short term HR data gaps. We assess their performance using both predictive accuracy metrics and statistical distance measures, including the Cohen Distance Test (CDT) and Jensen Shannon Distance (JS Distance), applied to HR data from the D1NAMO dataset and the BIG IDEAs Lab Glycemic Variability and Wearable Device dataset. The analysis reveals limitations in existing imputation approaches and the absence of a robust framework for evaluating imputation quality in physiological signals. Finally, this study proposes a foundational framework to develop a composite evaluation metric to assess imputation performance.


Embedding Dimension of Contrastive Learning and k -Nearest Neighbors

Neural Information Processing Systems

We study the embedding dimension of distance comparison data in two settings: contrastive learning and k -nearest neighbors ( k -NN). In both cases, the goal is to find the smallest dimension d of an \ell_p -space in which a given dataset can be represented. We show that the arboricity of the associated graphs plays a key role in designing embeddings. Using this approach, for the most frequently used \ell_2 -distance, we get matching upper and lower bounds in both settings. In contrastive learning, we are given m labeled samples of the form (x_i, y_i, z_i -) representing the fact that the positive example y_i is closer to the anchor x_i than the negative example z_i .


SocRipple: A Two-Stage Framework for Cold-Start Video Recommendations

arXiv.org Artificial Intelligence

Most industry scale recommender systems face critical cold start challenges new items lack interaction history, making it difficult to distribute them in a personalized manner. Standard collaborative filtering models underperform due to sparse engagement signals, while content only approaches lack user specific relevance. We propose SocRipple, a novel two stage retrieval framework tailored for coldstart item distribution in social graph based platforms. Stage 1 leverages the creators social connections for targeted initial exposure. Stage 2 builds on early engagement signals and stable user embeddings learned from historical interactions to "ripple" outwards via K Nearest Neighbor (KNN) search. Large scale experiments on a major video platform show that SocRipple boosts cold start item distribution by +36% while maintaining user engagement rate on cold start items, effectively balancing new item exposure with personalized recommendations.


Fast and Accurate Explanations of Distance-Based Classifiers by Uncovering Latent Explanatory Structures

arXiv.org Machine Learning

Distance-based classifiers, such as k-nearest neighbors and support vector machines, continue to be a workhorse of machine learning, widely used in science and industry. In practice, to derive insights from these models, it is also important to ensure that their predictions are explainable. While the field of Explainable AI has supplied methods that are in principle applicable to any model, it has also emphasized the usefulness of latent structures (e.g. the sequence of layers in a neural network) to produce explanations. In this paper, we contribute by uncovering a hidden neural network structure in distance-based classifiers (consisting of linear detection units combined with nonlinear pooling layers) upon which Explainable AI techniques such as layer-wise relevance propagation (LRP) become applicable. Through quantitative evaluations, we demonstrate the advantage of our novel explanation approach over several baselines. We also show the overall usefulness of explaining distance-based models through two practical use cases.


Robust Classification under Noisy Labels: A Geometry-Aware Reliability Framework for Foundation Models

arXiv.org Artificial Intelligence

Foundation models (FMs) pretrained on large datasets have become fundamental for various downstream machine learning tasks, in particular in scenarios where obtaining perfectly labeled data is prohibitively expensive. In this paper, we assume an FM has to be fine-tuned with noisy data and present a two-stage framework to ensure robust classification in the presence of label noise without model retraining. Recent work has shown that simple k-nearest neighbor (kNN) approaches using an embedding derived from an FM can achieve good performance even in the presence of severe label noise. Our work is motivated by the fact that these methods make use of local geometry. In this paper, following a similar two-stage procedure, reliability estimation followed by reliability-weighted inference, we show that improved performance can be achieved by introducing geometry information. For a given instance, our proposed inference uses a local neighborhood of training data, obtained using the non-negative kernel (NNK) neighborhood construction. We propose several methods for reliability estimation that can rely less on distance and local neighborhood as the label noise increases. Our evaluation on CIFAR-10 and DermaMNIST shows that our methods improve robustness across various noise conditions, surpassing standard K-NN approaches and recent adaptive-neighborhood baselines.