Goto

Collaborating Authors

 hubness



Cache Mechanism for Agent RAG Systems

arXiv.org Artificial Intelligence

Recent advances in Large Language Model (LLM)-based agents have been propelled by Retrieval-Augmented Generation (RAG), which grants the models access to vast external knowledge bases. Despite RAG's success in improving agent performance, agent-level cache management, particularly constructing, maintaining, and updating a compact, relevant corpus dynamically tailored to each agent's need, remains underexplored. Therefore, we introduce ARC (Agent RAG Cache Mechanism), a novel, annotation-free caching framework that dynamically manages small, high-value corpora for each agent. By synthesizing historical query distribution patterns with the intrinsic geometry of cached items in the embedding space, ARC automatically maintains a high-relevance cache. With comprehensive experiments on three retrieval datasets, our experimental results demonstrate that ARC reduces storage requirements to 0.015% of the original corpus while offering up to 79.8% has-answer rate and reducing average retrieval latency by 80%. Our results demonstrate that ARC can drastically enhance efficiency and effectiveness in RAG-powered LLM agents.


Supplement to Learning Deep Attribution Priors Based On Prior Knowledge 1 Model Implementations and Hyperparameter Tuning LASSO: In our experiments we used the scikit-learn [ 10

Neural Information Processing Systems

All linear models were implemented using PyTorch. We used an Nvidia GTX 1080 Ti GPU for training. IG computes feature attributions by comparing a model's prediction with the prediction We also found that EG led to the best performance for models trained using the DAPr framework. RNA-seq data as follows 1. N is the total number of counts. We also scaled Dasatinib IC50 values to have zero mean and unit variance.


Prediction hubs are context-informed frequent tokens in LLMs

arXiv.org Artificial Intelligence

Hubness, the tendency for few points to be among the nearest neighbours of a disproportionate number of other points, commonly arises when applying standard distance measures to high-dimensional data, often negatively impacting distance-based analysis. As autoregressive large language models (LLMs) operate on high-dimensional representations, we ask whether they are also affected by hubness. We first show, theoretically, that the only representation comparison operation performed by LLMs, namely that between context and unembedding vectors to determine continuation probabilities, is not characterized by the concentration of distances phenomenon that typically causes the appeareance of nuisance hubness. We then empirically show that this comparison still leads to a high degree of hubness, but the hubs in this case do not constitute a disturbance. They are rather the result of context-modulated frequent tokens often appearing in the pool of likely candidates for next token prediction. On the other hand, when other distance computations involving LLM representations are performed, we do not have the same theoretical guarantees, and, indeed, we see nuisance hubs appear. In summary, our work highlights, on the one hand, how hubness, while omnipresent in high-dimensional spaces, is not always a negative property that needs to be mitigated, and, on the other hand, it shows that various widely-used LLMs have developed a guessing strategy that consists in constantly assigning a high probability to frequent tokens.


Hubness Reduction Improves Sentence-BERT Semantic Spaces

arXiv.org Artificial Intelligence

Semantic representations of text, i.e. representations of natural language which capture meaning by geometry, are essential for areas such as information retrieval and document grouping. High-dimensional trained dense vectors have received much attention in recent years as such representations. We investigate the structure of semantic spaces that arise from embeddings made with Sentence-BERT and find that the representations suffer from a well-known problem in high dimensions called hubness. Hubness results in asymmetric neighborhood relations, such that some texts (the hubs) are neighbours of many other texts while most texts (so-called anti-hubs), are neighbours of few or no other texts. We quantify the semantic quality of the embeddings using hubness scores and error rate of a neighbourhood based classifier. We find that when hubness is high, we can reduce error rate and hubness using hubness reduction methods. We identify a combination of two methods as resulting in the best reduction. For example, on one of the tested pretrained models, this combined method can reduce hubness by about 75% and error rate by about 9%. Thus, we argue that mitigating hubness in the embedding space provides better semantic representations of text.


Hub-VAE: Unsupervised Hub-based Regularization of Variational Autoencoders

arXiv.org Artificial Intelligence

Exemplar-based methods rely on informative data points or prototypes to guide the optimization of learning algorithms. Such data facilitate interpretable model design and prediction. Of particular interest is the utility of exemplars in learning unsupervised deep representations. In this paper, we leverage hubs, which emerge as frequent neighbors in high-dimensional spaces, as exemplars to regularize a variational autoencoder and to learn a discriminative embedding for unsupervised down-stream tasks. We propose an unsupervised, data-driven regularization of the latent space with a mixture of hub-based priors and a hub-based contrastive loss. Experimental evaluation shows that our algorithm achieves superior cluster separability in the embedding space, and accurate data reconstruction and generation, compared to baselines and state-of-the-art techniques.


A Fast and Easy Regression Technique for k-NN Classification Without Using Negative Pairs

arXiv.org Machine Learning

This paper proposes an inexpensive way to learn an effective dissimilarity function to be used for $k$-nearest neighbor ($k$-NN) classification. Unlike Mahalanobis metric learning methods that map both query (unlabeled) objects and labeled objects to new coordinates by a single transformation, our method learns a transformation of labeled objects to new points in the feature space whereas query objects are kept in their original coordinates. This method has several advantages over existing distance metric learning methods: (i) In experiments with large document and image datasets, it achieves $k$-NN classification accuracy better than or at least comparable to the state-of-the-art metric learning methods. (ii) The transformation can be learned efficiently by solving a standard ridge regression problem. For document and image datasets, training is often more than two orders of magnitude faster than the fastest metric learning methods tested. This speed-up is also due to the fact that the proposed method eliminates the optimization over "negative" object pairs, i.e., objects whose class labels are different. (iii) The formulation has a theoretical justification in terms of reducing hubness in data.


Flattening the Density Gradient for Eliminating Spatial Centrality to Reduce Hubness

AAAI Conferences

Spatial centrality, whereby samples closer to the center of a dataset tend to be closer to all other samples, is regarded as one source of hubness. Hubness is well known to degrade k-nearest-neighbor (k-NN) classification. Spatial centrality can be removed by centering, i.e., shifting the origin to the global center of the dataset, in cases where inner product similarity is used. However, when Euclidean distance is used, centering has no effect on spatial centrality because the distance between the samples is the same before and after centering. As described in this paper, we propose a solution for the hubness problem when Euclidean distance is considered. We provide a theoretical explanation to demonstrate how the solution eliminates spatial centrality and reduces hubness. We then present some discussion of the reason the proposed solution works, from a viewpoint of density gradient, which is regarded as the origin of spatial centrality and hubness. We demonstrate that the solution corresponds to flattening the density gradient. Using real-world datasets, we demonstrate that the proposed method improves k-NN classification performance and outperforms an existing hub-reduction method.


Ridge Regression, Hubness, and Zero-Shot Learning

arXiv.org Machine Learning

This paper discusses the effect of hubness in zero-shot learning, when ridge regression is used to find a mapping between the example space to the label space. Contrary to the existing approach, which attempts to find a mapping from the example space to the label space, we show that mapping labels into the example space is desirable to suppress the emergence of hubs in the subsequent nearest neighbor search step. Assuming a simple data model, we prove that the proposed approach indeed reduces hubness. This was verified empirically on the tasks of bilingual lexicon extraction and image labeling: hubness was reduced with both of these tasks and the accuracy was improved accordingly.


Localized Centering: Reducing Hubness in Large-Sample Data

AAAI Conferences

Hubness has been recently identified as a problematic phenomenon occurring in high-dimensional space. In this paper, we address a different type of hubness that occurs when the number of samples is large. We investigate the difference between the hubness in high-dimensional data and the one in large-sample data. One finding is that centering, which is known to reduce the former, does not work for the latter. We then propose a new hub-reduction method, called localized centering. It is an extension of centering, yet works effectively for both types of hubness. Using real-world datasets consisting of a large number of documents, we demonstrate that the proposed method improves the accuracy of k-nearest neighbor classification.