AITopics | hubness

hubness

a19883fca95d0e5ec7ee6c94c6c32028-Supplemental.pdf

Neural Information Processing Systems | Feb-9-2026, 15:07:01 GMT

dataset, experiment, learning rate, (13 more...)

Neural Information Processing Systems

Country: North America > Canada (0.04)

Genre: Research Report > New Finding (0.51)

Industry:

Health & Medicine > Therapeutic Area > Hematology (0.70)
Health & Medicine > Therapeutic Area > Oncology > Leukemia (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)


Cache Mechanism for Agent RAG Systems

Lin, Shuhang, Peng, Zhencan, Li, Lingyao, Lin, Xiao, Zhu, Xi, Zhang, Yongfeng

arXiv.org Artificial Intelligence | Nov-6-2025

Recent advances in Large Language Model (LLM)-based agents have been propelled by Retrieval-Augmented Generation (RAG), which grants the models access to vast external knowledge bases. Despite RAG's success in improving agent performance, agent-level cache management, particularly constructing, maintaining, and updating a compact, relevant corpus dynamically tailored to each agent's needs, remains underexplored. Therefore, we introduce ARC (Agent RAG Cache Mechanism), a novel, annotation-free caching framework that dynamically manages small, high-value corpora for each agent. By synthesizing historical query distribution patterns with the intrinsic geometry of cached items in the embedding space, ARC automatically maintains a high-relevance cache. Comprehensive experiments on three retrieval datasets demonstrate that ARC reduces storage requirements to 0.015% of the original corpus while offering up to a 79.8% has-answer rate and reducing average retrieval latency by 80%. Our results demonstrate that ARC can drastically enhance efficiency and effectiveness in RAG-powered LLM agents.

artificial intelligence, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2511.02919

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.74)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)


Supplement to Learning Deep Attribution Priors Based On Prior Knowledge

1 Model Implementations and Hyperparameter Tuning. LASSO: In our experiments we used the scikit-learn [ 10

Neural Information Processing Systems | Aug-15-2025, 12:40:31 GMT

All linear models were implemented using PyTorch. We used an Nvidia GTX 1080 Ti GPU for training. [...] IG computes feature attributions by comparing a model's prediction with the prediction [...] We also found that EG led to the best performance for models trained using the DAPr framework. [...] RNA-seq data as follows [...] N is the total number of counts. We also scaled Dasatinib IC50 values to have zero mean and unit variance.

dataset, experiment, learning rate, (13 more...)

Neural Information Processing Systems

Country: North America > Canada (0.04)

Genre: Research Report > New Finding (0.51)

Industry:

Health & Medicine > Therapeutic Area > Hematology (0.90)
Health & Medicine > Therapeutic Area > Oncology > Leukemia (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)


Prediction hubs are context-informed frequent tokens in LLMs

Nielsen, Beatrix M. G., Macocco, Iuri, Baroni, Marco

arXiv.org Artificial Intelligence | Feb-14-2025

Hubness, the tendency for a few points to be among the nearest neighbours of a disproportionate number of other points, commonly arises when applying standard distance measures to high-dimensional data, often negatively impacting distance-based analysis. As autoregressive large language models (LLMs) operate on high-dimensional representations, we ask whether they are also affected by hubness. We first show, theoretically, that the only representation comparison operation performed by LLMs, namely that between context and unembedding vectors to determine continuation probabilities, is not characterized by the concentration of distances phenomenon that typically causes the appearance of nuisance hubness. We then empirically show that this comparison still leads to a high degree of hubness, but the hubs in this case do not constitute a disturbance. They are rather the result of context-modulated frequent tokens often appearing in the pool of likely candidates for next token prediction. On the other hand, when other distance computations involving LLM representations are performed, we do not have the same theoretical guarantees, and, indeed, we see nuisance hubs appear. In summary, our work highlights, on the one hand, how hubness, while omnipresent in high-dimensional spaces, is not always a negative property that needs to be mitigated, and, on the other hand, it shows that various widely-used LLMs have developed a guessing strategy that consists in constantly assigning a high probability to frequent tokens.
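The definition of hubness at the start of this abstract can be made concrete with a small sketch. It is not taken from the paper: the function names `k_occurrence` and `skewness`, the i.i.d. Gaussian data, and the choice k = 10 are all assumptions made for illustration. It counts how often each point appears in other points' k-nearest-neighbour lists and uses the skewness of those counts as a common hubness score.

```python
import numpy as np

def k_occurrence(X, k=10):
    """Count how often each point appears in other points' k-NN lists."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                    # a point is not its own neighbour
    knn = np.argsort(d2, axis=1)[:, :k]             # indices of the k nearest neighbours
    return np.bincount(knn.ravel(), minlength=X.shape[0])

def skewness(counts):
    """Third standardized moment of the k-occurrence distribution."""
    c = counts - counts.mean()
    return float(np.mean(c**3) / np.mean(c**2) ** 1.5)

rng = np.random.default_rng(0)
low = k_occurrence(rng.standard_normal((500, 3)))     # low-dimensional data
high = k_occurrence(rng.standard_normal((500, 300)))  # high-dimensional data
print("skewness, d=3:  ", skewness(low))
print("skewness, d=300:", skewness(high))
```

Every point has exactly k neighbours, so the counts always average to k; what changes with dimension is how unevenly they are spread. A strongly right-skewed distribution means a few points (hubs) absorb many neighbour slots while many points are rarely or never retrieved.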

concentration, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2502.10201

Country:

Asia (1.00)
Europe (0.92)
North America > United States (0.28)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Hubness Reduction Improves Sentence-BERT Semantic Spaces

Nielsen, Beatrix M. G., Hansen, Lars Kai

arXiv.org Artificial Intelligence | Nov-30-2023

Semantic representations of text, i.e. representations of natural language which capture meaning by geometry, are essential for areas such as information retrieval and document grouping. High-dimensional trained dense vectors have received much attention in recent years as such representations. We investigate the structure of semantic spaces that arise from embeddings made with Sentence-BERT and find that the representations suffer from a well-known problem in high dimensions called hubness. Hubness results in asymmetric neighborhood relations, such that some texts (the hubs) are neighbours of many other texts while most texts (so-called anti-hubs) are neighbours of few or no other texts. We quantify the semantic quality of the embeddings using hubness scores and the error rate of a neighbourhood-based classifier. We find that when hubness is high, we can reduce both the error rate and hubness using hubness reduction methods. We identify a combination of two methods as resulting in the best reduction. For example, on one of the tested pretrained models, this combined method can reduce hubness by about 75% and error rate by about 9%. Thus, we argue that mitigating hubness in the embedding space provides better semantic representations of text.

base 0, local scaling 0, mp 0, (14 more...)

arXiv.org Artificial Intelligence

2311.18364

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Dominican Republic (0.04)
(5 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)


Hub-VAE: Unsupervised Hub-based Regularization of Variational Autoencoders

Mani, Priya, Domeniconi, Carlotta

arXiv.org Artificial Intelligence | Nov-18-2022

Exemplar-based methods rely on informative data points or prototypes to guide the optimization of learning algorithms. Such data facilitate interpretable model design and prediction. Of particular interest is the utility of exemplars in learning unsupervised deep representations. In this paper, we leverage hubs, which emerge as frequent neighbors in high-dimensional spaces, as exemplars to regularize a variational autoencoder and to learn a discriminative embedding for unsupervised down-stream tasks. We propose an unsupervised, data-driven regularization of the latent space with a mixture of hub-based priors and a hub-based contrastive loss. Experimental evaluation shows that our algorithm achieves superior cluster separability in the embedding space, and accurate data reconstruction and generation, compared to baselines and state-of-the-art techniques.

artificial intelligence, deep learning, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2211.10469

Country:

North America > United States > Michigan (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > Spain > Canary Islands (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)


A Fast and Easy Regression Technique for k-NN Classification Without Using Negative Pairs

Shigeto, Yutaro, Shimbo, Masashi, Matsumoto, Yuji

arXiv.org Machine Learning | Jun-11-2018

This paper proposes an inexpensive way to learn an effective dissimilarity function to be used for $k$-nearest neighbor ($k$-NN) classification. Unlike Mahalanobis metric learning methods that map both query (unlabeled) objects and labeled objects to new coordinates by a single transformation, our method learns a transformation of labeled objects to new points in the feature space whereas query objects are kept in their original coordinates. This method has several advantages over existing distance metric learning methods: (i) In experiments with large document and image datasets, it achieves $k$-NN classification accuracy better than or at least comparable to the state-of-the-art metric learning methods. (ii) The transformation can be learned efficiently by solving a standard ridge regression problem. For document and image datasets, training is often more than two orders of magnitude faster than the fastest metric learning methods tested. This speed-up is also due to the fact that the proposed method eliminates the optimization over "negative" object pairs, i.e., objects whose class labels are different. (iii) The formulation has a theoretical justification in terms of reducing hubness in data.

artificial intelligence, dataset, machine learning, (14 more...)

arXiv.org Machine Learning

1806.03945

Country:

Asia > Middle East > Jordan (0.04)
Asia > Japan > Honshū > Kantō > Chiba Prefecture > Chiba (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.40)


Flattening the Density Gradient for Eliminating Spatial Centrality to Reduce Hubness

Hara, Kazuo (National Institute of Genetics) | Suzuki, Ikumi (Yamagata University) | Kobayashi, Kei (The Institute of Statistical Mathematics) | Fukumizu, Kenji (The Institute of Statistical Mathematics) | Radovanović, Miloš (University of Novi Sad)

AAAI Conferences | Apr-19-2016

Spatial centrality, whereby samples closer to the center of a dataset tend to be closer to all other samples, is regarded as one source of hubness. Hubness is well known to degrade k-nearest-neighbor (k-NN) classification. Spatial centrality can be removed by centering, i.e., shifting the origin to the global center of the dataset, in cases where inner product similarity is used. However, when Euclidean distance is used, centering has no effect on spatial centrality because the distance between the samples is the same before and after centering. In this paper, we propose a solution for the hubness problem when Euclidean distance is considered. We provide a theoretical explanation to demonstrate how the solution eliminates spatial centrality and reduces hubness. We then discuss why the proposed solution works from the viewpoint of the density gradient, which is regarded as the origin of spatial centrality and hubness. We demonstrate that the solution corresponds to flattening the density gradient. Using real-world datasets, we demonstrate that the proposed method improves k-NN classification performance and outperforms an existing hub-reduction method.
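The abstract's observation about centering can be checked numerically. The following is a minimal sketch on synthetic data (the data, the helper `sq_dists`, and the offset of 3.0 are assumptions for illustration, not the paper's setup): subtracting the global mean leaves all pairwise Euclidean distances unchanged, while inner-product similarities do change.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50)) + 3.0  # synthetic data whose mean is far from the origin
Xc = X - X.mean(axis=0)                   # centering: shift the origin to the global mean

def sq_dists(A):
    """Pairwise squared Euclidean distances between the rows of A."""
    sq = np.sum(A**2, axis=1)
    return sq[:, None] + sq[None, :] - 2.0 * A @ A.T

# Distances depend only on differences x_i - x_j, so the shift cancels out.
print("Euclidean distances unchanged:", np.allclose(sq_dists(X), sq_dists(Xc)))

# Inner products x_i . x_j depend on where the origin is, so they change.
print("Inner products unchanged:     ", np.allclose(X @ X.T, Xc @ Xc.T))
```

This is why centering can only help when similarity is measured by the inner product, and why a different remedy is needed for the Euclidean case.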

artificial intelligence, hubness, machine learning, (15 more...)

AAAI Conferences

Thirtieth AAAI Conference on Artificial Intelligence

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)

Genre: Research Report (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.54)


Ridge Regression, Hubness, and Zero-Shot Learning

Shigeto, Yutaro, Suzuki, Ikumi, Hara, Kazuo, Shimbo, Masashi, Matsumoto, Yuji

arXiv.org Machine Learning | Jul-3-2015

This paper discusses the effect of hubness in zero-shot learning, when ridge regression is used to find a mapping between the example space and the label space. Contrary to the existing approach, which attempts to find a mapping from the example space to the label space, we show that mapping labels into the example space is desirable to suppress the emergence of hubs in the subsequent nearest neighbor search step. Assuming a simple data model, we prove that the proposed approach indeed reduces hubness. This was verified empirically on the tasks of bilingual lexicon extraction and image labeling: hubness was reduced in both of these tasks and the accuracy was improved accordingly.
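The two mapping directions contrasted in this abstract can be sketched with closed-form ridge regression. This is a toy illustration under stated assumptions, not the authors' experimental setup: the data are synthetic, all classes are seen during training (so it is not a true zero-shot split), and the helper names `ridge` and `nearest` as well as the regularization strength `lam` are hypothetical.

```python
import numpy as np

def ridge(A, B, lam=1.0):
    """Closed-form ridge regression: argmin_M ||A M - B||^2 + lam * ||M||^2."""
    d = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ B)

def nearest(queries, targets):
    """Index of the nearest target (squared Euclidean distance) for each query."""
    sq_q = np.sum(queries**2, axis=1)[:, None]
    sq_t = np.sum(targets**2, axis=1)[None, :]
    return np.argmin(sq_q + sq_t - 2.0 * queries @ targets.T, axis=1)

rng = np.random.default_rng(0)
n_classes, d_x, d_y, n_per = 20, 30, 10, 25
prototypes = rng.standard_normal((n_classes, d_y))  # one label vector per class
W = rng.standard_normal((d_y, d_x))                 # hidden label-to-example map
labels = np.repeat(np.arange(n_classes), n_per)
Y = prototypes[labels]                              # label vector of each example
X = Y @ W + 0.5 * rng.standard_normal((labels.size, d_x))

# Forward direction: map examples into the label space, search among label vectors.
M_fwd = ridge(X, Y)
pred_fwd = nearest(X @ M_fwd, prototypes)

# Reverse direction: map label vectors into the example space, search there.
M_rev = ridge(Y, X)
pred_rev = nearest(X, prototypes @ M_rev)

print("forward accuracy:", np.mean(pred_fwd == labels))
print("reverse accuracy:", np.mean(pred_rev == labels))
```

The sketch only shows where the nearest-neighbour search takes place in each direction. It does not by itself reproduce the hubness reduction that the paper proves and measures.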

large language model, machine learning, ridge regression, (17 more...)

arXiv.org Machine Learning

1507.00825

Country: Asia > Japan > Honshū (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.63)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.61)


Localized Centering: Reducing Hubness in Large-Sample Data

Hara, Kazuo (National Institute of Genetics) | Suzuki, Ikumi (National Institute of Genetics) | Shimbo, Masashi (Nara Institute of Science and Technology) | Kobayashi, Kei (The Institute of Statistical Mathematics) | Fukumizu, Kenji (The Institute of Statistical Mathematics) | Radovanović, Miloš (University of Novi Sad)

AAAI Conferences | Mar-6-2015

Hubness has recently been identified as a problematic phenomenon occurring in high-dimensional space. In this paper, we address a different type of hubness that occurs when the number of samples is large. We investigate the difference between the hubness in high-dimensional data and that in large-sample data. One finding is that centering, which is known to reduce the former, does not work for the latter. We then propose a new hub-reduction method, called localized centering. It is an extension of centering, yet works effectively for both types of hubness. Using real-world datasets consisting of a large number of documents, we demonstrate that the proposed method improves the accuracy of k-nearest neighbor classification.

artificial intelligence, machine learning, natural language, (18 more...)

AAAI Conferences

Twenty-Ninth AAAI Conference on Artificial Intelligence

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > Serbia > Vojvodina > South Bačka District > Novi Sad (0.04)
Asia > Japan > Honshū > Chūbu > Shizuoka Prefecture > Shizuoka (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.54)
