AITopics | Information Retrieval

Information Retrieval

Our accustomed systems for retrieving particular pieces of information no longer meet the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but it still usually requires time-consuming serial searching of one database after another, followed by other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where the information being sought is held in widely different formats, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

Scalable and Efficient Non-adaptive Deterministic Group Testing

Neural Information Processing Systems | Aug-15-2025, 06:42:27 GMT

In particular, it holds for classical QGT. Although dispersion has been used before in GT, cf. [...] The last fact explains why further increase of [...] is not needed.

algorithm, construction, query, (14 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > Washington > King County > Seattle (0.04)
(11 more...)

Genre: Research Report (0.67)

Industry: Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.47)

6 Appendix

Neural Information Processing Systems | Aug-15-2025, 05:34:52 GMT

As described in Section 3, MemRecall is the process of extracting the key blocks. We also need "strides" [...] BM25 is a well-known TF-IDF-like information retrieval method. Each block is scored based on the words it shares with the query or textual label. However, semantic relevance is neglected. GloVe is a set of pretrained word representations.
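The block-scoring idea mentioned above can be sketched with a plain Okapi BM25 scorer. This is a minimal illustration, not the code from the paper: the function name `bm25_scores`, the whitespace tokenizer, and the parameter defaults `k1=1.5` and `b=0.75` are all assumptions chosen for the example.

```python
import math
from collections import Counter

def bm25_scores(query, blocks, k1=1.5, b=0.75):
    """Score each text block against the query with Okapi BM25.

    Assumes a non-empty list of blocks and naive whitespace tokenization.
    """
    tokenized = [blk.lower().split() for blk in blocks]
    n_blocks = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_blocks
    # Document frequency: number of blocks that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(query.lower().split())
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_blocks - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

blocks = [
    "the cat sat on the mat",
    "stock prices fell sharply today",
    "a cat and a dog played outside",
]
scores = bm25_scores("cat mat", blocks)
# Index of the highest-scoring block.
best = max(range(len(blocks)), key=scores.__getitem__)
```

Because the score depends only on shared words, a block that paraphrases the query without reusing its vocabulary scores zero, which is the limitation the excerpt points to when it says semantic relevance is neglected.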

relevance, relevance score, textual label, (6 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.58)

Advances in Logic-Based Entity Resolution: Enhancing ASPEN with Local Merges and Optimality Criteria

Xiang, Zhliang, Bienvenu, Meghyn, Cima, Gianluca, Gutiérrez-Basulto, Víctor, Ibáñez-García, Yazmín

arXiv.org Artificial Intelligence | Aug-15-2025

In this paper, we present ASPEN+, which extends an existing ASP-based system, ASPEN, for collective entity resolution with two important functionalities: support for local merges and new optimality criteria for preferred solutions. Indeed, ASPEN only supports so-called global merges of entity-referring constants (e.g., author ids), in which all occurrences of matched constants are treated as equivalent and merged accordingly. However, it has been argued that when resolving data values, local merges are often more appropriate, since, for example, some instances of 'J. Lee' may refer to 'Joy Lee', while others should be matched with 'Jake Lee'. In addition to allowing such local merges, ASPEN+ offers new optimality criteria for selecting solutions, such as minimizing rule violations or maximizing the number of rules supporting a merge. Our main contributions are thus (1) the formalization and computational analysis of various notions of optimal solution, and (2) an extensive experimental evaluation on real-world datasets, demonstrating the effect of local merges and the new optimality criteria on both accuracy and runtime.
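The difference between global and local merges can be illustrated at the data level. The sketch below is a simplified Python illustration only: ASPEN+ itself is ASP-based, and the table layout and the helper names `global_merge` and `local_merge` are hypothetical. It reuses the 'J. Lee' example from the abstract.

```python
# Hypothetical rows: (paper id, author id, author name as written).
rows = [
    ("p1", "a1", "J. Lee"),
    ("p2", "a2", "Joy Lee"),
    ("p3", "a3", "J. Lee"),
    ("p4", "a4", "Jake Lee"),
]

def global_merge(rows, old, new):
    """Treat every occurrence of `old` as equal to `new`, in all rows."""
    return [tuple(new if v == old else v for v in row) for row in rows]

def local_merge(rows, cells, new):
    """Rewrite only the listed (row_index, column_index) cells."""
    out = [list(row) for row in rows]
    for i, j in cells:
        out[i][j] = new
    return [tuple(row) for row in out]

# A global merge forces both 'J. Lee' cells to the same value.
globally = global_merge(rows, "J. Lee", "Joy Lee")

# Local merges can resolve the two 'J. Lee' cells differently.
locally = local_merge(rows, [(0, 2)], "Joy Lee")
locally = local_merge(locally, [(2, 2)], "Jake Lee")
```

Applying the global merge to a data value shows why it can be too coarse: the third row is forced to 'Joy Lee' even if it should match 'Jake Lee', whereas the cell-level version keeps the two decisions separate.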

criteria, logic & formal reasoning, machine learning, (21 more...)

arXiv: 2508.10504

Country:

Europe (1.00)
Asia > China (0.28)

Genre: Research Report > New Finding (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.70)

A Lightweight Learned Cardinality Estimation Model

Zhu, Yaoyu, Zhang, Jintao, Li, Guoliang, Feng, Jianhua

arXiv.org Artificial Intelligence | Aug-14-2025

Cardinality estimation is a fundamental task in database management systems, aiming to predict query results accurately without executing the queries. However, existing techniques either achieve low estimation accuracy or incur high inference latency. Achieving high speed and high accuracy simultaneously is therefore critical for the cardinality estimation problem. In this paper, we propose a novel data-driven approach called CoDe (Covering with Decompositions) to address this problem. CoDe employs the concept of covering design, which divides the table into multiple smaller, overlapping segments. For each segment, CoDe utilizes tensor decomposition to accurately model its data distribution. Moreover, CoDe introduces innovative algorithms to select the best-fitting distributions for each query, combining them to estimate the final result. Notably, experimental results show that our method represents a significant advancement in cardinality estimation, achieving state-of-the-art levels of both estimation accuracy and inference efficiency. Across various datasets, CoDe achieves absolute accuracy in estimating more than half of the queries.

Cardinality estimation poses a critical challenge in database management systems (DBMS) as it aims to predict query results accurately without executing the queries. This task is crucial for query optimization, as it allows the optimizer to devise the most efficient query plans. Despite numerous proposed solutions, cardinality estimation remains an unsolved problem. Two primary approaches have been explored to tackle this issue: workload-driven methods [17], [32] and data-driven methods [27], [47], [49].

Motivation. Figure 1 illustrates the comparison between our work and the limitations of existing methods. Workload-driven methods focus on learning patterns from historical workloads and their corresponding results. While these methods are generally fast, their accuracy can degrade when workloads change or are randomly generated. This limitation stems from their lack of direct access to the underlying data and their heavy reliance on the distribution of past workloads. As a result, they are positioned in the bottom-right corner of the graph. On the other hand, recent data-driven methods learn the data distribution directly, significantly improving estimation accuracy. However, data-driven methods are often orders of magnitude slower than workload-driven methods, placing them in the top-left corner of the graph. Achieving both high speed and accuracy simultaneously is a critical challenge in cardinality estimation, which our work aims to address. Recent research, such as UAE [45], has explored hybrid approaches that combine data and workload information, using workload patterns to enhance data learning.

The authors are with the Department of Computer Science and Technology, Tsinghua University, Beijing, China.

artificial intelligence, machine learning, natural language, (19 more...)

doi: 10.1109/TKDE.2025.3591025

arXiv: 2508.09602

Country:

Asia > Middle East > UAE (0.25)
Asia > China > Beijing > Beijing (0.24)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Databases (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Social-Sensor Identity Cloning Detection Using Weakly Supervised Deep Forest and Cryptographic Authentication

Alharbi, Ahmed, Dong, Hai, Yi, Xun

arXiv.org Artificial Intelligence | Aug-14-2025

Recent years have witnessed a rising trend in social-sensor cloud identity cloning incidents. However, existing approaches suffer from unsatisfactory performance, a lack of solutions for detecting duplicated accounts, and a lack of large-scale evaluations on real-world datasets. We introduce a novel method for detecting identity cloning in social-sensor cloud service providers. Our proposed technique consists of two primary components: 1) a similar identity detection method and 2) a cryptography-based authentication protocol. Initially, we developed a weakly supervised deep forest model to identify similar identities using non-privacy-sensitive user profile features provided by the service. Subsequently, we designed a cryptography-based authentication protocol to verify whether similar identities were generated by the same provider. Our extensive experiments on a large real-world dataset demonstrate the feasibility and superior performance of our technique compared to current state-of-the-art identity clone detection methods.

data mining, machine learning, social-sensor identity cloning detection, (15 more...)

arXiv: 2508.09665

Country: Asia (0.46)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(4 more...)

Adaptive Personalized Conversational Information Retrieval

Mo, Fengran, Hui, Yuchen, Tian, Yuxing, Tan, Zhaoxuan, Meng, Chuan, Su, Zhan, Huang, Kaiyu, Nie, Jian-Yun

arXiv.org Artificial Intelligence | Aug-13-2025

Personalized conversational information retrieval (CIR) systems aim to satisfy users' complex information needs through multi-turn interactions by considering user profiles. However, not all search queries require personalization. The challenge lies in appropriately incorporating personalization elements into search when needed. Most existing studies implicitly incorporate users' personal information and conversational context using large language models without distinguishing the specific requirements for each query turn. Such a "one-size-fits-all" personalization strategy might lead to sub-optimal results. In this paper, we propose an adaptive personalization method, in which we first identify the required personalization level for a query and integrate personalized queries with other query reformulations to produce various enhanced queries. Then, we design a personalization-aware ranking fusion approach to assign fusion weights dynamically to different reformulated queries, depending on the required personalization level. The proposed adaptive personalized conversational information retrieval framework APCIR is evaluated on two TREC iKAT datasets. The results confirm the effectiveness of adaptive personalization of APCIR by outperforming state-of-the-art methods.
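The idea of assigning fusion weights to the rankings produced by different reformulated queries can be sketched with a weighted variant of reciprocal rank fusion. The abstract does not specify the fusion formula, so this is a stand-in technique rather than APCIR's method; the function name `weighted_rrf`, the weight values, and the document ids are all hypothetical.

```python
from collections import defaultdict

def weighted_rrf(rankings, weights, k=60):
    """Fuse ranked lists: each list adds weight / (k + rank) to a document's score."""
    fused = defaultdict(float)
    for name, ranked_docs in rankings.items():
        for rank, doc in enumerate(ranked_docs, start=1):
            fused[doc] += weights[name] / (k + rank)
    # Sort by descending fused score, breaking ties by document id.
    return sorted(fused, key=lambda d: (-fused[d], d))

# Hypothetical result lists for two reformulations of the same query turn.
rankings = {
    "personalized": ["d3", "d1", "d2"],
    "generic": ["d1", "d2", "d4"],
}

# A turn judged to need strong personalization: weight the personalized list.
high = weighted_rrf(rankings, {"personalized": 0.9, "generic": 0.1}, k=1)
# A turn judged to need little personalization: weight the generic list.
low = weighted_rrf(rankings, {"personalized": 0.1, "generic": 0.9}, k=1)
```

In this toy run a small `k` is used so that top ranks dominate; with the personalized list weighted heavily its top document `d3` leads the fused list, while with the weights reversed the generic list's top document `d1` leads instead.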

information retrieval, large language model, machine learning, (17 more...)

arXiv: 2508.08634

Country:

Asia (0.68)
North America > United States (0.46)
North America > Canada > Quebec (0.15)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Security & Privacy (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Efficient and Effective Query Context-Aware Learning-to-Rank Model for Sequential Recommendation

Dzhoha, Andrii, Mironenko, Alisa, Labzin, Evgeny, Vlasov, Vladimir, Versteegh, Maarten, Celikik, Marjan

arXiv.org Artificial Intelligence | Aug-13-2025

Modern sequential recommender systems commonly use transformer-based models for next-item prediction. While these models demonstrate a strong balance between efficiency and quality, integrating interleaving features - such as the query context (e.g., browse category) under which next-item interactions occur - poses challenges. Effectively capturing query context is crucial for refining ranking relevance and enhancing user engagement, as it provides valuable signals about user intent within a session. Unlike item features, historical query context is typically not aligned with item sequences and may be unavailable at inference due to privacy constraints or feature store limitations - making its integration into transformers both challenging and error-prone. This paper analyzes different strategies for incorporating query context into transformers trained with a causal language modeling procedure as a case study. We propose a new method that effectively fuses the item sequence with query context within the attention mechanism. Through extensive offline and online experiments on a large-scale online platform and open datasets, we present evidence that our proposed method is an effective approach for integrating query context to improve model ranking quality in terms of relevance and diversity.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv: 2507.03789

Country:

Europe (1.00)
Asia (0.93)
North America > United States > Massachusetts (0.28)
North America > United States > California (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Improving Document Retrieval Coherence for Semantically Equivalent Queries

Campese, Stefano, Moschitti, Alessandro, Lauriola, Ivano

arXiv.org Artificial Intelligence | Aug-12-2025

Dense Retrieval (DR) models have proven to be effective for Document Retrieval and Information Grounding tasks. Usually, these models are trained and optimized for improving the relevance of top-ranked documents for a given query. Previous work has shown that popular DR models are sensitive to the query and document lexicon: small variations in wording may lead to a significant difference in the set of retrieved documents. In this paper, we propose a variation of the Multi-Negative Ranking loss for training DR models that improves their coherence in retrieving the same documents for semantically similar queries. The loss penalizes discrepancies between the top-k ranked documents retrieved for diverse but semantically equivalent queries. We conducted extensive experiments on various datasets: MS-MARCO, Natural Questions, BEIR, and TREC DL 19/20. The results show that models optimized with our loss (i) are subject to lower sensitivity and, (ii) interestingly, achieve higher accuracy.
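One simple way to quantify the coherence described above is the average overlap between the top-k result sets returned for paraphrases of the same query. The sketch below uses Jaccard overlap as an assumed measure; it is an evaluation-style illustration, not the paper's training loss, and the function names and document ids are hypothetical.

```python
from itertools import combinations

def topk_jaccard(results_a, results_b, k):
    """Jaccard overlap between the top-k documents of two ranked lists."""
    a, b = set(results_a[:k]), set(results_b[:k])
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def coherence(result_lists, k):
    """Mean pairwise top-k overlap; expects at least two ranked lists."""
    pairs = list(combinations(result_lists, 2))
    return sum(topk_jaccard(x, y, k) for x, y in pairs) / len(pairs)

# Hypothetical rankings retrieved for three paraphrases of one query.
retrieved = [
    ["d1", "d2", "d3"],
    ["d2", "d1", "d4"],
    ["d1", "d2", "d3"],
]
score = coherence(retrieved, k=3)
```

A score of 1.0 would mean every paraphrase retrieves the same top-k set; lower values indicate the lexical sensitivity that the abstract attributes to popular DR models.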

large language model, machine learning, natural language, (18 more...)

arXiv: 2508.07975

Country:

North America > United States (0.93)
Asia > Middle East > UAE (0.46)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.71)

Balancing Privacy and Efficiency: Music Information Retrieval via Additive Homomorphic Encryption

Wang, William Zerong, Zhao, Dongfang

arXiv.org Artificial Intelligence | Aug-12-2025

In the era of generative AI, ensuring the privacy of music data presents unique challenges: unlike static artworks such as images, music data is inherently temporal and multimodal, and it is sampled, transformed, and remixed at an unprecedented scale. These characteristics make its core vector embeddings, i.e., the numerical representations of the music, highly susceptible to being learned, misused, or even stolen by models without accessing the original audio files. Traditional methods like copyright licensing and digital watermarking offer limited protection for these abstract mathematical representations, thus necessitating a stronger, e.g., cryptographic, approach to safeguarding the embeddings themselves. Standard encryption schemes, such as AES, render data unintelligible for computation, making similarity searches over the encrypted embeddings impossible. While Fully Homomorphic Encryption (FHE) provides a plausible solution by allowing arbitrary computations on ciphertexts, its substantial performance overhead remains impractical for large-scale vector similarity searches. Given this trade-off, we propose a more practical approach using Additive Homomorphic Encryption (AHE) for vector similarity search. The primary contributions of this paper are threefold: we analyze threat models unique to music information retrieval systems; we provide a theoretical analysis and propose an efficient AHE-based solution through inner products of music embeddings to deliver privacy-preserving similarity search; and finally, we demonstrate the efficiency and practicality of the proposed approach through empirical evaluation and comparison to FHE schemes on real-world MP3 files.
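To see why an additively homomorphic scheme suffices for inner-product similarity, consider a toy Paillier example. The abstract does not name the AHE scheme used, so Paillier is an assumption here, as are the function names, the tiny insecure primes, and the integer-valued (quantized) embeddings. Multiplying ciphertexts adds plaintexts, and raising a ciphertext to a plaintext power multiplies the plaintext by that scalar, so the inner product of an encrypted vector with a plaintext query can be computed without decrypting the stored vector.

```python
import math
import random

def keygen(p=1009, q=1013):
    """Toy Paillier key pair. These small primes are insecure, for illustration only."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; valid because the generator is n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public key n."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def encrypted_inner_product(n, enc_vec, plain_vec):
    """Return an encryption of sum(x_i * y_i) given Enc(x_i) and plaintext y_i."""
    n2 = n * n
    acc = 1  # the ciphertext 1 decrypts to 0
    for c, y in zip(enc_vec, plain_vec):
        acc = (acc * pow(c, y, n2)) % n2  # adds x_i * y_i to the hidden sum
    return acc

n, priv = keygen()
stored = [encrypt(n, x) for x in [3, 1, 4]]  # encrypted, quantized embedding
enc_score = encrypted_inner_product(n, stored, [2, 7, 1])
score = decrypt(n, priv, enc_score)  # 3*2 + 1*7 + 4*1
```

Only the key holder can read the resulting score, and the result is correct as long as the true inner product stays below `n`; a real deployment would use a vetted library and much larger keys.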

information retrieval, machine learning, natural language, (18 more...)

arXiv: 2508.07044

Genre: Research Report (0.64)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.35)

Are All Genders Equal in the Eyes of Algorithms? -- Analysing Search and Retrieval Algorithms for Algorithmic Gender Fairness

Urchs, Stefanie, Thurner, Veronika, Aßenmacher, Matthias, Bothmann, Ludwig, Heumann, Christian, Thiemichen, Stephanie

arXiv.org Artificial Intelligence | Aug-11-2025

Algorithmic systems such as search engines and information retrieval platforms significantly influence academic visibility and the dissemination of knowledge. Despite assumptions of neutrality, these systems can reproduce or reinforce societal biases, including those related to gender. This paper introduces and applies a bias-preserving definition of algorithmic gender fairness, which assesses whether algorithmic outputs reflect real-world gender distributions without introducing or amplifying disparities. Using a heterogeneous dataset of academic profiles from German universities and universities of applied sciences, we analyse gender differences in metadata completeness, publication retrieval in academic databases, and visibility in Google search results. While we observe no overt algorithmic discrimination, our findings reveal subtle but consistent imbalances: male professors are associated with a greater number of search results and more aligned publication records, while female professors display higher variability in digital visibility. These patterns reflect the interplay between platform algorithms, institutional curation, and individual self-presentation. Our study highlights the need for fairness evaluations that account for both technical performance and representational equality in digital systems.

artificial intelligence, information retrieval, natural language, (13 more...)

arXiv: 2508.0568

Country:

Europe (1.00)
North America > United States (0.94)

Genre: Research Report > New Finding (1.00)

Industry:

Law > Civil Rights & Constitutional Law (1.00)
Information Technology (0.89)
Government (0.69)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
