AITopics | Information Retrieval

Information Retrieval

Our accustomed systems for retrieving particular bits of information no longer meet the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but it still usually requires time-consuming serial searching of one database after another, followed by other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age when widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

RK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases

Neural Information Processing Systems | Jun-2-2025, 05:03:04 GMT

Answering complex real-world queries, such as complex product searches, often requires accurate retrieval from semi-structured knowledge bases that involve a blend of unstructured (e.g., textual descriptions of products) and structured (e.g., entity relations of products) information. However, many previous works studied textual and relational retrieval tasks as separate topics.

information retrieval, large language model, machine learning, (21 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > United States > Texas (0.14)
North America > United States > Louisiana (0.14)
Asia > Middle East > Qatar (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology (1.00)
Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Leisure & Entertainment > Sports (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

A Datasheets for SRFUND

Neural Information Processing Systems | Jun-1-2025, 20:01:31 GMT

A.1 Motivation. For what purpose was the dataset created? The SRFUND dataset was created to advance form understanding and structured reconstruction tasks by covering forms of various layouts and languages. Although some benchmark datasets [16, 17, 33, 37, 41, 44] have been established, none of them captures the global and hierarchical structural dependencies that consider all elements at different granularities, including words, text lines, and entities within the forms. To enhance the applicability of form understanding tasks in hierarchical structure recovery, we introduce SRFUND, a multilingual document structure reconstruction dataset. To the best of our knowledge, this is the first benchmark in form understanding that integrates multi-level structure reconstruction, spanning from words to the global structure of forms, and we believe that the SRFUND dataset will significantly promote the development of form understanding and structured reconstruction. Who created the dataset (e.g., which team, research group) and on behalf of which entity (e.g., company, institution, organization)?

information retrieval, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Industry:

Law (0.46)
Information Technology (0.46)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.94)
Information Technology > Data Science (0.68)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.46)

CSPG: Crossing Sparse Proximity Graphs for Approximate Nearest Neighbor Search

Neural Information Processing Systems | Jun-1-2025, 07:19:41 GMT

The state-of-the-art approximate nearest neighbor search (ANNS) algorithm builds a large proximity graph on the dataset and performs a greedy beam search, which may involve many unnecessary explorations. We develop a novel framework, namely crossing sparse proximity graph (CSPG), based on random partitioning of the dataset. It produces a smaller sparse proximity graph for each partition and routing vectors that bind all the partitions. An efficient two-stage approach is designed for exploring CSPG, with fast approaching and cross-partition expansion. We theoretically prove that CSPG can accelerate the existing graph-based ANNS algorithms by reducing unnecessary explorations. In addition, we conduct extensive experiments on benchmark datasets. The experimental results confirm that the existing graph-based methods can be significantly outperformed by incorporating CSPG, achieving 1.5x to 2x speedups of QPS in almost all recalls.
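
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of greedy beam search over a single proximity graph. It is not CSPG itself (no partitioning or routing vectors), and the names build_knn_graph and beam_search, the brute-force graph construction, and the Euclidean metric are all assumptions made for illustration.

```python
import heapq
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_knn_graph(points, k):
    """Brute-force k-nearest-neighbor graph (illustrative, O(n^2)).

    Edges are stored in both directions so the graph is undirected.
    """
    graph = {i: set() for i in range(len(points))}
    for i, p in enumerate(points):
        nearest = sorted((j for j in range(len(points)) if j != i),
                         key=lambda j: dist(p, points[j]))[:k]
        for j in nearest:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def beam_search(points, graph, query, entry, beam_width):
    """Greedy beam search from node `entry`.

    Repeatedly expands the closest unexpanded candidate while keeping only
    the `beam_width` closest nodes seen so far. Returns (distance, node)
    pairs sorted by increasing distance.
    """
    visited = {entry}
    d0 = dist(query, points[entry])
    candidates = [(d0, entry)]  # min-heap of nodes still to expand
    best = [(-d0, entry)]       # max-heap (negated distances) of kept results
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(best) >= beam_width and d > -best[0][0]:
            break  # the closest candidate is worse than the worst kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(query, points[nb])
            if len(best) < beam_width or dn < -best[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > beam_width:
                    heapq.heappop(best)  # drop the current worst result
    return sorted((-neg_d, n) for neg_d, n in best)
```

Every neighbor distance computed in the inner loop is one "exploration"; the abstract's claim is that searching smaller per-partition graphs reduces how many of these are needed.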

information retrieval, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

Asia > China (0.14)
Europe > Portugal (0.14)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.85)

Random Projections with Asymmetric Quantization

Xiaoyun Li, Ping Li

Neural Information Processing Systems | Jun-1-2025, 06:38:02 GMT

The method of random projection has been a popular tool for data compression, similarity search, and machine learning. In many practical scenarios, applying quantization to randomly projected data can further reduce storage cost and facilitate more efficient retrieval, while suffering only a small loss in accuracy. In real-world applications, however, data collected from different sources may be quantized under different schemes, which calls for studying the asymmetric quantization problem. In this paper, we investigate the cosine similarity estimators derived in such a setting under the Lloyd-Max (LM) quantization scheme. We thoroughly analyze the biases and variances of a series of estimators, including the basic simple estimators, their normalized versions, and their debiased versions. Furthermore, by studying the monotonicity, we show that the expectation of the proposed estimators increases with the true cosine similarity, on a broader family of stair-shaped quantizers. Experiments on nearest neighbor search justify the theory and illustrate the effectiveness of our proposed estimators.
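
The asymmetric setting can be illustrated with a small sketch: two vectors are projected with the same Gaussian matrix, then quantized with different bit widths, and cosine similarity is estimated from the quantized codes. Two substitutions are assumed here and should be read as such: a uniform stair-shaped quantizer stands in for the Lloyd-Max quantizer, and normalized_estimate is a generic normalized inner product rather than the paper's exact estimator, which the abstract does not spell out. All function names are hypothetical.

```python
import math
import random


def gaussian_matrix(k, d, seed=0):
    """k x d matrix of i.i.d. standard normal entries (k projections)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]


def random_projection(vec, matrix):
    """Project `vec` onto each row of the random matrix."""
    return [sum(r * v for r, v in zip(row, vec)) for row in matrix]


def uniform_quantize(values, bits, clip=3.0):
    """Stand-in stair-shaped quantizer (NOT Lloyd-Max).

    Maps each value to the midpoint of one of 2**bits equal-width bins on
    [-clip, clip]; values outside the range fall into the outermost bins.
    """
    levels = 2 ** bits
    width = 2 * clip / levels
    out = []
    for v in values:
        idx = min(levels - 1, max(0, int((v + clip) // width)))
        out.append(-clip + (idx + 0.5) * width)
    return out


def normalized_estimate(qx, qy):
    """Normalized inner product of two quantized projection vectors."""
    dot = sum(a * b for a, b in zip(qx, qy))
    nx = math.sqrt(sum(a * a for a in qx))
    ny = math.sqrt(sum(b * b for b in qy))
    return dot / (nx * ny)


# Usage: x is stored with 2 bits per projection, y with 4 bits.
d, k = 16, 800
matrix = gaussian_matrix(k, d)
x = [1.0] + [0.0] * (d - 1)
y = [0.9, math.sqrt(1 - 0.9 ** 2)] + [0.0] * (d - 2)  # true cosine 0.9
qx = uniform_quantize(random_projection(x, matrix), bits=2)
qy = uniform_quantize(random_projection(y, matrix), bits=4)
estimate = normalized_estimate(qx, qy)
```

Such an estimate is generally biased, which is why the abstract distinguishes basic, normalized, and debiased variants; the monotonicity result says that, in expectation, a larger true cosine still yields a larger estimate.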

data mining, information retrieval, machine learning, (20 more...)

Neural Information Processing Systems

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.34)

Checklist

Neural Information Processing Systems | May-31-2025, 18:41:48 GMT

A.1 Motivation For what purpose was the dataset created? EHRs are integral for storing comprehensive patient medical records, combining structured data with detailed clinical notes. However, they often suffer from discrepancies due to unintuitive EHR system designs and human errors, posing serious risks to patient safety. To address this, we developed EHRCon. Who created the dataset (e.g., which team, research group) and on behalf of which entity (e.g., company, institution, organization)?

information retrieval, large language model, machine learning, (20 more...)

Neural Information Processing Systems

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
(2 more...)

Technology:

Information Technology > Databases (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science (0.93)
(2 more...)

SM3-Text-to-Query: Synthetic Multi-Model Medical Text-to-Query Benchmark

Neural Information Processing Systems | May-31-2025, 18:03:50 GMT

Electronic health records (EHRs) are stored in various database systems with different database models on heterogeneous storage architectures, such as relational databases, document stores, or graph databases. These different database models have a big impact on query complexity and performance. While this has been a known fact in database research, its implications for the growing number of Text-to-Query systems have surprisingly not been investigated so far. In this paper, we present SM3-Text-to-Query, the first multi-model medical Text-to-Query benchmark based on synthetic patient data from Synthea, following the SNOMED-CT taxonomy, a widely used knowledge graph ontology covering medical terminology. SM3-Text-to-Query provides data representations for relational databases (PostgreSQL), document stores (MongoDB), and graph databases (Neo4j and GraphDB (RDF)), allowing the evaluation across four popular query languages, namely SQL, MQL, Cypher, and SPARQL. We systematically and manually develop 408 template questions, which we augment to construct a benchmark of 10K diverse natural language question/query pairs for these four query languages (40K pairs overall). On our dataset, we evaluate several common in-context-learning (ICL) approaches for a set of representative closed and open-source LLMs.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

North America > United States (0.93)
Europe (0.67)

Genre: Research Report (0.67)

Industry:

Health & Medicine > Health Care Providers & Services (0.92)
Health & Medicine > Health Care Technology > Medical Record (0.86)
Government > Regional Government > North America Government > United States Government (0.67)
Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Push-pull Feedback Implements Hierarchical Information Retrieval Efficiently

Zilong Ji

Neural Information Processing Systems | May-31-2025, 09:30:24 GMT

Experimental data has revealed that in addition to feedforward connections, there exist abundant feedback connections in a neural pathway. Although the importance of feedback in neural information processing has been widely recognized in the field, the detailed mechanism of how it works remains largely unknown. Here, we investigate the role of feedback in hierarchical information retrieval. Specifically, we consider a hierarchical network storing the hierarchical categorical information of objects, and information retrieval goes from rough to fine, aided by dynamical push-pull feedback from higher to lower layers. We elucidate that the push (positive) and pull (negative) feedbacks suppress the interferences due to neural correlations between different and the same categories, respectively, and their joint effect improves retrieval performance significantly. Our model agrees with the push-pull phenomenon observed in neural data and sheds light on our understanding of the role of feedback in neural information processing.

information retrieval, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country:

Asia > China (0.69)
North America (0.68)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.94)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (0.94)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Efficient Pure Exploration in Adaptive Round model

Tianyuan Jin, Jieming SHI, Xiaokui Xiao, Enhong Chen

Neural Information Processing Systems | May-31-2025, 03:21:17 GMT

In the adaptive setting, many multi-armed bandit applications allow the learner to adaptively draw samples and adjust its sampling strategy in rounds. In many real applications, not only the query complexity but also the round complexity needs to be optimized. In this paper, we study both PAC and exact top-k arm identification problems and design efficient algorithms considering both round complexity and query complexity.
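
To make the two cost measures concrete, here is a minimal sketch of a generic batched halving scheme for top-k arm identification. It is not the algorithm from the paper; the function names, the fixed number of pulls per round, and the Bernoulli test environment are assumptions chosen for illustration. Each pass through the loop is one round (all pulls in it are committed before any result is inspected), and each individual pull is one query.

```python
import math
import random


def bernoulli_arms(means, seed=0):
    """Hypothetical environment: arm `a` pays 1.0 with probability means[a]."""
    rng = random.Random(seed)
    return lambda arm: 1.0 if rng.random() < means[arm] else 0.0


def batched_top_k(pull, n_arms, k, pulls_per_round):
    """Return (estimated top-k arm ids, rounds used, total pulls).

    In every round each surviving arm is pulled `pulls_per_round` times,
    then the worse half of the survivors is dropped (never going below k).
    """
    alive = list(range(n_arms))
    totals = [0.0] * n_arms
    counts = [0] * n_arms
    rounds = 0
    queries = 0
    while len(alive) > k:
        rounds += 1
        for arm in alive:
            for _ in range(pulls_per_round):
                totals[arm] += pull(arm)
            counts[arm] += pulls_per_round
            queries += pulls_per_round
        keep = max(k, math.ceil(len(alive) / 2))
        # Rank survivors by empirical mean over all pulls so far.
        alive.sort(key=lambda a: totals[a] / counts[a], reverse=True)
        alive = alive[:keep]
    return sorted(alive), rounds, queries


# Usage with made-up arm means: 8 arms, find the best 2.
means = [0.9, 0.8, 0.2, 0.1, 0.15, 0.05, 0.3, 0.25]
top, rounds, queries = batched_top_k(bernoulli_arms(means), len(means), 2, 400)
```

In this sketch the number of rounds grows only logarithmically in n_arms / k, while the number of queries is controlled separately through pulls_per_round; the tension between the two is what the abstract refers to.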

data mining, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

Asia (0.14)
North America > Canada (0.14)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.59)
Information Technology > Data Science > Data Mining > Big Data (0.50)

Rand-NSG: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node

Suhas Jayaram Subramanya, Fnu Devvrit, Harsha Vardhan Simhadri, Ravishankar Krishnaswamy, Rohan Kadekodi

Neural Information Processing Systems | May-31-2025, 01:52:30 GMT

Current state-of-the-art approximate nearest neighbor search (ANNS) algorithms generate indices that must be stored in main memory for fast high-recall search. This makes them expensive and limits the size of the dataset. We present a new graph-based indexing and search system called DiskANN that can index, store, and search a billion-point database on a single workstation with just 64GB RAM and an inexpensive solid-state drive (SSD). Contrary to current wisdom, we demonstrate that the SSD-based indices built by DiskANN can meet all three desiderata for large-scale ANNS: high recall, low query latency, and high density (points indexed per node). On the billion-point SIFT1B bigann dataset, DiskANN serves > 5000 queries a second with < 3ms mean latency and 95%+ 1-recall@1 on a 16-core machine, where state-of-the-art billion-point ANNS algorithms with similar memory footprint like FAISS [18] and IVFOADC+G+P [8] plateau at around 50% 1-recall@1. Alternately, in the high-recall regime, DiskANN can index and serve 5-10x more points per node compared to state-of-the-art graph-based methods such as HNSW [21] and NSG [13]. Finally, as part of our overall DiskANN system, we introduce Vamana, a new graph-based ANNS index that is more versatile than the existing graph indices even for in-memory indices.
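
The "1-recall@1" figures above are instances of the k-recall@k metric. A minimal sketch of how it is commonly computed follows, assuming the usual definition (overlap between the k returned ids and the k true nearest neighbors, divided by k); the function names are hypothetical and not taken from DiskANN's code.

```python
def recall_at_k(returned_ids, true_ids, k):
    """k-recall@k for one query.

    Size of the overlap between the first k returned ids and the k true
    nearest neighbors, divided by k.
    """
    return len(set(returned_ids[:k]) & set(true_ids[:k])) / k


def mean_recall_at_k(all_returned, all_true, k):
    """Average k-recall@k over a batch of queries."""
    scores = [recall_at_k(r, t, k) for r, t in zip(all_returned, all_true)]
    return sum(scores) / len(scores)
```

With k = 1 the metric reduces to the fraction of queries whose single returned point is the true nearest neighbor, so "95%+ 1-recall@1" means at least 95 of every 100 queries return the exact nearest point.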

information retrieval, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: North America > United States > Texas > Travis County > Austin (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.85)
Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.63)

Generative Retrieval Meets Multi-Graded Relevance

Yubao Tang

Neural Information Processing Systems | May-30-2025, 18:25:21 GMT

Generative retrieval represents a novel approach to information retrieval. It uses an encoder-decoder architecture to directly produce relevant document identifiers (docids) for queries. While this method offers benefits, current approaches are limited to scenarios with binary relevance data, overlooking the potential for documents to have multi-graded relevance. Extending generative retrieval to accommodate multi-graded relevance poses challenges, including the need to reconcile likelihood probabilities for docid pairs and the possibility of multiple relevant documents sharing the same identifier.

information retrieval, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country: