AITopics

2501.1719

Country:

Asia > India > West Bengal > Kharagpur (0.05)
Asia > Middle East > Republic of Türkiye > Elazig Province > Elazig (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Health & Medicine > Health Care Technology > Medical Record (0.68)
Information Technology > Security & Privacy (0.68)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.32)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Gayoso-Cabada, Joaquín, Gómez-Albarrán, Mercedes, Sierra, José-Luis

Query-based versus resource-based cache strategies in tag-based browsing systems

arXiv.org Artificial Intelligence Jan-26-2025

Tag-based browsing is a popular interaction model for navigating digital libraries. According to this model, users select descriptive tags to filter resources in the collections. Typical implementations of the model are based on inverted indexes. However, these implementations can require a considerable amount of set operations to update the browsing state. To palliate this inconvenience, it is possible to adopt suitable cache strategies. In this paper we describe and compare two of these strategies: (i) a query-based strategy, according to which previously computed browsing states are indexed by sets of selected tags; and (ii) a resource-based strategy, according to which browsing states are indexed by sets of filtered resources. Our comparison focused on runtime performance, and was carried out empirically, using a real-world web-based collection in the field of digital humanities. The results obtained show that the resource-based strategy clearly outperforms the query-based one.
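The difference between the two cache keys can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy collection, the class names `QueryCache` and `ResourceCache`, and the definition of a browsing state as (filtered resources, tags that can still narrow the view) are all assumptions made for illustration. A miss here counts one recomputation of the selectable tags.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical toy collection: resource id -> descriptive tags.
RESOURCES: Dict[str, Set[str]] = {
    "r1": {"pottery", "roman"},
    "r2": {"pottery", "roman", "spain"},
    "r3": {"coin", "roman"},
}

# Inverted index: tag -> resources carrying that tag.
INDEX: Dict[str, Set[str]] = {}
for rid, tags in RESOURCES.items():
    for tag in tags:
        INDEX.setdefault(tag, set()).add(rid)

State = Tuple[FrozenSet[str], FrozenSet[str]]  # (filtered resources, selectable tags)


def filter_resources(selected: Iterable[str]) -> FrozenSet[str]:
    """Resources that carry every selected tag (all resources if none selected)."""
    result = set(RESOURCES)
    for tag in selected:
        result &= INDEX.get(tag, set())
    return frozenset(result)


def selectable_tags(filtered: FrozenSet[str]) -> FrozenSet[str]:
    """Tags carried by some, but not all, filtered resources."""
    counts: Dict[str, int] = {}
    for rid in filtered:
        for tag in RESOURCES[rid]:
            counts[tag] = counts.get(tag, 0) + 1
    return frozenset(t for t, c in counts.items() if c < len(filtered))


class QueryCache:
    """Assumed query-based strategy: states keyed by the set of selected tags."""

    def __init__(self) -> None:
        self.states: Dict[FrozenSet[str], State] = {}
        self.misses = 0

    def state(self, selected: Iterable[str]) -> State:
        key = frozenset(selected)
        if key not in self.states:
            self.misses += 1
            filtered = filter_resources(key)
            self.states[key] = (filtered, selectable_tags(filtered))
        return self.states[key]


class ResourceCache:
    """Assumed resource-based strategy: states keyed by the filtered resource set."""

    def __init__(self) -> None:
        self.tags_by_resources: Dict[FrozenSet[str], FrozenSet[str]] = {}
        self.misses = 0

    def state(self, selected: Iterable[str]) -> State:
        filtered = filter_resources(selected)
        if filtered not in self.tags_by_resources:
            self.misses += 1
            self.tags_by_resources[filtered] = selectable_tags(filtered)
        return filtered, self.tags_by_resources[filtered]
```

In this toy setup, the selections {pottery}, {pottery, roman}, {spain} and {spain, pottery} are four distinct keys for the query-based cache but only two distinct resource sets, so the resource-based cache recomputes fewer states. Whether this mirrors the effect measured in the paper is not something the sketch can establish.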

artificial intelligence, information retrieval, natural language, (21 more...)

doi: 10.1007/978-3-030-04257-8_4

2501.15481

Country:

Europe > Spain > Galicia > Madrid (0.04)
North America > Panama (0.04)
Europe > Poland (0.04)

Genre: Research Report > Experimental Study (0.46)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Communications > Social Media (0.68)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.46)

Neural Information Processing Systems Jan-25-2025, 23:58:01 GMT

HM-ANN: Efficient Billion-Point Nearest Neighbor Search on Heterogeneous Memory

The state-of-the-art approximate nearest neighbor search (ANNS) algorithms face a fundamental tradeoff between query latency and accuracy, because of small main memory capacity: To store indices in main memory for short query latency, the ANNS algorithms have to limit dataset size or use a quantization scheme which hurts search accuracy. The emergence of heterogeneous memory (HM) brings a solution to significantly increase memory capacity and break the above tradeoff: Using HM, billions of data points can be placed in the main memory on a single machine without using any data compression. However, HM consists of both fast (but small) memory and slow (but large) memory, and using HM inappropriately slows down query significantly. In this work, we present a novel graph-based similarity search algorithm called HM-ANN, which takes both memory and data heterogeneity into consideration and enables billion-scale similarity search on a single node without using compression. On two billion-sized datasets BIGANN and DEEP1B, HM-ANN outperforms state-of-the-art compression-based solutions such as L&C and IMI OPQ in recall-vs-latency by a large margin, obtaining 46% higher recall under the same search latency. We also extend existing graph-based methods such as HNSW and NSG with two strong baseline implementations on HM.

efficient billion-point nearest neighbor search, heterogeneous memory, hm-ann, (7 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.64)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.64)

Neural Information Processing Systems Jan-25-2025, 14:23:03 GMT

Review for NeurIPS paper: Optimal Query Complexity of Secure Stochastic Convex Optimization

In terms of presentation, the paper constantly switches terminology, referring to both "secure" and "private." Given that differential privacy is by now established as a notion of formal privacy, I believe it is better to refer to this problem as "secure" optimization. For example, on page 2, line 49, classic bounds are referred to as "non-private." I think it would be better to refer exclusively to "secure" in these definitions. However, they are clearly vacuous for the classical "non-secure" case.

neurips paper, optimal query complexity, secure stochastic convex optimization, (5 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.44)

Neural Information Processing Systems Jan-25-2025, 14:22:57 GMT

Review for NeurIPS paper: Optimal Query Complexity of Secure Stochastic Convex Optimization

This paper was carefully reviewed and discussed by our reviewer panel. The consensus was that this is nice work, the rebuttal had some sway, and the paper can be published in NeurIPS this year. But please do take into account the detailed comments of the reviewers when putting together your camera-ready version.

neurips paper, optimal query complexity, secure stochastic convex optimization

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.40)

Neural Information Processing Systems Jan-25-2025, 11:13:55 GMT

Reviews: Hierarchical Optimal Transport for Document Representation

This paper proposes a distance metric for documents. The proposed solution is to combine latent topics from topic models with the idea of using geometry from word embeddings to compute distances between pairs of documents (as in the WMD metric). First topics are computed, and WMD is performed at the topic level as opposed to the word level. The hypothesis presented is that modeling documents by their representative topics is better for highlighting differences despite the loss in resolution and is similar to how a person would do this task: breaking down each document into concepts, and then comparing the concepts. Since the topics are precomputed for a given corpus, speed up is gained at inference time when computing document similarities.

document representation, hierarchical optimal transport

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.40)

arXiv.org Artificial Intelligence Jan-24-2025

Chain-of-Retrieval Augmented Generation

Wang, Liang, Chen, Haonan, Yang, Nan, Huang, Xiaolong, Dou, Zhicheng, Wei, Furu

This paper introduces an approach for training o1-like RAG models that retrieve and reason over relevant information step by step before generating the final answer. Conventional RAG methods usually perform a single retrieval step before the generation process, which limits their effectiveness in addressing complex queries due to imperfect retrieval results. In contrast, our proposed method, CoRAG (Chain-of-Retrieval Augmented Generation), allows the model to dynamically reformulate the query based on the evolving state. To train CoRAG effectively, we utilize rejection sampling to automatically generate intermediate retrieval chains, thereby augmenting existing RAG datasets that only provide the correct final answer. At test time, we propose various decoding strategies to scale the model's test-time compute by controlling the length and number of sampled retrieval chains. Experimental results across multiple benchmarks validate the efficacy of CoRAG, particularly in multi-hop question answering tasks, where we observe more than 10 points improvement in EM score compared to strong baselines. On the KILT benchmark, CoRAG establishes a new state-of-the-art performance across a diverse range of knowledge-intensive tasks. Furthermore, we offer comprehensive analyses to understand the scaling behavior of CoRAG, laying the groundwork for future research aimed at developing factual and grounded foundation models.
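The idea of using rejection sampling to build intermediate retrieval chains from answer-only data can be sketched in a few lines of Python. This is a toy illustration, not the CoRAG training pipeline: the corpus, the `sample_chain` stand-in for language-model sampling, and the acceptance rule (exact match between the last sub-answer and the gold answer) are all assumptions, and the paper may use a different criterion.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy corpus: sub-query -> retrieved answer.
CORPUS: Dict[str, str] = {
    "capital of France": "Paris",
    "capital of Spain": "Madrid",
    "river through Paris": "Seine",
    "river through Madrid": "Manzanares",
}

Chain = List[Tuple[str, str]]  # (sub-query, sub-answer) pairs


def sample_chain(rng: random.Random) -> Chain:
    """Stand-in for model sampling: pick a first sub-query at random,
    then reformulate the next sub-query from the evolving state."""
    first = rng.choice(["capital of France", "capital of Spain"])
    city = CORPUS[first]
    second = f"river through {city}"
    return [(first, city), (second, CORPUS[second])]


def rejection_sample(gold: str, n: int, seed: int = 0) -> List[Chain]:
    """Sample n chains and keep those whose final sub-answer equals gold."""
    rng = random.Random(seed)
    chains = [sample_chain(rng) for _ in range(n)]
    return [chain for chain in chains if chain[-1][1] == gold]
```

For the question "Which river flows through the capital of France?" with gold answer "Seine", `rejection_sample("Seine", 50)` keeps only chains that start from "capital of France". The accepted chains could then serve as intermediate supervision for a dataset that originally contained only the final answer.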

information retrieval, large language model, machine learning, (17 more...)

2501.14342

Country:

Europe > Austria > Vienna (0.14)
North America > United States > New Hampshire (0.05)
North America > United States > New York > Suffolk County > Stony Brook (0.04)
(17 more...)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Yuksel, Goksenin, Kamps, Jaap

Interpretability Analysis of Domain Adapted Dense Retrievers

arXiv.org Artificial Intelligence Jan-24-2025

Dense retrievers have demonstrated significant potential for neural information retrieval; however, they exhibit a lack of robustness to domain shifts, thereby limiting their efficacy in zero-shot settings across diverse domains. Previous research has investigated unsupervised domain adaptation techniques to adapt dense retrievers to target domains. However, these studies have not focused on explainability analysis to understand how such adaptations alter the model's behavior. In this paper, we propose utilizing the integrated gradients framework to develop an interpretability method that provides both instance-based and ranking-based explanations for dense retrievers. To generate these explanations, we introduce a novel baseline that reveals both query and document attributions. This method is used to analyze the effects of domain adaptation on input attributions for query and document tokens across two datasets: the financial question answering dataset (FIQA) and the biomedical information retrieval dataset (TREC-COVID). Our visualizations reveal that domain-adapted models focus more on in-domain terminology compared to non-adapted models, exemplified by terms such as "hedge," "gold," "corona," and "disease." This research addresses how unsupervised domain adaptation techniques influence the behavior of dense retrievers when adapted to new domains. Additionally, we demonstrate that integrated gradients are a viable choice for explaining and analyzing the internal mechanisms of these opaque neural models.
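For readers unfamiliar with integrated gradients, the attribution rule can be sketched in plain Python. The sketch assumes an all-zero baseline and a hypothetical toy scoring function (a sigmoid over a weighted sum of three input features); it does not reproduce the novel baseline or the retriever models described in the abstract.

```python
import math
from typing import Callable, List, Sequence


def integrated_gradients(
    grad_fn: Callable[[Sequence[float]], List[float]],
    x: Sequence[float],
    baseline: Sequence[float],
    steps: int = 200,
) -> List[float]:
    """Midpoint Riemann-sum approximation of integrated gradients:
    attr_i = (x_i - b_i) * mean over alpha of dF/dx_i at b + alpha * (x - b)."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = grad_fn(point)
        total = [t + g for t, g in zip(total, grad)]
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


# Hypothetical toy "relevance score", not a real dense retriever.
WEIGHTS = [2.0, -1.0, 0.5]


def score(x: Sequence[float]) -> float:
    """Sigmoid of a weighted sum of the input features."""
    z = sum(w * xi for w, xi in zip(WEIGHTS, x))
    return 1.0 / (1.0 + math.exp(-z))


def score_grad(x: Sequence[float]) -> List[float]:
    """Analytic gradient of score with respect to each input feature."""
    s = score(x)
    return [s * (1.0 - s) * w for w in WEIGHTS]
```

A useful sanity check is the completeness property: the attributions returned by `integrated_gradients(score_grad, x, baseline)` should sum, up to numerical error, to `score(x) - score(baseline)`. In a real retriever the gradient would come from automatic differentiation over token embeddings rather than from a hand-written function.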

attribution, information retrieval, question answering, (17 more...)

2501.14459

Country:

Europe > Netherlands > North Holland > Amsterdam (0.05)
North America > United States > New York > New York County > New York City (0.05)
South America > Colombia > Meta Department > Villavicencio (0.04)
(8 more...)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.55)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.34)

Neural Information Processing Systems Jan-23-2025, 09:55:46 GMT

Reviews: Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries

The main problem for me is that the paper promises a very real scenario (Figure 1) of how a user can refine a search by using a sequence of refined queries. However, the majority of the model design and evaluation (except section 4.2) is performed with dense region captions that have almost no sequential nature. While this is partially a strength, as no additional labels are required, the method seems especially suited to such disconnected queries -- there is space for M disconnected queries and only then are updates required. This would provide a deeper understanding of when the proposed method works better. In Figure 1, the user queries seem very natural, but the simulated queries in Figure 1 are not.

natural language query, query, user query, (6 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.40)

Neural Information Processing Systems Jan-23-2025, 09:55:36 GMT

Reviews: Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries

This paper investigates the problem of multi-round natural language image retrieval, using annotations from the Visual Genome dataset for training and evaluation. After feedback and reviewer discussion, this paper received final ratings of 6, 6 and 7. Despite some concerns about the use of non-sequential annotation data for a sequential task, the reviewers found the proposed model to be generally sound and the experimental evaluation convincing, and the AC agrees. However, we would encourage the authors to pay close attention to the reviewer feedback when preparing the final paper version. In particular, the author feedback committed to including the additional baselines requested by R1, so these should be included in the final version as promised.

complex scene, interactive retrieval, natural language query, (1 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.40)