Information Retrieval
RoRA-VLM: Robust Retrieval-Augmented Vision Language Models
Qi, Jingyuan, Xu, Zhiyang, Shao, Rulin, Chen, Yang, Di, Jin, Cheng, Yu, Wang, Qifan, Huang, Lifu
Current vision-language models (VLMs) still exhibit inferior performance on knowledge-intensive tasks, primarily due to the challenge of accurately encoding all the associations between visual objects and scenes to their corresponding entities and background knowledge. While retrieval augmentation methods offer an efficient way to integrate external knowledge, extending them to vision-language domain presents unique challenges in (1) precisely retrieving relevant information from external sources due to the inherent discrepancy within the multimodal queries, and (2) being resilient to the irrelevant, extraneous and noisy information contained in the retrieved multimodal knowledge snippets. In this work, we introduce RORA-VLM, a novel and robust retrieval augmentation framework specifically tailored for VLMs, with two key innovations: (1) a 2-stage retrieval process with image-anchored textual-query expansion to synergistically combine the visual and textual information in the query and retrieve the most relevant multimodal knowledge snippets; and (2) a robust retrieval augmentation method that strengthens the resilience of VLMs against irrelevant information in the retrieved multimodal knowledge by injecting adversarial noises into the retrieval-augmented training process, and filters out extraneous visual information, such as unrelated entities presented in images, via a query-oriented visual token refinement strategy. We conduct extensive experiments to validate the effectiveness and robustness of our proposed methods on three widely adopted benchmark datasets. Our results demonstrate that with a minimal amount of training instance, RORA-VLM enables the base model to achieve significant performance improvement and constantly outperform state-of-the-art retrieval-augmented VLMs on all benchmarks while also exhibiting a novel zero-shot domain transfer capability.
Learning to Rank for Multiple Retrieval-Augmented Models through Iterative Utility Maximization
Salemi, Alireza, Zamani, Hamed
This paper investigates the design of a unified search engine to serve multiple retrieval-augmented generation (RAG) agents, each with a distinct task, backbone large language model (LLM), and retrieval-augmentation strategy. We introduce an iterative approach where the search engine generates retrieval results for these RAG agents and gathers feedback on the quality of the retrieved documents during an offline phase. This feedback is then used to iteratively optimize the search engine using a novel expectation-maximization algorithm, with the goal of maximizing each agent's utility function. Additionally, we adapt this approach to an online setting, allowing the search engine to refine its behavior based on real-time individual agents feedback to better serve the results for each of them. Experiments on diverse datasets from the Knowledge-Intensive Language Tasks (KILT) benchmark demonstrates that our approach significantly on average outperforms competitive baselines across 18 RAG models. We also demonstrate that our method effectively ``personalizes'' the retrieval process for each RAG agent based on the collected feedback. Finally, we provide a comprehensive ablation study to explore various aspects of our method.
Information Discovery in e-Commerce
Ren, Zhaochun, He, Xiangnan, Yin, Dawei, de Rijke, Maarten
Electronic commerce, or e-commerce, is the buying and selling of goods and services, or the transmitting of funds or data online. E-commerce platforms come in many kinds, with global players such as Amazon, Airbnb, Alibaba, eBay and platforms targeting specific geographic regions. Information retrieval has a natural role to play in e-commerce, especially in connecting people to goods and services. Information discovery in e-commerce concerns different types of search (e.g., exploratory search vs. lookup tasks), recommender systems, and natural language processing in e-commerce portals. The rise in popularity of e-commerce sites has made research on information discovery in e-commerce an increasingly active research area. This is witnessed by an increase in publications and dedicated workshops in this space. Methods for information discovery in e-commerce largely focus on improving the effectiveness of e-commerce search and recommender systems, on enriching and using knowledge graphs to support e-commerce, and on developing innovative question answering and bot-based solutions that help to connect people to goods and services. In this survey, an overview is given of the fundamental infrastructure, algorithms, and technical solutions for information discovery in e-commerce. The topics covered include user behavior and profiling, search, recommendation, and language technology in e-commerce.
A Complete Characterization of Learnability for Stochastic Noisy Bandits
We study the stochastic noisy bandit problem with an unknown reward function $f^*$ in a known function class $\mathcal{F}$. Formally, a model $M$ maps arms $\pi$ to a probability distribution $M(\pi)$ of reward. A model class $\mathcal{M}$ is a collection of models. For each model $M$, define its mean reward function $f^M(\pi)=\mathbb{E}_{r \sim M(\pi)}[r]$. In the bandit learning problem, we proceed in rounds, pulling one arm $\pi$ each round and observing a reward sampled from $M(\pi)$. With knowledge of $\mathcal{M}$, supposing that the true model $M\in \mathcal{M}$, the objective is to identify an arm $\hat{\pi}$ of near-maximal mean reward $f^M(\hat{\pi})$ with high probability in a bounded number of rounds. If this is possible, then the model class is said to be learnable. Importantly, a result of \cite{hanneke2023bandit} shows there exist model classes for which learnability is undecidable. However, the model class they consider features deterministic rewards, and they raise the question of whether learnability is decidable for classes containing sufficiently noisy models. For the first time, we answer this question in the positive by giving a complete characterization of learnability for model classes with arbitrary noise. In addition to that, we also describe the full spectrum of possible optimal query complexities. Further, we prove adaptivity is sometimes necessary to achieve the optimal query complexity. Last, we revisit an important complexity measure for interactive decision making, the Decision-Estimation-Coefficient \citep{foster2021statistical,foster2023tight}, and propose a new variant of the DEC which also characterizes learnability in this setting.
TPU-KNN: K Nearest Neighbor Search at Peak FLOP/s
This paper presents a novel nearest neighbor search algorithm achieving TPU (Google Tensor Processing Unit) peak performance, outperforming state-of-the-art GPU algorithms with similar level of recall. The design of the proposed algorithm is motivated by an accurate accelerator performance model that takes into account both the memory and instruction bottlenecks. Our algorithm comes with an analytical guarantee of recall in expectation and does not require maintaining sophisticated index data structure or tuning, making it suitable for applications with frequent updates. Our work is available in the open-source package of Jax and Tensorflow on TPU.
Debiased and Denoised Entity Recognition from Distant Supervision
While distant supervision has been extensively explored and exploited in NLP tasks like named entity recognition, a major obstacle stems from the inevitable noisy distant labels tagged unsupervisedly. A few past works approach this problem by adopting a self-training framework with a sample-selection mechanism. In this work, we innovatively identify two types of biases that were omitted by prior work, and these biases lead to inferior performance of the distant-supervised NER setup. First, we characterize the noise concealed in the distant labels as highly structural rather than fully randomized. Second, the self-training framework would ubiquitously introduce an inherent bias that causes erroneous behavior in both sample selection and eventually prediction.
An Efficient and Robust Framework for Approximate Nearest Neighbor Search with Attribute Constraint
This paper introduces an efficient and robust framework for hybrid query (HQ) processing, which combines approximate nearest neighbor search (ANNS) with attribute constraint. HQ aims to find objects that are similar to a feature vector and match some structured attributes. Existing methods handle ANNS and attribute filtering separately, leading to inefficiency and inaccuracy. Our framework, called native hybrid query (NHQ), builds a composite index based on proximity graph (PG) and applies joint pruning for HQ. We can easily adapt existing PGs to this framework for efficient HQ processing.
Improving Legal Entity Recognition Using a Hybrid Transformer Model and Semantic Filtering Approach
Legal Entity Recognition (LER) involves identifying key entities such as parties, dates, monetary amounts, and legal provisions from legal documents. Automating this process is crucial for improving efficiency in legal workflows, including contract review, compliance monitoring, and litigation support. Traditional Named Entity Recognition (NER) methods, such as rule-based systems and classical machine learning models like Conditional Random Fields (CRFs), require extensive feature engineering and struggle to adapt to new legal terminologies. Transformer-based models, particularly BERT [1], have shown great promise in various NLP tasks, including LER. **Legal-BERT**, a finetuned variant of BERT for legal texts, has demonstrated superior performance
Towards Characterizing the First-order Query Complexity of Learning (Approximate) Nash Equilibria in Zero-sum Matrix Games
In the first-order query model for zero-sum K\times K matrix games, players observe the expected pay-offs for all their possible actions under the randomized action played by their opponent. Surprisingly, the optimal number of such queries, as a function of both \epsilon and K, is not known. We make progress on this question on two fronts. First, we fully characterise the query complexity of learning exact equilibria ( \epsilon 0), by showing that they require a number of queries that is linear in K, which means that it is essentially as hard as querying the whole matrix, which can also be done with K queries. We argue that, unfortunately, obtaining a matching lower bound is not possible with existing techniques: we prove that no lower bound can be derived by constructing hard matrices whose entries take values in a known countable set, because such matrices can be fully identified by a single query.
Overlapping Spaces for Compact Graph Representations
Various non-trivial spaces are becoming popular for embedding structured data such as graphs, texts, or images. Following spherical and hyperbolic spaces, more general product spaces have been proposed. However, searching for the best configuration of a product space is a resource-intensive procedure, which reduces the practical applicability of the idea. We generalize the concept of product space and introduce an overlapping space that does not have the configuration search problem. The main idea is to allow subsets of coordinates to be shared between spaces of different types (Euclidean, hyperbolic, spherical).