Semi-Supervised Synthetic Data Generation with Fine-Grained Relevance Control for Short Video Search Relevance Modeling

Li, Haoran, Su, Zhiming, Yao, Junyan, Zhang, Enwei, Ji, Yang, Chen, Yan, Zhou, Kan, Feng, Chao, Ran, Jiao

arXiv.org Artificial Intelligence

Synthetic data is widely adopted in embedding models to ensure diversity in training data distributions across dimensions such as difficulty, length, and language. However, existing prompt-based synthesis methods struggle to capture domain-specific data distributions, particularly in data-scarce domains, and often overlook fine-grained relevance diversity. In this paper, we present a Chinese short video dataset with 4-level relevance annotations, filling a critical resource void. Further, we propose a semi-supervised synthetic data pipeline in which two collaboratively trained models generate domain-adaptive short video data with controllable relevance labels. Our method enhances relevance-level diversity by synthesizing samples for underrepresented intermediate relevance labels, resulting in a more balanced and semantically rich training set. Extensive offline experiments show that the embedding model trained on our synthesized data outperforms those trained on data generated by prompting or vanilla supervised fine-tuning (SFT). Moreover, we demonstrate that incorporating more diverse fine-grained relevance levels in the training data enhances the model's sensitivity to subtle semantic distinctions, highlighting the value of fine-grained relevance supervision in embedding learning. In the search-enhanced recommendation pipeline of Douyin's dual-column scenario, online A/B testing showed that the proposed model increased click-through rate (CTR) by 1.45%, raised the Strong Relevance Ratio (SRR) by 4.9%, and improved the Image User Penetration Rate (IUPR) by 0.1054%.
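
A minimal sketch of the rebalancing idea described above, assuming a hypothetical `generate(query, level)` callable standing in for the paper's collaboratively trained generators (the function names and the target-count policy are our illustration, not the authors' pipeline):

```python
from collections import Counter

LEVELS = [0, 1, 2, 3]  # 0 = irrelevant ... 3 = strongly relevant

def rebalance(dataset, generate, target_per_level):
    """dataset: list of (query, doc, level); generate(query, level) -> doc.

    Synthesizes extra samples for relevance levels that fall short of the
    target count, which in practice are the intermediate levels 1 and 2.
    """
    counts = Counter(level for _, _, level in dataset)
    queries = [q for q, _, _ in dataset]
    synthetic = []
    for level in LEVELS:
        deficit = target_per_level - counts.get(level, 0)
        for i in range(max(deficit, 0)):
            q = queries[i % len(queries)]   # reuse real, in-domain queries
            synthetic.append((q, generate(q, level), level))
    return dataset + synthetic
```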


OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens

Liu, Jiacheng, Blanton, Taylor, Elazar, Yanai, Min, Sewon, Chen, YenSung, Chheda-Kothary, Arnavi, Tran, Huy, Bischoff, Byron, Marsh, Eric, Schmitz, Michael, Trier, Cassidy, Sarnat, Aaron, James, Jenna, Borchardt, Jon, Kuehl, Bailey, Cheng, Evie, Farley, Karen, Sreeram, Sruthi, Anderson, Taira, Albright, David, Schoenick, Carissa, Soldaini, Luca, Groeneveld, Dirk, Pang, Rock Yuren, Koh, Pang Wei, Smith, Noah A., Lebrecht, Sophie, Choi, Yejin, Hajishirzi, Hannaneh, Farhadi, Ali, Dodge, Jesse

arXiv.org Artificial Intelligence

We present OLMoTrace, the first system that traces the outputs of language models back to their full, multi-trillion-token training data in real time. OLMoTrace finds and shows verbatim matches between segments of language model output and documents in the training text corpora. Powered by an extended version of infini-gram (Liu et al., 2024), our system returns tracing results within a few seconds. OLMoTrace can help users understand the behavior of language models through the lens of their training data. We showcase how it can be used to explore fact checking, hallucination, and the creativity of language models. OLMoTrace is publicly available and fully open-source.
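
To make the matching semantics concrete, here is a brute-force toy in the spirit of the system: find maximal output spans that occur verbatim in a corpus. OLMoTrace does this over multi-trillion-token corpora with an extended infini-gram (suffix-array) index; the function names and the minimum span length below are our assumptions:

```python
def _contains(corpus, span):
    # Linear scan; the real system replaces this with an infini-gram lookup.
    k = len(span)
    return any(corpus[t:t + k] == span for t in range(len(corpus) - k + 1))

def verbatim_spans(output_tokens, corpus_tokens, min_len=4):
    spans, i, n = [], 0, len(output_tokens)
    while i < n:
        j, best = i + min_len, None
        while j <= n and _contains(corpus_tokens, output_tokens[i:j]):
            best, j = (i, j), j + 1        # greedily extend the match
        if best:
            spans.append(best)
            i = best[1]                    # resume after the matched span
        else:
            i += 1
    return spans                           # (start, end) index pairs
```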


A Human-AI Comparative Analysis of Prompt Sensitivity in LLM-Based Relevance Judgment

Arabzadeh, Negar, Clarke, Charles L. A.

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are increasingly used to automate relevance judgments for information retrieval (IR) tasks, often demonstrating agreement with human labels that approaches inter-human agreement. To assess the robustness and reliability of LLM-based relevance judgments, we systematically investigate the impact of prompt sensitivity on the task. We collected prompts for relevance assessment from 15 human experts and 15 LLMs across three tasks (binary, graded, and pairwise), yielding 90 prompts in total. After filtering out unusable prompts from three humans and three LLMs, we employed the remaining 72 prompts with three different LLMs as judges to label document/query pairs from two TREC Deep Learning datasets (2020 and 2021). We compare LLM-generated labels with official TREC human labels using Cohen's κ and pairwise agreement measures. In addition to investigating the impact of prompt variations on agreement with human labels, we compare human- and LLM-generated prompts and analyze differences among different LLMs as judges. We also compare human- and LLM-generated prompts with the standard UMBRELA prompt used for relevance assessment by Bing and the TREC 2024 Retrieval Augmented Generation (RAG) Track. To support future research in LLM-based evaluation, we release all data and prompts at https://github.com/Narabzad/prompt-sensitivity-relevance-judgements/.
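
For reference, Cohen's κ, the agreement statistic the study relies on, is straightforward to compute directly (a self-contained sketch; sklearn's cohen_kappa_score gives the same result):

```python
import numpy as np

def cohens_kappa(a, b):
    """Cohen's kappa between two label sequences, e.g. one judge prompt's
    LLM labels vs. human labels: (p_o - p_e) / (1 - p_e)."""
    a, b = np.asarray(a), np.asarray(b)
    labels = np.union1d(a, b)
    p_o = np.mean(a == b)                        # observed agreement
    p_e = sum(np.mean(a == l) * np.mean(b == l)  # chance agreement
              for l in labels)
    return (p_o - p_e) / (1 - p_e)

print(cohens_kappa([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.5
```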


Beyond Contrastive Learning: Synthetic Data Enables List-wise Training with Multiple Levels of Relevance

Esfandiarpoor, Reza, Zerveas, George, Zhang, Ruochen, Mgonzo, Macton, Eickhoff, Carsten, Bach, Stephen H.

arXiv.org Artificial Intelligence

Recent advancements in large language models (LLMs) have allowed the augmentation of information retrieval (IR) pipelines with synthetic data in various ways. Yet, the main training paradigm remains: contrastive learning with binary relevance labels and the InfoNCE loss, where one positive document is compared against one or more negatives. This objective treats all documents that are not explicitly annotated as relevant on an equally negative footing, regardless of their actual degree of relevance, thus (a) missing subtle nuances that are useful for ranking and (b) being susceptible to annotation noise. To overcome this limitation, in this work we forgo real training documents and annotations altogether and use open-source LLMs to directly generate synthetic documents that answer real user queries according to several different levels of relevance. This fully synthetic ranking context of graduated relevance, together with an appropriate list-wise loss (Wasserstein distance), enables us to train dense retrievers in a way that better captures the ranking task. Experiments on various IR datasets show that our proposed approach outperforms conventional training with InfoNCE by a large margin. Without using any real documents for training, our dense retriever significantly outperforms the same retriever trained through self-supervision. More importantly, it matches the performance of the same retriever trained on real, labeled training documents of the same dataset, while being more robust to distribution shift and clearly outperforming it when evaluated zero-shot on the BEIR dataset collection.
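
A hedged sketch of what a list-wise Wasserstein objective over graded relevance can look like: treat the softmax of the predicted scores and the normalized relevance grades as two distributions over the same ranked list, and penalize the distance between them. For distributions on the same discrete support, the 1-Wasserstein distance reduces to the L1 distance between CDFs; this illustrates the idea, not the paper's exact loss:

```python
import numpy as np

def listwise_wasserstein(scores, relevance):
    """scores: model scores for one list; relevance: graded labels (>= 0)."""
    p = np.exp(scores - scores.max()); p /= p.sum()  # model distribution
    q = np.asarray(relevance, float); q /= q.sum()   # target distribution
    # W1 on a shared discrete support = L1 distance between the CDFs.
    return np.abs(np.cumsum(p) - np.cumsum(q)).sum()

print(listwise_wasserstein(np.array([2.0, 1.0, 0.1]), [3, 2, 0]))
```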


Convolutional Rectangular Attention Module

Nguyen, Hai-Vy, Gamboa, Fabrice, Zhang, Sixin, Chhaibi, Reda, Gratton, Serge, Giaccone, Thierry

arXiv.org Machine Learning

In this paper, we introduce a novel spatial attention module that can be integrated into any convolutional network. The module guides the model to attend to the most discriminative part of an image, enabling better performance through end-to-end training. In standard approaches, a spatial attention map is generated in a position-wise fashion; we observe that this produces very irregular boundaries, which can hurt generalization to new samples. In our method, the attention region is constrained to be rectangular. This rectangle is parametrized by only 5 parameters, allowing for better stability and generalization to new samples. In our experiments, our method systematically outperforms its position-wise counterpart, providing a useful new spatial attention mechanism for convolutional models. Our module also offers interpretability with respect to the "where to look" question, as it reveals the part of the input on which the model focuses to produce its prediction.
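
One natural way to realize a 5-parameter differentiable rectangle (center x/y, width, height, edge sharpness) is as a product of sigmoids, which keeps the mask trainable end to end. This is our reading of the idea, not the authors' exact parametrization:

```python
import torch
import torch.nn as nn

class RectAttention(nn.Module):
    def __init__(self):
        super().__init__()
        # (cx, cy, w, h) in normalized [0, 1] coordinates, plus sharpness k.
        self.params = nn.Parameter(torch.tensor([0.5, 0.5, 0.5, 0.5, 10.0]))

    def forward(self, x):                       # x: (B, C, H, W)
        cx, cy, w, h, k = self.params
        B, C, H, W = x.shape
        ys = torch.linspace(0, 1, H, device=x.device).view(1, 1, H, 1)
        xs = torch.linspace(0, 1, W, device=x.device).view(1, 1, 1, W)
        # Soft rectangle: rising sigmoid at the left/top edge times a
        # falling sigmoid at the right/bottom edge, in each axis.
        mask_x = torch.sigmoid(k * (xs - (cx - w / 2))) * \
                 torch.sigmoid(k * ((cx + w / 2) - xs))
        mask_y = torch.sigmoid(k * (ys - (cy - h / 2))) * \
                 torch.sigmoid(k * ((cy + h / 2) - ys))
        return x * (mask_x * mask_y)            # broadcasts over B and C
```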


TVR-Ranking: A Dataset for Ranked Video Moment Retrieval with Imprecise Queries

Liang, Renjie, Li, Li, Zhang, Chongzhi, Wang, Jing, Zhu, Xizhou, Sun, Aixin

arXiv.org Artificial Intelligence

In this paper, we propose the task of Ranked Video Moment Retrieval (RVMR): locating a ranked list of matching moments from a collection of videos in response to natural language queries. Although a few related tasks have been proposed and studied by the CV, NLP, and IR communities, RVMR is the task that best reflects the practical setting of moment search. To facilitate research in RVMR, we develop the TVR-Ranking dataset, based on the raw videos and existing moment annotations provided in the TVR dataset. Our key contribution is the manual annotation of relevance levels for 94,442 query-moment pairs. We then develop the NDCG@K, IoU ≥ μ evaluation metric for this new task and conduct experiments to evaluate three baseline models. Our experiments show that the new RVMR task brings new challenges to existing models, and we believe this new dataset contributes to research on multi-modality search. The dataset is available at https://github.com/Ranking-VMR/TVR-Ranking.
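
As a sketch of the shape such a metric can take (the matching policy below is our simplification, not necessarily the dataset's official protocol): a retrieved moment earns an annotated relevance grade only when its temporal IoU with a ground-truth moment clears the threshold μ, and NDCG@K is computed over the resulting gains:

```python
import numpy as np

def t_iou(a, b):
    """Temporal IoU of two (start, end) intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0

def ndcg_iou(preds, gt, k=10, mu=0.5):
    """preds: ranked [(t0, t1)]; gt: [((t0, t1), rel)] annotated moments."""
    remaining, gains = list(gt), []
    for p in preds[:k]:
        hits = [(r, g) for g, r in remaining if t_iou(p, g) >= mu]
        if hits:
            r, g = max(hits)                 # best matching relevance grade
            remaining.remove((g, r))         # each annotation credited once
            gains.append(r)
        else:
            gains.append(0.0)
    dcg = sum(g / np.log2(i + 2) for i, g in enumerate(gains))
    ideal = sorted((r for _, r in gt), reverse=True)[:k]
    idcg = sum(g / np.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```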


Language Fairness in Multilingual Information Retrieval

Yang, Eugene, Jänich, Thomas, Mayfield, James, Lawrie, Dawn

arXiv.org Artificial Intelligence

Multilingual information retrieval (MLIR) considers the problem of ranking documents in several languages for a query expressed in a language that may differ from any of those languages. Recent work has observed that approaches such as combining ranked lists that each represent a single document language, or using multilingual pretrained language models, demonstrate a preference for one language over others. This results in systematically unfair treatment of documents in different languages. This work proposes a language fairness metric to evaluate whether documents across different languages are fairly ranked, through statistical equivalence testing using the Kruskal-Wallis test. In contrast to most prior work in group fairness, we do not consider any language to be an unprotected group. Thus our proposed measure, PEER (Probability of Equal Expected Rank), is the first fairness metric specifically designed to capture the language fairness of MLIR systems. We demonstrate the behavior of PEER on artificial ranked lists. We also evaluate real MLIR systems on two publicly available benchmarks and show that the PEER scores align with prior analytical findings on MLIR fairness. Our implementation is compatible with ir-measures and is available at http://github.com/hltcoe/peer_measure.
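
The statistical core is easy to illustrate with scipy: group the ranks attained by relevant documents by document language and apply the Kruskal-Wallis test for equal expected rank. The data below are made up, and the PEER score itself is defined in the paper; this shows only the underlying test:

```python
from scipy.stats import kruskal

# Ranks at which relevant documents of each language appear in a result list.
ranks_by_language = {
    "en": [1, 4, 5, 9],
    "de": [2, 6, 8, 12],
    "fa": [3, 7, 10, 11],
}
stat, p = kruskal(*ranks_by_language.values())
# A large p-value is consistent with equal expected ranks across languages.
print(f"H = {stat:.3f}, p = {p:.3f}")
```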


Conformal Ranked Retrieval

Xu, Yunpeng, Guo, Wenge, Wei, Zhi

arXiv.org Machine Learning

Ranked retrieval refers to the process of retrieving and ranking documents from a document repository based on their relevance to a user's query. As the core component in Information Retrieval (IR) systems, its goal is to present the most relevant documents at the top of the search results list, making it easier for users to find the information they seek (Baeza-Yates and Ribeiro-Neto, 1999). Over the years, ranked retrieval techniques have been successfully applied to many real-life problems, including web search engines, recommendation systems, and question-and-answer platforms, significantly impacting our daily lives. While ranked retrieval algorithms have been extensively studied in both academia and industry, considering the uncertainty in their predictions is a relatively new challenge. As we increasingly rely on search engines for answers to a wide variety of questions, it becomes crucial to evaluate the reliability of these retrieved answers. Therefore, it is important to quantify the uncertainty of the results, determining whether they encompass all the desired documents and whether these documents are ranked in a reasonable order. The challenges, however, lie in measuring uncertainty for ranked retrieval algorithms and developing methodologies to control this uncertainty. This is particularly challenging due to the complexity of ranked retrieval systems, which typically consist of multiple stages, each with different optimization goals.
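
As a hedged illustration of the conformal flavor of this direction (a basic split-conformal quantile step only, not the paper's multi-stage method; the calibration-score definition is our assumption):

```python
import numpy as np

def conformal_threshold(cal_scores, alpha=0.1):
    """cal_scores: for each calibration query, the score of its worst-ranked
    relevant document, i.e. the score we must not cut above.

    Returns a threshold tau such that, under exchangeability, a test query's
    relevant documents all score >= tau with probability about 1 - alpha.
    """
    s = np.sort(np.asarray(cal_scores))
    k = int(np.floor(alpha * (len(s) + 1))) - 1  # lower-tail order statistic
    return s[max(k, 0)]

tau = conformal_threshold([0.8, 0.6, 0.9, 0.7, 0.65], alpha=0.2)
# At test time, return every document scoring >= tau as the retrieval set.
```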


Perceptron like Algorithms for Online Learning to Rank

Chaudhuri, Sougata, Tewari, Ambuj

arXiv.org Machine Learning

The perceptron is a classic online algorithm for learning a classification function. In this paper, we provide a novel extension of the perceptron algorithm to the learning to rank problem in information retrieval. We consider popular listwise performance measures such as Normalized Discounted Cumulative Gain (NDCG) and Average Precision (AP). A modern perspective on the perceptron for classification is that it is simply an instance of online gradient descent (OGD), during mistake rounds, using the hinge loss function. Motivated by this interpretation, we propose a novel family of listwise, large margin ranking surrogates. Members of this family can be thought of as analogs of the hinge loss. Exploiting a certain self-bounding property of the proposed family, we provide a guarantee on the cumulative NDCG (or AP) induced loss incurred by our perceptron-like algorithm. We show that, if there exists a perfect oracle ranker which can correctly rank each instance in an online sequence of ranking data, with some margin, the cumulative loss of the perceptron algorithm on that sequence is bounded by a constant, irrespective of the length of the sequence. This result is reminiscent of Novikoff's convergence theorem for the classification perceptron. Moreover, we prove a lower bound on the cumulative loss achievable by any deterministic algorithm, under the assumption of existence of a perfect oracle ranker. The lower bound shows that our perceptron bound is not tight, and we propose another, purely online, algorithm which achieves the lower bound. We provide empirical results on simulated and large commercial datasets to corroborate our theoretical results.
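
A compact sketch of the perceptron-as-OGD view applied to ranking, using a plain pairwise hinge analog in place of the paper's listwise surrogates (updates here fire on any margin-violating pair, whereas the paper's algorithm updates on mistake rounds with a listwise loss):

```python
import numpy as np

def online_rank_perceptron(stream, dim, lr=0.1, margin=1.0):
    """stream yields (X, rel): X is (n_docs, dim) features for one query,
    rel the graded relevance labels. Yields the weight vector after each
    round, learning a linear scoring function w."""
    w = np.zeros(dim)
    for X, rel in stream:
        s = X @ w
        for i in range(len(rel)):
            for j in range(len(rel)):
                # Pair (i, j) should be ordered i above j but lacks margin.
                if rel[i] > rel[j] and s[i] - s[j] < margin:
                    w += lr * (X[i] - X[j])     # hinge-style gradient step
        yield w.copy()
```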


Perceptron-like Algorithms and Generalization Bounds for Learning to Rank

Chaudhuri, Sougata, Tewari, Ambuj

arXiv.org Machine Learning

Learning to rank is a supervised learning problem where the output space is the space of rankings but the supervision space is the space of relevance scores. We make theoretical contributions to the learning to rank problem in both the online and batch settings. First, we propose a perceptron-like algorithm for learning a ranking function in an online setting. Our algorithm is an extension of the classic perceptron algorithm for the classification problem. Second, in the setting of batch learning, we introduce a sufficient condition for convex ranking surrogates to ensure a generalization bound that is independent of the number of objects per query. Our bound holds when linear ranking functions are used: a common practice in many learning to rank algorithms. En route to developing the online algorithm and generalization bound, we propose a novel family of listwise large margin ranking surrogates. Our novel surrogate family is obtained by modifying a well-known pairwise large margin ranking surrogate and is distinct from the listwise large margin surrogates developed using the structured prediction framework. Using the proposed family, we provide a guaranteed upper bound on the cumulative NDCG (or MAP) induced loss under the perceptron-like algorithm. We also show that the novel surrogates satisfy the generalization bound condition.
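
For reference, the standard NDCG measure on which the induced loss (one minus NDCG) in these perceptron analyses is built:

```latex
% DCG@k over a list with graded relevances rel_i, normalized by the DCG of
% the ideal (relevance-sorted) ordering.
\[
  \mathrm{DCG@}k \;=\; \sum_{i=1}^{k} \frac{2^{\mathrm{rel}_i} - 1}{\log_2(i+1)},
  \qquad
  \mathrm{NDCG@}k \;=\; \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k}.
\]
```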