AITopics | Information Retrieval

Collaborating Authors

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer fill the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but still usually requires time-consuming serial searching of one database after the other, and then moving on to other methods of searching for internet sources. And what if the information being sought is a sound byte? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

News Overviews Instructional Materials AI-Alerts Classics

SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data

Wang, Hsuan-Fu, Shih, Yi-Jen, Chang, Heng-Jui, Berry, Layne, Peng, Puyuan, Lee, Hung-yi, Wang, Hsin-Min, Harwath, David

arXiv.org Artificial IntelligenceFeb-10-2024

The recently proposed visually grounded speech model SpeechCLIP is an innovative framework that bridges speech and text through images via CLIP without relying on text transcription. On this basis, this paper introduces two extensions to SpeechCLIP. First, we apply the Continuous Integrate-and-Fire (CIF) module to replace a fixed number of CLS tokens in the cascaded architecture. Second, we propose a new hybrid architecture that merges the cascaded and parallel architectures of SpeechCLIP into a multi-task learning framework. Our experimental evaluation is performed on the Flickr8k and SpokenCOCO datasets. The results show that in the speech keyword extraction task, the CIF-based cascaded SpeechCLIP model outperforms the previous cascaded SpeechCLIP model using a fixed number of CLS tokens. Furthermore, through our hybrid architecture, cascaded task learning boosts the performance of the parallel branch in image-speech retrieval tasks.

architecture, representation, speechclip, (16 more...)

arXiv.org Artificial Intelligence

2402.06959

Country:

Asia > Taiwan (0.05)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Massachusetts (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.49)

Add feedback

TL;DR Progress: Multi-faceted Literature Exploration in Text Summarization

Syed, Shahbaz, Al-Khatib, Khalid, Potthast, Martin

arXiv.org Artificial IntelligenceFeb-10-2024

This paper presents TL;DR Progress, a new tool for exploring the literature on neural text summarization. It organizes 514~papers based on a comprehensive annotation scheme for text summarization approaches and enables fine-grained, faceted search. Each paper was manually annotated to capture aspects such as evaluation metrics, quality dimensions, learning paradigms, challenges addressed, datasets, and document domains. In addition, a succinct indicative summary is provided for each paper, consisting of automatically extracted contextual factors, issues, and proposed solutions. The tool is available online at https://www.tldr-progress.de, a demo video at https://youtu.be/uCVRGFvXUj8

computational linguistic, information, summarization, (16 more...)

arXiv.org Artificial Intelligence

2402.06913

Country:

North America > Canada > Ontario > Toronto (0.05)
Europe > Portugal > Lisbon > Lisbon (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(8 more...)

Genre:

Research Report (1.00)
Overview (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.47)

Add feedback

Verif.ai: Towards an Open-Source Scientific Generative Question-Answering System with Referenced and Verifiable Answers

Košprdić, Miloš, Ljajić, Adela, Bašaragin, Bojana, Medvecki, Darija, Milošević, Nikola

arXiv.org Artificial IntelligenceFeb-9-2024

In this paper, we present the current progress of the project Verif.ai, an open-source scientific generative question-answering system with referenced and verified answers. The components of the system are (1) an information retrieval system combining semantic and lexical search techniques over scientific papers (PubMed), (2) a fine-tuned generative model (Mistral 7B) taking top answers and generating answers with references to the papers from which the claim was derived, and (3) a verification engine that cross-checks the generated claim and the abstract or paper from which the claim was derived, verifying whether there may have been any hallucinations in generating the claim. We are reinforcing the generative model by providing the abstract in context, but in addition, an independent set of methods and models are verifying the answer and checking for hallucinations. Therefore, we believe that by using our method, we can make scientists more productive, while building trust in the use of generative language models in scientific environments, where hallucinations and misinformation cannot be tolerated.

arxiv preprint arxiv, hallucination, mistral-7b-instruct-v0, (12 more...)

arXiv.org Artificial Intelligence

2402.18589

Country:

Europe > Serbia > Vojvodina > South Bačka District > Novi Sad (0.05)
Europe > Germany > Berlin (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.65)

Industry: Media > News (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Add feedback

Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey

Chen, Zhuo, Zhang, Yichi, Fang, Yin, Geng, Yuxia, Guo, Lingbing, Chen, Xiang, Li, Qian, Zhang, Wen, Chen, Jiaoyan, Zhu, Yushan, Li, Jiaqi, Liu, Xiaoze, Pan, Jeff Z., Zhang, Ningyu, Chen, Huajun

arXiv.org Artificial IntelligenceFeb-9-2024

Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the semantic web community's exploration into multi-modal dimensions unlocking new avenues for innovation. In this survey, we carefully review over 300 articles, focusing on KG-aware research in two principal aspects: KG-driven Multi-Modal (KG4MM) learning, where KGs support multi-modal tasks, and Multi-Modal Knowledge Graph (MM4KG), which extends KG studies into the MMKG realm. We begin by defining KGs and MMKGs, then explore their construction progress. Our review includes two primary task categories: KG-aware multi-modal learning tasks, such as Image Classification and Visual Question Answering, and intrinsic MMKG tasks like Multi-modal Knowledge Graph Completion and Entity Alignment, highlighting specific research trajectories. For most of these tasks, we provide definitions, evaluation benchmarks, and additionally outline essential insights for conducting relevant research. Finally, we discuss current challenges and identify emerging trends, such as progress in Large Language Modeling and Multi-modal Pre-training strategies. This survey aims to serve as a comprehensive reference for researchers already involved in or considering delving into KG and multi-modal learning research, offering insights into the evolving landscape of MMKG research and supporting future work.

multi-modal data, multi-modal learning, relation extraction, (16 more...)

arXiv.org Artificial Intelligence

2402.05391

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.13)
Europe > France > Bourgogne-Franche-Comté > Doubs > Besançon (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)
(8 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Leisure & Entertainment > Sports (1.00)
Information Technology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(2 more...)

Technology:

Information Technology > Communications > Web > Semantic Web (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
(10 more...)

Add feedback

Robust Knowledge Extraction from Large Language Models using Social Choice Theory

Potyka, Nico, Zhu, Yuqicheng, He, Yunjie, Kharlamov, Evgeny, Staab, Steffen

arXiv.org Artificial IntelligenceFeb-8-2024

Large-language models (LLMs) can support a wide range of applications like conversational agents, creative writing or general query answering. However, they are ill-suited for query answering in high-stake domains like medicine because they are typically not robust - even the same query can result in different answers when prompted multiple times. In order to improve the robustness of LLM queries, we propose using ranking queries repeatedly and to aggregate the queries using methods from social choice theory. We study ranking queries in diagnostic settings like medical and fault diagnosis and discuss how the Partial Borda Choice function from the literature can be applied to merge multiple query results. We discuss some additional interesting properties in our setting and evaluate the robustness of our approach empirically.

query, robustness, symptom, (15 more...)

arXiv.org Artificial Intelligence

2312.14877

Country:

Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
Europe > United Kingdom > England > Hampshire > Southampton (0.04)
(3 more...)

Genre:

Financial News (0.68)
Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Government (0.93)
Health & Medicine > Therapeutic Area > Immunology (0.68)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.34)

Add feedback

Navigating the Knowledge Sea: Planet-scale answer retrieval using LLMs

Sarkar, Dipankar

arXiv.org Artificial IntelligenceFeb-7-2024

Information retrieval is a rapidly evolving field of information retrieval, which is characterized by a continuous refinement of techniques and technologies, from basic hyperlink-based navigation to sophisticated algorithm-driven search engines. This paper aims to provide a comprehensive overview of the evolution of Information Retrieval Technology, with a particular focus on the role of Large Language Models (LLMs) in bridging the gap between traditional search methods and the emerging paradigm of answer retrieval. The integration of LLMs in the realms of response retrieval and indexing signifies a paradigm shift in how users interact with information systems. This paradigm shift is driven by the integration of large language models (LLMs) like GPT-4, which are capable of understanding and generating human-like text, thus enabling them to provide more direct and contextually relevant answers to user queries. Through this exploration, we seek to illuminate the technological milestones that have shaped this journey and the potential future directions in this rapidly changing field.

information retrieval, llm, retrieval, (14 more...)

arXiv.org Artificial Intelligence

2402.05318

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report (0.82)
Overview (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

Add feedback

Theoretical and Empirical Analysis of Adaptive Entry Point Selection for Graph-based Approximate Nearest Neighbor Search

Oguri, Yutaro, Matsui, Yusuke

arXiv.org Artificial IntelligenceFeb-7-2024

We present a theoretical and empirical analysis of the adaptive entry point selection for graph-based approximate nearest neighbor search (ANNS). We introduce novel concepts: $b\textit{-monotonic path}$ and $B\textit{-MSNET}$, which better capture an actual graph in practical algorithms than existing concepts like MSNET. We prove that adaptive entry point selection offers better performance upper bound than the fixed central entry point under more general conditions than previous work. Empirically, we validate the method's effectiveness in accuracy, speed, and memory usage across various datasets, especially in challenging scenarios with out-of-distribution data and hard instances. Our comprehensive study provides deeper insights into optimizing entry points for graph-based ANNS for real-world high-dimensional data applications.

adaptive entry point selection, diskann, theoretical and empirical analysis, (7 more...)

arXiv.org Artificial Intelligence

2402.04713

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Oceania > Australia (0.04)

Genre: Research Report > Experimental Study (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.86)
Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.64)

Add feedback

Enhancing Retrieval Processes for Language Generation with Augmented Queries

Ghali, Julien Pierre Edmond, Shima, Kosuke, Moriyama, Koichi, Mutoh, Atsuko, Inuzuka, Nobuhiro

arXiv.org Artificial IntelligenceFeb-6-2024

In the rapidly changing world of smart technology, searching for documents has become more challenging due to the rise of advanced language models. These models sometimes face difficulties, like providing inaccurate information, commonly known as "hallucination." This research focuses on addressing this issue through Retrieval-Augmented Generation (RAG), a technique that guides models to give accurate responses based on real facts. To overcome scalability issues, the study explores connecting user queries with sophisticated language models such as BERT and Orca2, using an innovative query optimization process. The study unfolds in three scenarios: first, without RAG, second, without additional assistance, and finally, with extra help. Choosing the compact yet efficient Orca2 7B model demonstrates a smart use of computing resources. The empirical results indicate a significant improvement in the initial language model's performance under RAG, particularly when assisted with prompts augmenters. Consistency in document retrieval across different encodings highlights the effectiveness of using language model-generated queries. The introduction of UMAP for BERT further simplifies document retrieval while maintaining strong results.

hallucination, schizophrenia, symptom, (15 more...)

arXiv.org Artificial Intelligence

2402.16874

Country:

North America > United States (0.14)
Europe > Sweden (0.04)
Asia > Japan (0.04)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)
Research Report > Promising Solution (0.46)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Consumer Health (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.48)

Add feedback

Harnessing PubMed User Query Logs for Post Hoc Explanations of Recommended Similar Articles

Shin, Ashley, Jin, Qiao, Anibal, James, Lu, Zhiyong

arXiv.org Artificial IntelligenceFeb-5-2024

Searching for a related article based on a reference article is an integral part of scientific research. PubMed, like many academic search engines, has a "similar articles" feature that recommends articles relevant to the current article viewed by a user. Explaining recommended items can be of great utility to users, particularly in the literature search process. With more than a million biomedical papers being published each year, explaining the recommended similar articles would facilitate researchers and clinicians in searching for related articles. Nonetheless, the majority of current literature recommendation systems lack explanations for their suggestions. We employ a post hoc approach to explaining recommendations by identifying relevant tokens in the titles of similar articles. Our major contribution is building PubCLogs by repurposing 5.6 million pairs of coclicked articles from PubMed's user query logs. Using our PubCLogs dataset, we train the Highlight Similar Article Title (HSAT), a transformer-based model designed to select the most relevant parts of the title of a similar article, based on the title and abstract of a seed article. HSAT demonstrates strong performance in our empirical evaluations, achieving an F1 score of 91.72 percent on the PubCLogs test set, considerably outperforming several baselines including BM25 (70.62), MPNet (67.11), MedCPT (62.22), GPT-3.5 (46.00), and GPT-4 (64.89). Additional evaluations on a separate, manually annotated test set further verifies HSAT's performance. Moreover, participants of our user study indicate a preference for HSAT, due to its superior balance between conciseness and comprehensiveness. Our study suggests that repurposing user query logs of academic search engines can be a promising way to train state-of-the-art models for explaining literature recommendation.

article title, similar article, similar article title, (15 more...)

arXiv.org Artificial Intelligence

2402.03484

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Maryland > Montgomery County > Bethesda (0.04)
North America > United States > New York > New York County > New York City (0.04)
(5 more...)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.74)
Education > Health & Safety > School Nutrition (0.51)
Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.31)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Fair Active Ranking from Pairwise Preferences

Gorantla, Sruthi, Ahmadian, Sara

arXiv.org Artificial IntelligenceFeb-5-2024

We investigate the problem of probably approximately correct and fair (PACF) ranking of items by adaptively evoking pairwise comparisons. Given a set of $n$ items that belong to disjoint groups, our goal is to find an $(\epsilon, \delta)$-PACF-Ranking according to a fair objective function that we propose. We assume access to an oracle, wherein, for each query, the learner can choose a pair of items and receive stochastic winner feedback from the oracle. Our proposed objective function asks to minimize the $\ell_q$ norm of the error of the groups, where the error of a group is the $\ell_p$ norm of the error of all the items within that group, for $p, q \geq 1$. This generalizes the objective function of $\epsilon$-Best-Ranking, proposed by Saha & Gopalan (2019). By adopting our objective function, we gain the flexibility to explore fundamental fairness concepts like equal or proportionate errors within a unified framework. Adjusting parameters $p$ and $q$ allows tailoring to specific fairness preferences. We present both group-blind and group-aware algorithms and analyze their sample complexity. We provide matching lower bounds up to certain logarithmic factors for group-blind algorithms. For a restricted class of group-aware algorithms, we show that we can get reasonable lower bounds. We conduct comprehensive experiments on both real-world and synthetic datasets to complement our theoretical findings.

algorithm, ranking, sample complexity, (14 more...)

arXiv.org Artificial Intelligence

2402.03252

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
Europe > Switzerland (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.46)

Add feedback