AITopics | Information Retrieval

Information Retrieval

Our accustomed systems for retrieving particular pieces of information no longer meet the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but it still usually requires time-consuming serial searching of one database after another, followed by other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at human speed.

Understanding the User: An Intent-Based Ranking Dataset

Anand, Abhijit, Leonhardt, Jurek, Venktesh, V, Anand, Avishek

arXiv.org Artificial Intelligence, Aug-30-2024

As information retrieval systems continue to evolve, accurate evaluation and benchmarking of these systems become pivotal. Web search datasets, such as MS MARCO, primarily provide short keyword queries without accompanying intent or descriptions, posing a challenge in comprehending the underlying information need. This paper proposes an approach to augmenting such datasets with informative query descriptions, with a focus on two prominent benchmark datasets: TREC-DL-21 and TREC-DL-22. Our methodology involves utilizing state-of-the-art LLMs to analyze and comprehend the implicit intent within individual queries from benchmark datasets. By extracting key semantic elements, we construct detailed and contextually rich descriptions for these queries. To validate the generated query descriptions, we employ crowdsourcing as a reliable means of obtaining diverse human perspectives on the accuracy and informativeness of the descriptions. This information can be used as an evaluation set for tasks such as ranking, query rewriting, or others.

information retrieval, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2408.17103

Country:

Europe > Netherlands > South Holland > Delft (0.05)
North America > United States > New York > New York County > New York City (0.05)
Asia > Thailand > Bangkok > Bangkok (0.05)
(7 more...)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area (0.49)
Education (0.46)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Smart Multi-Modal Search: Contextual Sparse and Dense Embedding Integration in Adobe Express

Aroraa, Cherag, King, Tracy Holloway, Kumar, Jayant, Lu, Yi, Sharma, Sanat, Srikantan, Arvind, Uvalle, David, Valls-Vargas, Josep, Vardhan, Harsha

arXiv.org Artificial Intelligence, Aug-29-2024

As user content and queries become increasingly multi-modal, the need for effective multi-modal search systems has grown. Traditional search systems often rely on textual and metadata annotations for indexed images, while multi-modal embeddings like CLIP enable direct search using text and image embeddings. However, embedding-based approaches face challenges in integrating contextual features such as user locale and recency. Building a scalable multi-modal search system requires fine-tuning several components. This paper presents a multi-modal search architecture and a series of AB tests that optimize embeddings and multi-modal technologies in Adobe Express template search. We address considerations such as embedding model selection, the roles of embeddings in matching and ranking, and the balance between dense and sparse embeddings. Our iterative approach demonstrates how utilizing sparse, dense, and contextual features enhances short and long query search, significantly reduces null rates (over 70%), and increases click-through rates (CTR). Our findings provide insights into developing robust multi-modal search systems, thereby enhancing relevance for complex queries.

adobeclip, query, template, (13 more...)

arXiv.org Artificial Intelligence

2408.14698

Country:

North America > United States (0.28)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)

Genre: Research Report (0.71)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Information Management > Search (0.94)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.69)

CardBench: A Benchmark for Learned Cardinality Estimation in Relational Databases

Chronis, Yannis, Wang, Yawen, Gan, Yu, Abu-El-Haija, Sami, Lin, Chelsea, Binnig, Carsten, Özcan, Fatma

arXiv.org Artificial Intelligence, Aug-28-2024

Cardinality estimation is crucial for enabling high query performance in relational databases. Recently, learned cardinality estimation models have been proposed to improve accuracy, but there is no systematic benchmark or dataset that allows researchers to evaluate the progress made by new learned approaches, or even to develop new learned approaches systematically. In this paper, we are releasing a benchmark containing thousands of queries over 20 distinct real-world databases for learned cardinality estimation. In contrast to other initial benchmarks, our benchmark is much more diverse and can be used for training and testing learned models systematically. Using this benchmark, we explored whether learned cardinality estimation can be transferred to an unseen dataset in a zero-shot manner. We trained GNN-based and transformer-based models to study the problem in three setups: (1) instance-based, (2) zero-shot, and (3) fine-tuned. Our results show that while we get promising results for zero-shot cardinality estimation on simple single-table queries, the accuracy drops as soon as we add joins. However, we show that with fine-tuning, we can still utilize pre-trained models for cardinality estimation, significantly reducing training overheads compared to instance-specific models. We are open-sourcing our scripts to collect statistics and to generate queries and training datasets to foster more extensive research, including from the ML community, on the important problem of cardinality estimation and, in particular, to improve on recent directions such as pre-trained cardinality estimation.

benchmark, dataset, query, (16 more...)

arXiv.org Artificial Intelligence

2408.1617

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Texas > Dallas County > Dallas (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Databases (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.48)

Knowledge Navigator: LLM-guided Browsing Framework for Exploratory Search in Scientific Literature

Katz, Uri, Levy, Mosh, Goldberg, Yoav

arXiv.org Artificial Intelligence, Aug-28-2024

The exponential growth of scientific literature necessitates advanced tools for effective knowledge exploration. We present Knowledge Navigator, a system designed to enhance exploratory search abilities by organizing and structuring the retrieved documents from broad topical queries into a navigable, two-level hierarchy of named and descriptive scientific topics and subtopics. This structured organization provides an overall view of the research themes in a domain, while also enabling iterative search and deeper knowledge discovery within specific subtopics by allowing users to refine their focus and retrieve additional relevant documents. Knowledge Navigator combines LLM capabilities with cluster-based methods to enable an effective browsing method. We demonstrate our approach's effectiveness through automatic and manual evaluations on two novel benchmarks, CLUSTREC-COVID and SCITOC. Our code, prompts, and benchmarks are made publicly available.

knowledge navigator, query, tool use, (15 more...)

arXiv.org Artificial Intelligence

2408.15836

Country:

North America > Canada > Ontario > Toronto (0.04)
Europe > Germany > Bavaria > Regensburg (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Epidemiology (0.70)
Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Into the Unknown Unknowns: Engaged Human Learning through Participation in Language Model Agent Conversations

Jiang, Yucheng, Shao, Yijia, Ma, Dekun, Semnani, Sina J., Lam, Monica S.

arXiv.org Artificial Intelligence, Aug-27-2024

While language model (LM)-powered chatbots and generative search engines excel at answering concrete queries, discovering information in the terrain of unknown unknowns remains challenging for users. To emulate the common educational scenario where children/students learn by listening to and participating in conversations of their parents/teachers, we create Collaborative STORM (Co-STORM). Unlike QA systems that require users to ask all the questions, Co-STORM lets users observe and occasionally steer the discourse among several LM agents. The agents ask questions on the user's behalf, allowing the user to discover unknown unknowns serendipitously. To facilitate user interaction, Co-STORM assists users in tracking the discourse by organizing the uncovered information into a dynamic mind map, ultimately generating a comprehensive report as takeaways. For automatic evaluation, we construct the WildSeek dataset by collecting real information-seeking records with user goals. Co-STORM outperforms baseline methods on both discourse trace and report quality. In a further human evaluation, 70% of participants prefer Co-STORM over a search engine, and 78% favor it over a RAG chatbot.

alphafold 3, information, interaction, (15 more...)

arXiv.org Artificial Intelligence

2408.15232

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Singapore (0.04)
(14 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Questionnaire & Opinion Survey (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
(3 more...)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(3 more...)

MODOC: A Modular Interface for Flexible Interlinking of Text Retrieval and Text Generation Functions

Gao, Yingqiang, Prada, Jhony, Gu, Nianlong, Lam, Jessica, Hahnloser, Richard H. R.

arXiv.org Artificial Intelligence, Aug-26-2024

Large Language Models (LLMs) produce eloquent texts but often the content they generate needs to be verified. Traditional information retrieval systems can assist with this task, but most systems have not been designed with LLM-generated queries in mind. As such, there is a compelling need for integrated systems that provide both retrieval and generation functionality within a single user interface. We present MODOC, a modular user interface that leverages the capabilities of LLMs and provides assistance with detecting their confabulations, promoting integrity in scientific writing. MODOC represents a significant step forward in scientific writing assistance. Its modular architecture supports flexible functions for retrieving information and for writing and generating text in a single, user-friendly interface.

flexible interlinking, modular interface, retrieval and text generation function, (2 more...)

arXiv.org Artificial Intelligence

2408.14623

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.87)

Revisiting the Exit from Nuclear Energy in Germany with NLP

Haunss, Sebastian, Blessing, André

arXiv.org Artificial Intelligence, Aug-25-2024

Annotation of political discourse is resource-intensive, but recent developments in NLP promise to automate complex annotation tasks. Fine-tuned transformer-based models outperform human annotators in some annotation tasks, but they require large manually annotated training datasets. In our contribution, we explore to what degree a manually annotated dataset can be automatically replicated with today's NLP methods, using unsupervised machine learning and zero- and few-shot learning.

annotation, category, diskursforschung journal, (15 more...)

arXiv.org Artificial Intelligence

2408.1381

Country:

Europe > Germany > Bremen > Bremen (0.14)
Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.05)
North America > United States > Washington > King County > Seattle (0.04)
(12 more...)

Genre: Research Report (1.00)

Industry:

Government (1.00)
Energy > Power Industry > Utilities > Nuclear (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.68)
(2 more...)

QD-VMR: Query Debiasing with Contextual Understanding Enhancement for Video Moment Retrieval

Gao, Chenghua, Li, Min, Liu, Jianshuo, Ren, Junxing, Chen, Lin, Liu, Haoyu, Meng, Bo, Fu, Jitao, Su, Wenwen

arXiv.org Artificial Intelligence, Aug-23-2024

Video Moment Retrieval (VMR) aims to retrieve relevant moments of an untrimmed video corresponding to the query. While cross-modal interaction approaches have shown progress in filtering out query-irrelevant information in videos, they assume the precise alignment between the query semantics and the corresponding video moments, potentially overlooking the misunderstanding of the natural language semantics. To address this challenge, we propose a novel model called QD-VMR, a query debiasing model with enhanced contextual understanding. Firstly, we leverage a Global Partial Aligner module via video clip and query features alignment and video-query contrastive learning to enhance the cross-modal understanding capabilities of the model. Subsequently, we employ a Query Debiasing Module to obtain debiased query features efficiently, and a Visual Enhancement module to refine the video features related to the query. Finally, we adopt the DETR structure to predict the possible target video moments. Through extensive evaluations of three benchmark datasets, QD-VMR achieves state-of-the-art performance, proving its potential to improve the accuracy of VMR. Further analytical experiments demonstrate the effectiveness of our proposed module. Our code will be released to facilitate future research.

qd-vmr, query, retrieval, (13 more...)

arXiv.org Artificial Intelligence

2408.12981

Country:

Asia > China > Beijing > Beijing (0.05)
Asia > Thailand > Bangkok > Bangkok (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.88)

Multi-Faceted Question Complexity Estimation Targeting Topic Domain-Specificity

R, Sujay, Perumal, Suki, Nagraj, Yash, Ghei, Anushka, S, Srinivas K

arXiv.org Artificial Intelligence, Aug-23-2024

Question difficulty estimation remains a multifaceted challenge in educational and assessment settings. Traditional approaches often focus on surface-level linguistic features or learner comprehension levels, neglecting the intricate interplay of factors contributing to question complexity. This paper presents a novel framework for domain-specific question difficulty estimation, leveraging a suite of NLP techniques and knowledge graph analysis. We introduce four key parameters: Topic Retrieval Cost, Topic Salience, Topic Coherence, and Topic Superficiality, each capturing a distinct facet of question complexity within a given subject domain. These parameters are operationalized through topic modelling, knowledge graph analysis, and information retrieval techniques. A model trained on these features demonstrates the efficacy of our approach in predicting question difficulty. By operationalizing these parameters, our framework offers a novel approach to question complexity estimation, paving the way for more effective question generation, assessment design, and adaptive learning systems across diverse academic disciplines.

complexity, computer science & information technology, question difficulty, (11 more...)

arXiv.org Artificial Intelligence

doi: 10.5121/csit.2024.141513

2408.1285

Country:

Asia > India > Karnataka (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Iowa (0.04)
(3 more...)

Genre: Research Report (0.84)

Industry:

Education (1.00)
Information Technology (0.94)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
(3 more...)

Sick of Google? Try one of these 5 search engines instead

PCWorld, Aug-21-2024, 13:00:00 GMT

Google is the most popular search engine, but not the only one. Many of the upstarts position themselves on the promise of better respect for our privacy, and one of the best known is DuckDuckGo. It searches in the same way as Google but without spying. Brave is a browser that prioritizes privacy. It has a built-in search engine, but it is also available in other browsers.

artificial intelligence, google, search engine, (2 more...)

PCWorld

Country:

North America > United States (0.09)
Europe > France (0.09)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)