AITopics | Information Retrieval

Collaborating Authors

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer fill the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but still usually requires time-consuming serial searching of one database after the other, and then moving on to other methods of searching for internet sources. And what if the information being sought is a sound byte? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

News Overviews Instructional Materials AI-Alerts Classics

Proceedings of the 6th International Workshop on Reading Music Systems

Calvo-Zaragoza, Jorge, Pacha, Alexander, Shatri, Elona

arXiv.org Artificial IntelligenceNov-24-2024

The International Workshop on Reading Music Systems (WoRMS) is a workshop that tries to connect researchers who develop systems for reading music, such as in the field of Optical Music Recognition, with other researchers and practitioners that could benefit from such systems, like librarians or musicologists. The relevant topics of interest for the workshop include, but are not limited to: Music reading systems; Optical music recognition; Datasets and performance evaluation; Image processing on music scores; Writer identification; Authoring, editing, storing and presentation systems for music scores; Multi-modal systems; Novel input-methods for music to produce written music; Web-based Music Information Retrieval services; Applications and projects; Use-cases related to written music. These are the proceedings of the 6th International Workshop on Reading Music Systems, held Online on November 22nd 2024.

large language model, machine learning, pattern recognition, (22 more...)

arXiv.org Artificial Intelligence

2411.15741

Country:

North America > Canada > Quebec > Montreal (0.14)
Europe > United Kingdom > England > Greater London > London (0.14)
Europe > France > Île-de-France > Paris > Paris (0.04)
(25 more...)

Genre:

Research Report > New Finding (1.00)
Workflow (0.93)
Research Report > Promising Solution (0.92)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)
Education > Curriculum > Subject-Specific Education (0.65)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Information Management > Search (1.00)
Information Technology > Human Computer Interaction (1.00)
(11 more...)

Add feedback

QEQR: An Exploration of Query Expansion Methods for Question Retrieval in CQA Services

Ghafourian, Yasin, Movahedi, Sajad, Shakery, Azadeh

arXiv.org Artificial IntelligenceNov-23-2024

CQA services are valuable sources of knowledge that can be used to find answers to users' information needs. In these services, question retrieval aims to help users with their information needs by finding similar questions to theirs. However, finding similar questions is obstructed by the lexical gap that exists between relevant questions. In this work, we target this problem by using query expansion methods. We use word-similarity-based methods, propose a question-similarity-based method and selective expansion of these methods to expand a question that's been submitted and mitigate the lexical gap problem. Our best method achieves a significant relative improvement of 1.8\% compared to the best-performing baseline without query expansion.

artificial intelligence, natural language, query expansion method, (5 more...)

arXiv.org Artificial Intelligence

2411.1553

Genre: Research Report (0.40)

Technology:

Information Technology > Databases (0.80)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.80)

Add feedback

ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos

Hannan, Tanveer, Islam, Md Mohaiminul, Gu, Jindong, Seidl, Thomas, Bertasius, Gedas

arXiv.org Artificial IntelligenceNov-22-2024

Large language models (LLMs) excel at retrieving information from lengthy text, but their vision-language counterparts (VLMs) face difficulties with hour-long videos, especially for temporal grounding. Specifically, these VLMs are constrained by frame limitations, often losing essential temporal details needed for accurate event localization in extended video content. We propose ReVisionLLM, a recursive vision-language model designed to locate events in hour-long videos. Inspired by human search strategies, our model initially targets broad segments of interest, progressively revising its focus to pinpoint exact temporal boundaries. Our model can seamlessly handle videos of vastly different lengths, from minutes to hours. We also introduce a hierarchical training strategy that starts with short clips to capture distinct events and progressively extends to longer videos. To our knowledge, ReVisionLLM is the first VLM capable of temporal grounding in hour-long videos, outperforming previous state-of-the-art methods across multiple datasets by a significant margin (+2.6% R1@0.1 on MAD). The code is available at https://github.com/Tanveer81/ReVisionLLM.

information retrieval, large language model, natural language, (5 more...)

arXiv.org Artificial Intelligence

2411.14901

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.53)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.53)

Add feedback

Neuro-Symbolic Query Optimization in Knowledge Graphs

Acosta, Maribel, Qin, Chang, Schwabe, Tim

arXiv.org Artificial IntelligenceNov-21-2024

This chapter delves into the emerging field of neuro-symbolic query optimization for knowledge graphs (KGs), presenting a comprehensive exploration of how neural and symbolic techniques can be integrated to enhance query processing. Traditional query optimizers in knowledge graphs rely heavily on symbolic methods, utilizing dataset summaries, statistics, and cost models to select efficient execution plans. However, these approaches often suffer from misestimations and inaccuracies, particularly when dealing with complex queries or large-scale datasets. Recent advancements have introduced neural models, which capture non-linear aspects of query optimization, offering promising alternatives to purely symbolic methods. In this chapter, we introduce neuro-symbolic query optimizers, a novel approach that combines the strengths of symbolic reasoning with the adaptability of neural computation. We discuss the architecture of these hybrid systems, highlighting the interplay between neural and symbolic components to improve the optimizer's ability to navigate the search space and produce efficient execution plans. Additionally, the chapter reviews existing neural components tailored for optimizing queries over knowledge graphs and examines the limitations and challenges in deploying neuro-symbolic query optimizers in real-world environments.

optimizer, query, query optimization, (14 more...)

arXiv.org Artificial Intelligence

2411.14277

Country:

North America > United States > Texas > Dallas County > Dallas (0.04)
North America > United States > Pennsylvania > Northampton County > Bethlehem (0.04)
Europe > Greece > Attica > Athens (0.04)
(3 more...)

Genre:

Research Report (0.70)
Workflow (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (1.00)

Add feedback

Dynamic faceted search: from haystack to highlight

AIHubNov-20-2024, 09:42:45 GMT

In the digital age, the amount of scholarly articles is growing exponentially. In the Open Research Knowledge Graph's question-answering facility ASK, for example, more than 80 million research articles have already been indexed. Finding the most relevant information from vast collections of scholarly data can be daunting for researchers, students, and academics. To tackle this challenge, search engines and digital libraries often rely on advanced search techniques, one of the most effective being faceted search. Faceted search is an advanced search method that allows users to filter and refine search results based on multiple predefined attributes, known as facets.

dataset, dynamic facet generation, dynamic faceted search, (10 more...)

AIHub

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.80)

Add feedback

Learning to Ask: Conversational Product Search via Representation Learning

Zou, Jie, Huang, Jimmy Xiangji, Ren, Zhaochun, Kanoulas, Evangelos

arXiv.org Artificial IntelligenceNov-18-2024

Online shopping platforms, such as Amazon and AliExpress, are increasingly prevalent in society, helping customers purchase products conveniently. With recent progress in natural language processing, researchers and practitioners shift their focus from traditional product search to conversational product search. Conversational product search enables user-machine conversations and through them collects explicit user feedback that allows to actively clarify the users' product preferences. Therefore, prospective research on an intelligent shopping assistant via conversations is indispensable. Existing publications on conversational product search either model conversations independently from users, queries, and products or lead to a vocabulary mismatch. In this work, we propose a new conversational product search model, ConvPS, to assist users in locating desirable items. The model is first trained to jointly learn the semantic representations of user, query, item, and conversation via a unified generative framework. After learning these representations, they are integrated to retrieve the target items in the latent semantic space. Meanwhile, we propose a set of greedy and explore-exploit strategies to learn to ask the user a sequence of high-performance questions for conversations. Our proposed ConvPS model can naturally integrate the representation learning of the user, query, item, and conversation into a unified generative framework, which provides a promising avenue for constructing accurate and robust conversational product search systems that are flexible and adaptive. Experimental results demonstrate that our ConvPS model significantly outperforms state-of-the-art baselines.

information retrieval, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3555371

2411.14466

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Oceania > Australia (0.04)
(9 more...)

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.48)

Industry: Information Technology > Services > e-Commerce Services (0.34)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
(4 more...)

Add feedback

Drowning in Documents: Consequences of Scaling Reranker Inference

Jacob, Mathew, Lindgren, Erik, Zaharia, Matei, Carbin, Michael, Khattab, Omar, Drozdov, Andrew

arXiv.org Artificial IntelligenceNov-18-2024

Rerankers, typically cross-encoders, are often used to re-score the documents retrieved by cheaper initial IR systems. This is because, though expensive, rerankers are assumed to be more effective. We challenge this assumption by measuring reranker performance for full retrieval, not just re-scoring first-stage retrieval. Our experiments reveal a surprising trend: the best existing rerankers provide diminishing returns when scoring progressively more documents and actually degrade quality beyond a certain limit. In fact, in this setting, rerankers can frequently assign high scores to documents with no lexical or semantic overlap with the query. We hope that our findings will spur future research to improve reranking.

large language model, machine learning, reranker, (19 more...)

arXiv.org Artificial Intelligence

2411.11767

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Finland > Southwest Finland > Turku (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
(6 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Research Report > Strength High (0.93)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
(7 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
(2 more...)

Add feedback

OKG: On-the-Fly Keyword Generation in Sponsored Search Advertising

Wang, Zhao, Gangopadhyay, Briti, Zhao, Mengjie, Takamatsu, Shingo

arXiv.org Artificial IntelligenceNov-17-2024

Current keyword decision-making in sponsored search advertising relies on large, static datasets, limiting the ability to automatically set up keywords and adapt to real-time KPI metrics and product updates that are essential for effective advertising. In this paper, we propose On-the-fly Keyword Generation (OKG), an LLM agent-based method that dynamically monitors KPI changes and adapts keyword generation in real time, aligning with strategies recommended by advertising platforms. Additionally, we introduce the first publicly accessible dataset containing real keyword data along with its KPIs across diverse domains, providing a valuable resource for future research. Experimental results show that OKG significantly improves keyword adaptability and responsiveness compared to traditional methods. The code for OKG and the dataset are available at https://github.com/sony/okg.

large language model, machine learning, real time system, (22 more...)

arXiv.org Artificial Intelligence

2412.03577

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > Pennsylvania (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Marketing (1.00)
Information Technology > Services (1.00)
Banking & Finance > Insurance (1.00)
Health & Medicine > Consumer Health (0.68)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
(3 more...)

Add feedback

KyrgyzNLP: Challenges, Progress, and Future

Alekseev, Anton, Turatali, Timur

arXiv.org Artificial IntelligenceNov-15-2024

Large language models (LLMs) have excelled in numerous benchmarks, advancing AI applications in both linguistic and non-linguistic tasks. However, this has primarily benefited well-resourced languages, leaving less-resourced ones (LRLs) at a disadvantage. In this paper, we highlight the current state of the NLP field in the specific LRL: kyrgyz tili. Human evaluation, including annotated datasets created by native speakers, remains an irreplaceable component of reliable NLP performance, especially for LRLs where automatic evaluations can fall short. In recent assessments of the resources for Turkic languages, Kyrgyz is labeled with the status 'Scraping By', a severely under-resourced language spoken by millions. This is concerning given the growing importance of the language, not only in Kyrgyzstan but also among diaspora communities where it holds no official status. We review prior efforts in the field, noting that many of the publicly available resources have only recently been developed, with few exceptions beyond dictionaries (the processed data used for the analysis is presented at https://kyrgyznlp.github.io/). While recent papers have made some headway, much more remains to be done. Despite interest and support from both business and government sectors in the Kyrgyz Republic, the situation for Kyrgyz language resources remains challenging. We stress the importance of community-driven efforts to build these resources, ensuring the future advancement sustainability. We then share our view of the most pressing challenges in Kyrgyz NLP. Finally, we propose a roadmap for future development in terms of research topics and language resources.

kyrgyz nlp, large language model, machine learning, (22 more...)

arXiv.org Artificial Intelligence

2411.05503

Country:

Asia > Russia (0.14)
Europe > Germany > Saxony > Leipzig (0.05)
Asia > Kyrgyzstan > Chüy Region > Bishkek (0.04)
(19 more...)

Genre:

Research Report (1.00)
Overview > Growing Problem (0.34)

Industry:

Government (1.00)
Media > News (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(4 more...)

Add feedback

Increasing the Accessibility of Causal Domain Knowledge via Causal Information Extraction Methods: A Case Study in the Semiconductor Manufacturing Industry

Razouk, Houssam, Benischke, Leonie, Garber, Daniel, Kern, Roman

arXiv.org Artificial IntelligenceNov-15-2024

The extraction of causal information from textual data is crucial in the industry for identifying and mitigating potential failures, enhancing process efficiency, prompting quality improvements, and addressing various operational challenges. This paper presents a study on the development of automated methods for causal information extraction from actual industrial documents in the semiconductor manufacturing industry. The study proposes two types of causal information extraction methods, single-stage sequence tagging (SST) and multi-stage sequence tagging (MST), and evaluates their performance using existing documents from a semiconductor manufacturing company, including presentation slides and FMEA (Failure Mode and Effects Analysis) documents. The study also investigates the effect of representation learning on downstream tasks. The presented case study showcases that the proposed MST methods for extracting causal information from industrial documents are suitable for practical applications, especially for semi structured documents such as FMEAs, with a 93\% F1 score. Additionally, MST achieves a 73\% F1 score on texts extracted from presentation slides. Finally, the study highlights the importance of choosing a language model that is more aligned with the domain and in-domain fine-tuning.

data mining, machine learning, relation, (20 more...)

arXiv.org Artificial Intelligence

2411.10172

Country:

Europe > Austria > Styria > Graz (0.04)
North America > Puerto Rico > Peñuelas > Peñuelas (0.04)
Europe > Greece (0.04)
(10 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Semiconductors & Electronics (1.00)
Information Technology > Security & Privacy (0.93)
Information Technology > Hardware (0.91)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Data Science > Data Mining > Text Mining (0.94)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback