AITopics | text retriever

Speech Retrieval-Augmented Generation without Automatic Speech Recognition

Min, Do June, Mundnich, Karel, Lapastora, Andy, Soltanmohammadi, Erfan, Ronanki, Srikanth, Han, Kyu

arXiv.org Artificial Intelligence, Jan-3-2025

One common approach for question answering over speech data is to first transcribe speech using automatic speech recognition (ASR) and then employ text-based retrieval-augmented generation (RAG) on the transcriptions. While this cascaded pipeline has proven effective in many practical settings, ASR errors can propagate to the retrieval and generation steps. To overcome this limitation, we introduce SpeechRAG, a novel framework designed for open-question answering over spoken data. Our proposed approach fine-tunes a pre-trained speech encoder into a speech adapter fed into a frozen large language model (LLM)-based retrieval model. By aligning the embedding spaces of text and speech, our speech retriever directly retrieves audio passages from text-based queries, leveraging the retrieval capacity of the frozen text retriever. Our retrieval experiments on spoken question answering datasets show that direct speech retrieval does not degrade over the text-based baseline, and outperforms the cascaded systems using ASR. For generation, we use a speech language model (SLM) as a generator, conditioned on audio passages rather than transcripts. Without fine-tuning of the SLM, this approach outperforms cascaded text-based models when there is high WER in the transcripts.
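The retrieval step described in this abstract, a text query matched against audio passages in a shared embedding space, can be illustrated with a minimal sketch. This is not the SpeechRAG implementation: the three-dimensional vectors, the passage names, and the `retrieve` helper are all hypothetical stand-ins, and cosine similarity is assumed as the scoring function since the abstract does not name one.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(query_vec, passage_vecs, k=2):
    """Return the ids of the k passages most similar to the query."""
    scored = [(cosine(query_vec, vec), pid) for pid, vec in passage_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [pid for _, pid in scored[:k]]

# Hypothetical embeddings: in an aligned space, a speech encoder would map
# audio passages near the text embeddings of what is said in them.
audio_passages = {
    "clip_a": [0.9, 0.1, 0.0],
    "clip_b": [0.1, 0.8, 0.3],
    "clip_c": [0.0, 0.2, 0.9],
}
text_query = [0.8, 0.2, 0.1]  # hypothetical text-query embedding

top = retrieve(text_query, audio_passages, k=2)
print(top)  # ['clip_a', 'clip_b']
```

Because the query and the passages live in one space, no transcript is needed at retrieval time. That is the property the abstract relies on to avoid propagating ASR errors.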

large language model, machine learning, natural language, (19 more...)

2412.165

Country:

Europe (0.46)
North America > United States (0.28)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment (0.48)
Media (0.47)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.61)

Enhancing Multimodal Query Representation via Visual Dialogues for End-to-End Knowledge Retrieval

Ju, Yeong-Joon, Kim, Ho-Joong, Lee, Seong-Whan

arXiv.org Artificial Intelligence, Nov-12-2024

Existing multimodal retrieval systems often rely on disjointed models for image comprehension, such as object detectors and caption generators, leading to cumbersome implementations and training processes. To overcome this limitation, we propose an end-to-end retrieval system, Ret-XKnow, to endow a text retriever with the ability to understand multimodal queries via dynamic modality interaction. Ret-XKnow leverages a partial convolution mechanism to focus on visual information relevant to the given textual query, thereby enhancing multimodal query representations. To effectively learn multimodal interaction, we also introduce the Visual Dialogue-to-Retrieval (ViD2R) dataset automatically constructed from visual dialogue datasets. Our dataset construction process ensures that the dialogues are transformed into suitable information retrieval tasks using a text retriever. We demonstrate that our approach not only significantly improves retrieval performance in zero-shot settings but also achieves substantial improvements in fine-tuning scenarios. Our code is publicly available: https://github.com/yeongjoonJu/Ret_XKnow.
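For readers unfamiliar with the term, a partial convolution in its generic form (as originally proposed for image inpainting) applies a filter only to the positions marked valid by a binary mask, then rescales by the fraction of valid positions in the window. The one-dimensional sketch below shows that generic operation only. The abstract does not say how Ret-XKnow derives its mask from the textual query, so `relevance_mask` and the function `partial_conv1d` are hypothetical illustrations, not the authors' code.

```python
def partial_conv1d(x, mask, weights, bias=0.0):
    """Generic 1D partial convolution (no padding, stride 1).

    Masked-out inputs are ignored, and each output is rescaled by
    window_size / number_of_valid_inputs. Windows with no valid input
    produce 0 and stay masked in the updated mask.
    """
    k = len(weights)
    out, new_mask = [], []
    for i in range(len(x) - k + 1):
        window_mask = mask[i:i + k]
        valid = sum(window_mask)
        if valid == 0:
            out.append(0.0)
            new_mask.append(0)
        else:
            acc = sum(w * xi * mi
                      for w, xi, mi in zip(weights, x[i:i + k], window_mask))
            out.append(acc * (k / valid) + bias)
            new_mask.append(1)
    return out, new_mask

features = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical visual features
relevance_mask = [1, 1, 0, 0, 0]        # hypothetical query-relevance mask
out, new_mask = partial_conv1d(features, relevance_mask, [1.0, 1.0, 1.0])
print(out, new_mask)  # [4.5, 6.0, 0.0] [1, 1, 0]
```

The rescaling keeps output magnitudes comparable whether a window is fully or only partly valid, so masked regions do not simply drag the response toward zero.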

large language model, machine learning, natural language, (19 more...)

2411.08334

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > Taiwan > Taiwan Province > Taipei (0.04)
Asia > Indonesia > Bali (0.04)
(8 more...)

Genre: Research Report (0.64)

Industry:

Transportation > Passenger (1.00)
Transportation > Air (1.00)
Aerospace & Defense (1.00)
Leisure & Entertainment > Sports > Tennis (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.70)

XLMRQA: Open-Domain Question Answering on Vietnamese Wikipedia-based Textual Knowledge Source

Van Nguyen, Kiet, Do, Phong Nguyen-Thuan, Nguyen, Nhat Duy, Van Huynh, Tin, Nguyen, Anh Gia-Tuan, Nguyen, Ngan Luu-Thuy

arXiv.org Artificial Intelligence, Aug-13-2022

Question answering (QA) is a natural language understanding task within the fields of information retrieval and information extraction that has attracted much attention from the computational linguistics and artificial intelligence research communities in recent years because of the strong development of machine reading comprehension-based models. A reader-based QA system is a high-level search engine that can find correct answers to queries or questions in open-domain or domain-specific texts using machine reading comprehension (MRC) techniques. Most advances in data resources and machine-learning approaches for MRC and QA systems have been made in two resource-rich languages, English and Chinese. A low-resource language like Vietnamese has seen little research on QA systems. This paper presents XLMRQA, the first Vietnamese QA system using a supervised transformer-based reader on a Wikipedia-based textual knowledge source (using the UIT-ViQuAD corpus), outperforming two robust QA systems based on deep neural network models, DrQA and BERTserini, by 24.46% and 6.28%, respectively. From the results obtained on the three systems, we analyze the influence of question types on the performance of the QA systems.

qa system, question type, text retriever, (14 more...)

2204.07002

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Africa > Uganda (0.05)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(6 more...)

Genre: Research Report (0.64)

Industry: Education (0.57)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
