Towards Retrieval Augmented Generation over Large Video Libraries

Tevissen, Yannis, Guetari, Khalil, Petitpont, Frédéric

Jun-21-2024–arXiv.org Artificial Intelligence

Video content creators need efficient tools to repurpose content, a task that often requires complex manual or automated searches. Crafting a new video from large video libraries remains a challenge. In this paper, we introduce the task of Video Library Question Answering (VLQA) through an interoperable architecture that applies Retrieval Augmented Generation (RAG) to video libraries. We propose a system that uses large language models (LLMs) to generate search queries, retrieving relevant video moments indexed by speech and visual metadata. An answer generation module then integrates user queries with this metadata to produce responses with specific video timestamps. This approach shows promise in multimedia content retrieval and AI-assisted video content creation.
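The three stages named in the abstract (query generation, retrieval over speech and visual metadata, and timestamped answer generation) can be pictured with a minimal Python sketch. Everything in it is an illustrative assumption and not the authors' implementation: the `Moment` record, the `LIBRARY` sample data, and the function names are hypothetical, the LLM query generator is replaced by a simple keyword extractor, and retrieval is plain token overlap instead of a real search index.

```python
from dataclasses import dataclass
import re


@dataclass
class Moment:
    """Hypothetical record for one indexed video moment."""
    video_id: str
    start: float  # start time in seconds
    end: float    # end time in seconds
    speech: str   # speech metadata (e.g. a transcript excerpt)
    visual: str   # visual metadata (e.g. a caption of the frames)


# Words ignored when turning a question into a search query (assumed list).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "with", "to", "for",
             "and", "me", "find", "show", "clips"}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def generate_queries(question):
    """Stand-in for the LLM query generator: keep the non-stopword terms."""
    terms = [w for w in re.findall(r"[a-z0-9]+", question.lower())
             if w not in STOPWORDS]
    return [" ".join(terms)]


def retrieve(queries, library, top_k=2):
    """Rank moments by token overlap with speech and visual metadata."""
    scored = []
    for moment in library:
        doc = tokenize(moment.speech) | tokenize(moment.visual)
        score = max(len(tokenize(q) & doc) for q in queries)
        if score > 0:
            scored.append((score, moment))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [moment for _, moment in scored[:top_k]]


def fmt(seconds):
    """Format a time in seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def answer(question, moments):
    """Stand-in for answer generation: list matches with timestamps."""
    if not moments:
        return "No matching moments found."
    lines = [f"Moments relevant to: {question}"]
    for m in moments:
        lines.append(f"- {m.video_id} [{fmt(m.start)}-{fmt(m.end)}]: {m.visual}")
    return "\n".join(lines)


# Made-up sample library, for illustration only.
LIBRARY = [
    Moment("vid_001", 12.0, 30.0, "welcome to our tour of the harbor",
           "boats docked at sunset"),
    Moment("vid_002", 65.0, 80.0, "the chef prepares fresh pasta",
           "kitchen close-up of hands kneading dough"),
    Moment("vid_003", 5.0, 20.0, "sailing lessons start at the harbor",
           "instructor on a sailboat"),
]

if __name__ == "__main__":
    question = "Find clips of boats in the harbor"
    hits = retrieve(generate_queries(question), LIBRARY)
    print(answer(question, hits))
```

In a fuller system, `generate_queries` and `answer` would presumably each call an LLM, and `retrieve` would query a search index built over the library; the sketch only fixes the data flow between the stages.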

architecture, arxiv, video library

Country:
- Asia > Singapore (0.04)
- North America > United States
  - New York > New York County > New York City (0.04)
- Europe > France
  - Île-de-France > Paris > Paris (0.04)

Genre:
- Research Report (0.55)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.87)
