Navigating Speech Recording Collections with AI-Generated Illustrations

Håland, Sirina, Strøm, Trond Karlsen, Galuščáková, Petra

Jul-8-2025–arXiv.org Artificial Intelligence

Although the amount of available spoken content is steadily increasing, extracting information and knowledge from speech recordings remains challenging. Beyond enhancing traditional information retrieval methods such as speech search and keyword spotting, novel approaches for navigating and searching spoken content need to be explored and developed. In this paper, we propose a novel navigational method for speech archives that leverages recent advances in language and multimodal generative models. We demonstrate our approach with a Web application that organizes data into a structured format using interactive mind maps and image generation tools. The system is implemented using the TED-LIUM 3 dataset, which comprises over 2,000 speech transcripts and audio files of TED Talks. Initial user tests using a System Usability Scale (SUS) questionnaire indicate the application's potential to simplify the exploration of large speech collections.
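
For readers unfamiliar with the SUS questionnaire mentioned in the abstract, the following is a minimal sketch of the standard SUS scoring rule: ten items answered on a 1-5 scale, with odd-numbered items contributing (response - 1), even-numbered items contributing (5 - response), and the sum multiplied by 2.5 to give a 0-100 score. The function name `sus_score` and the sample responses are illustrative assumptions. They are not taken from the paper, and the paper's actual scores are not reproduced here.

```python
def sus_score(responses):
    """Return the SUS score (0-100) for one respondent.

    `responses` holds the ten answers in questionnaire order,
    each an integer from 1 (strongly disagree) to 5 (strongly agree).
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for index, response in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded.
            total += response - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded.
            total += 5 - response
    return total * 2.5


# Hypothetical respondent, for illustration only.
example = [4, 2, 5, 1, 4, 2, 4, 2, 5, 1]
print(sus_score(example))
```

A study-level SUS result is typically reported as the mean of the per-respondent scores.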

category, large language model, machine learning

Country:
- Europe > Norway (0.16)
- North America > United States (0.14)

Genre:
- Questionnaire & Opinion Survey (0.90)
- Research Report (0.70)

Industry:
- Education (0.56)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Information Retrieval (0.70)
    - Large Language Model (0.47)
  - Machine Learning
    - Neural Networks (0.50)
    - Statistical Learning > Clustering (0.47)