Topic Identification For Spontaneous Speech: Enriching Audio Features With Embedded Linguistic Information

Porjazovski, Dejan, Grósz, Tamás, Kurimo, Mikko

Jul-21-2023–arXiv.org Artificial Intelligence

Traditional approaches to topic identification from audio rely on an automatic speech recognition (ASR) system to produce transcripts, which are then used as input to a text-based model. These approaches work well in high-resource scenarios, where there are sufficient data to train both components of the pipeline. However, in low-resource situations, the ASR system, even if available, produces low-quality transcripts, which in turn degrade the text-based classifier. Moreover, spontaneous speech containing hesitations can further degrade the performance of the ASR model. In this paper, we investigate alternatives to the standard text-only solutions by comparing audio-only and hybrid techniques that jointly utilise text and audio features. The models, evaluated on spontaneous Finnish speech, demonstrate that purely audio-based solutions are a viable option when ASR components are not available, while the hybrid multi-modal solutions achieve the best results.
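
As a rough illustration of what "jointly utilising text and audio features" can look like, the sketch below concatenates a per-utterance audio embedding with a text embedding and passes the result through a linear softmax classifier. This is only one simple form of fusion and is not taken from the paper: the function names, embedding sizes, number of topics, and random weights are all hypothetical placeholders, and the abstract does not say which fusion method the authors use.

```python
import numpy as np


def fuse_features(audio_emb, text_emb):
    """Concatenate audio and text embeddings along the feature axis."""
    return np.concatenate([audio_emb, text_emb], axis=-1)


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def predict_topics(audio_emb, text_emb, weights, bias):
    """Return per-utterance topic probabilities from the fused features."""
    fused = fuse_features(audio_emb, text_emb)
    return softmax(fused @ weights + bias)


# Hypothetical sizes: 4 utterances, 8-dim audio, 6-dim text, 3 topics.
rng = np.random.default_rng(0)
audio_emb = rng.normal(size=(4, 8))   # stand-in for audio encoder output
text_emb = rng.normal(size=(4, 6))    # stand-in for ASR-transcript embedding
weights = rng.normal(size=(14, 3))    # untrained weights, illustration only
bias = np.zeros(3)

probs = predict_topics(audio_emb, text_emb, weights, bias)
predicted_topic = probs.argmax(axis=-1)
```

In this sketch, an audio-only variant corresponds to dropping `text_emb` and classifying the audio embedding directly, which is the setting the abstract describes as viable when no ASR component is available.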

artificial intelligence, speech recognition, transcript

Country:
- North America > United States
  - Minnesota > Hennepin County > Minneapolis (0.14)
- Europe
  - Finland (0.05)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
- Asia > India
  - Karnataka > Bengaluru (0.04)

Genre:
- Research Report > New Finding (0.46)

Technology:
- Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
