OxfordVGG Submission to the EGO4D AV Transcription Challenge

Jaesung Huh, Max Bain, Andrew Zisserman

Jul-18-2023–arXiv.org Artificial Intelligence

This report presents the technical details of the OxfordVGG team's submission to the EGO4D Audio-Visual (AV) Automatic Speech Recognition Challenge 2023. We present WhisperX, a system for efficient speech transcription of long-form audio with word-level time alignment, along with two publicly available text normalisers. Our final submission obtained a Word Error Rate (WER) of 56.0% on the challenge test set, ranking 1st on the leaderboard. All baseline code and models are available at https://github.com/m-bain/whisperX.
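
For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions and insertions) between a hypothesis and a reference transcript, divided by the number of reference words, and it is computed after text normalisation. The sketch below illustrates that standard definition only. The `normalise` function is a hypothetical, deliberately minimal stand-in (lowercasing and punctuation stripping) and is not one of the two normalisers released with the submission; the challenge's official scoring may differ in its details.

```python
import re


def normalise(text: str) -> list:
    """Hypothetical minimal normaliser (an assumption, not the authors' code):
    lowercase, replace punctuation with spaces, split on whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9' ]+", " ", text)
    return text.split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = normalise(reference), normalise(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of three reference words.
    print(word_error_rate("the cat sat", "the cat"))
```

Because the normaliser decides which tokens count as equal (casing, punctuation, and so on), the same transcripts can yield different WER values under different normalisers, which is why the choice of normaliser matters when comparing systems.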

artificial intelligence, machine learning, normaliser

Country:
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)

Genre:
- Research Report (0.65)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Speech > Speech Recognition (1.00)
