Titanic Calling: Low Bandwidth Video Conference from the Titanic Wreck

Eyiokur, Fevziye Irem, Huber, Christian, Nguyen, Thai-Binh, Nguyen, Tuan-Nam, Retkowski, Fabian, Ugan, Enes Yavuz, Yaman, Dogucan, Waibel, Alexander

arXiv.org Artificial Intelligence 

For several years, video conferencing tools have found applications across different domains and have been utilized for a variety of purposes. The pandemic in 2020 resulted in a substantial increase in their usage, particularly in the realms of business and education, as employees have been working from home and students have been participating in lectures online. Yet the application scope of video communication systems could extend beyond these scenarios. Such systems prove invaluable in facilitating natural communication under challenging conditions where conventional communication is restricted, such as deep-sea expeditions or settings lacking a stable broadband internet connection. By enabling the generation of audio and video, users can engage in seamless communication.

In this paper, we investigate the aforementioned scenario by developing a comprehensive system comprising speaker filtering and segmentation, ASR, text segmentation, multi-speaker TTS, and audio-driven talking face generation modules. The use-case scenario of this system is as follows: assuming the existence of multiple speakers and their pre-recorded videos, the system, upon the initiation of speakers' speech, distinguishes between speakers and their respective utterances. Following this phase, the ASR transcribes the speech into text, and each text segment produced by the text segmentation component is processed by the TTS module to generate synthesized speech. As transmitting text proves to be the most straightforward and cost-effective
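
The module chain described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: every name here (`Utterance`, `TextPacket`, `sender`, `receiver`, and the `toy_*` stand-ins) is hypothetical, and the real ASR, text segmentation, TTS, and talking face models are replaced by trivial placeholder functions. The sketch also assumes that speaker filtering and segmentation has already produced per-speaker utterances, that only text crosses the low-bandwidth link, and that TTS and face generation run on the receiving side; the passage above does not state where each module runs.

```python
import json
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Utterance:
    """One speaker turn, as produced by an (assumed) speaker segmentation step."""
    speaker_id: str
    audio: bytes


@dataclass
class TextPacket:
    """Hypothetical wire format: only the speaker label and the text are sent."""
    speaker_id: str
    text: str

    def to_bytes(self) -> bytes:
        return json.dumps({"speaker": self.speaker_id, "text": self.text}).encode("utf-8")

    @staticmethod
    def from_bytes(payload: bytes) -> "TextPacket":
        obj = json.loads(payload.decode("utf-8"))
        return TextPacket(obj["speaker"], obj["text"])


def sender(
    utterances: List[Utterance],
    asr: Callable[[bytes], str],
    segment_text: Callable[[str], List[str]],
) -> List[bytes]:
    """Transcribe each utterance, split it into segments, and serialize them."""
    packets = []
    for utt in utterances:
        transcript = asr(utt.audio)
        for segment in segment_text(transcript):
            packets.append(TextPacket(utt.speaker_id, segment).to_bytes())
    return packets


def receiver(
    packets: List[bytes],
    tts: Callable[[str, str], bytes],
    talking_face: Callable[[bytes, str], str],
) -> List[Tuple[str, bytes, str]]:
    """Rebuild speech and video for each received text segment."""
    outputs = []
    for payload in packets:
        packet = TextPacket.from_bytes(payload)
        speech = tts(packet.text, packet.speaker_id)       # multi-speaker TTS
        video = talking_face(speech, packet.speaker_id)    # audio-driven face generation
        outputs.append((packet.speaker_id, speech, video))
    return outputs


# Placeholder modules so the sketch runs end to end; none of these is a real model.
def toy_asr(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the "audio" bytes already are the transcript


def toy_segmenter(text: str) -> List[str]:
    return [s.strip() for s in text.split(".") if s.strip()]


def toy_tts(text: str, speaker_id: str) -> bytes:
    return f"<speech:{speaker_id}:{text}>".encode("utf-8")


def toy_face(speech: bytes, speaker_id: str) -> str:
    return f"<video:{speaker_id}:{len(speech)} audio bytes>"


if __name__ == "__main__":
    turns = [
        Utterance("alice", b"Hello there. We are at the wreck."),
        Utterance("bob", b"Copy that."),
    ]
    sent = sender(turns, toy_asr, toy_segmenter)
    print("bytes on the link:", sum(len(p) for p in sent))
    for speaker, _speech, video in receiver(sent, toy_tts, toy_face):
        print(speaker, video)
```

Passing the modules in as plain callables keeps the sketch independent of any particular ASR or TTS library. The point it illustrates is that the payload on the link is a few bytes of text per segment, while audio and video are regenerated from it using the speaker label.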