Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition
