Visually Grounded Models of Spoken Language: A Survey of Datasets, Architectures and Evaluation Techniques
