A model that can recognize speech in different languages from a speaker's lip movements