Translating speech with just images