Look, Listen and Learn — A Multimodal LSTM for Speaker Identification