Learning to Play: A Multimodal Agent for 3D Game-Play