Towards Zero-Shot & Explainable Video Description by Reasoning over Graphs of Events in Space and Time