Explaining Vision and Language through Graphs of Events in Space and Time