From Vision To Language through Graph of Events in Space and Time: An Explainable Self-supervised Approach
