Top-down Activity Representation Learning for Video Question Answering
