Towards Video Text Visual Question Answering: Benchmark and Baseline
