Towards Video Text Visual Question Answering: Benchmark and Baseline
Neural Information Processing Systems
In recent years, several text-based visual question answering (TextVQA) benchmarks have been introduced to develop machines' ability to answer questions based on text in images. However, models developed on these benchmarks cannot work effectively in many real-life scenarios (e.g.
Dec-25-2025, 14:08:07 GMT