Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions