TalkCuts: A Large-Scale Dataset for Multi-Shot Human Speech Video Generation

Jun-10-2026, 21:56:32 GMT–Neural Information Processing Systems

In this work, we present TalkCuts, a large-scale dataset designed to facilitate the study of multi-shot human speech video generation. Unlike existing datasets that focus on single-shot, static viewpoints, TalkCuts offers 164k clips totaling over 500 hours of high-quality 1080p human speech video with diverse camera shots, including close-up, half-body, and full-body views. The dataset includes detailed textual descriptions, 2D keypoints, and 3D SMPL-X motion annotations, and it covers over 10k identities, enabling multimodal learning and evaluation.
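As a rough illustration of how one clip and its annotations might be organized in code, here is a minimal sketch. The class `ClipRecord`, its field names, and the helpers `mean_clip_duration` and `count_shot_changes` are hypothetical and are not the dataset's released format. Only the three shot types and the headline totals come from the abstract. The duration helper shows that 500 hours spread over 164k clips works out to roughly 11 seconds per clip. Because the abstract says "over 500 hours", that figure is a lower bound.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ShotType(Enum):
    # The three camera shot scales named in the abstract.
    CLOSE_UP = "close-up"
    HALF_BODY = "half-body"
    FULL_BODY = "full-body"


@dataclass
class ClipRecord:
    # Hypothetical per-clip record; field names are illustrative only.
    clip_id: str
    identity_id: str
    shot: ShotType
    caption: str  # textual description of the clip
    keypoints_2d: List[list] = field(default_factory=list)  # per-frame 2D keypoints
    smplx_params: List[list] = field(default_factory=list)  # per-frame SMPL-X parameters
    duration_s: float = 0.0


def mean_clip_duration(total_hours: float, num_clips: int) -> float:
    """Average clip length in seconds implied by aggregate totals."""
    return total_hours * 3600.0 / num_clips


def count_shot_changes(clips: List[ClipRecord]) -> int:
    """Number of adjacent clip pairs whose shot type differs (hypothetical helper)."""
    return sum(1 for a, b in zip(clips, clips[1:]) if a.shot != b.shot)


# Using the abstract's figures; 500 h is a lower bound, so this is too.
print(round(mean_clip_duration(500, 164_000), 2))  # prints 10.98
```

A multi-shot sequence would then be an ordered list of such records. `count_shot_changes` counts the cuts between consecutive shot scales, which is the property that separates multi-shot data from single-shot, static-viewpoint data.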
