TAP-Vid: A Benchmark for Tracking Any Point in a Video
Carl Doersch, Ankush Gupta
Neural Information Processing Systems
Generic motion understanding from video involves not only tracking objects, but also perceiving how their surfaces deform and move. This information is useful for making inferences about 3D shape, physical properties, and object interactions. While the problem of tracking arbitrary physical points on surfaces over longer video clips has received some attention, no dataset or benchmark for evaluation existed until now.