MaskCaptioner: Learning to Jointly Segment and Caption Object Trajectories in Videos

Open in new window