Grounding Foundational Vision Models with 3D Human Poses for Robust Action Recognition
