Learning multimodal representations for sample-efficient recognition of human actions

Open in new window