ViLPAct: A Benchmark for Compositional Generalization on Multimodal Human Activities
