Paxion: Patching Action Knowledge in Video-Language Foundation Models

Neural Information Processing Systems 

Action knowledge involves the understanding of textual, visual, and temporal aspects of actions.