Patching Action Knowledge in Video-Language Foundation Models