Verbs in Action: Improving verb understanding in video-language models