Vision Language Models for Dynamic Human Activity Recognition in Healthcare Settings