Context-Aware Human Behavior Prediction Using Multimodal Large Language Models: Challenges and Insights