Utilizing Vision-Language Models as Action Models for Intent Recognition and Assistance
