Efficient In-Context Learning in Vision-Language Models for Egocentric Videos

Open in new window