Self-Chained Image-Language Model for Video Localization and Question Answering

Open in new window