TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment
