video-SALMONN 2: Caption-Enhanced Audio-Visual Large Language Models
