STELLA: Continual Audio-Video Pre-training with Spatio-Temporal Localized Alignment
