[Figure: Continual Learning, Separation, Binding]

Neural Information Processing Systems 

However, real-world videos typically exist as continuously evolving data streams (e.g., dynamic scenes captured by wearable glasses), necessitating models to continually adapt to shifting data distributions and novel scenarios. Considering the prohibitive computational costs of fine-tuning models on new tasks, usually, a small subset of parameters is updated while the bulk of the model remains frozen. This poses new challenges to existing continual learning frameworks in the context of large multimodal foundation models, i.e., catastrophic forgetting and update conflict. While the foundation models struggle with parameter-efficient continual learning, the hippocampus in the human brain has evolved highly efficient mechanisms for memory formation and consolidation. Inspired by the rapid binding and pattern separation mechanisms in the hippocampus, in this work, we propose Bisecle for video-language continual learning, where a multi-directional supervision module is used to capture more cross-modal relationships and a contrastive prompt learning scheme is designed to isolate task-specific knowledge to facilitate efficient memory storage. Binding and separation processes further strengthen the ability of VLMs to retain complex experiences, enabling robust and efficient continual learning in video understanding tasks. We perform a thorough evaluation of the proposed Bisecle, demonstrating its ability to mitigate forgetting and enhance cross-task generalization on several VideoQA benchmarks.
