AITopics | bmu-moco

BMU-MoCo: Bidirectional Momentum Update for Continual Video-Language Modeling

Neural Information Processing Systems, Feb-10-2026, 17:59:17 GMT

Unlike the original MoCo [19] and its cross-modal versions [15, 33, 35], which apply the momentum update only to the momentum encoders in order to maintain a large, consistent queue, our BMU strategy applies the momentum update to both the momentum encoders and the (video/text) encoders.
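The two update directions described above can be sketched in a few lines of plain Python. This is only an illustrative reading of the sentence, not the authors' Algorithm 1: the function names `ema_update` and `bmu_step`, the two momentum coefficients, the learning rate, and the order of the three steps are all assumptions made for the example.

```python
from typing import Dict, List

# A "model" here is just a dict of named parameter vectors (hypothetical stand-in
# for real video/text encoder weights).
Params = Dict[str, List[float]]


def ema_update(target: Params, source: Params, momentum: float) -> None:
    """In place: target <- momentum * target + (1 - momentum) * source."""
    for name, values in target.items():
        src = source[name]
        for i in range(len(values)):
            values[i] = momentum * values[i] + (1.0 - momentum) * src[i]


def bmu_step(
    encoder: Params,
    momentum_encoder: Params,
    grads: Params,
    lr: float = 0.1,
    m_forward: float = 0.99,   # assumed coefficient: encoder -> momentum encoder
    m_backward: float = 0.99,  # assumed coefficient: momentum encoder -> encoder
) -> None:
    # 1) Ordinary gradient step on the encoder.
    for name, values in encoder.items():
        g = grads[name]
        for i in range(len(values)):
            values[i] -= lr * g[i]
    # 2) Conventional MoCo direction: the momentum encoder slowly follows the encoder.
    ema_update(momentum_encoder, encoder, m_forward)
    # 3) Additional direction: the encoder is pulled back toward the momentum
    #    encoder, which still holds parameters shaped by earlier training.
    ema_update(encoder, momentum_encoder, m_backward)


if __name__ == "__main__":
    enc = {"w": [1.0]}
    mom = {"w": [0.0]}
    bmu_step(enc, mom, grads={"w": [0.0]}, m_forward=0.5, m_backward=0.5)
    print(enc, mom)  # {'w': [0.75]} {'w': [0.5]}
```

With a backward coefficient of 1.0 the third step is a no-op and the sketch reduces to the one-directional MoCo update, which makes the difference between the two schemes easy to see.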

artificial intelligence, encoder, machine learning

Country:

Asia > China > Zhejiang Province > Hangzhou (0.04)
Asia > China > Beijing > Beijing (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

BMU-MoCo: Bidirectional Momentum Update for Continual Video-Language Modeling

Neural Information Processing Systems, Dec-24-2025, 19:06:36 GMT

Video-language models suffer from forgetting old/learned knowledge when trained with streaming data. In this work, we thus propose a continual video-language modeling (CVLM) setting, where models are sequentially trained on five widely used video-text datasets with different data distributions. Although most existing continual learning methods have achieved great success by exploiting extra information (e.g., memory data of past tasks) or dynamically extended networks, they cause enormous resource consumption when transferred to our CVLM setting. To overcome the challenges (i.e., catastrophic forgetting and heavy resource consumption) in CVLM, we propose a novel cross-modal MoCo-based model with bidirectional momentum update (BMU), termed BMU-MoCo. Concretely, our BMU-MoCo has two core designs: (1) Unlike conventional MoCo, we apply the momentum update not only to the momentum encoders but also to the encoders (i.e., bidirectionally) at each training step, which enables the model to review the learned knowledge retained in the momentum encoders.
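For readers unfamiliar with the cross-modal MoCo setup the abstract builds on, the following is a minimal sketch of a queue-based contrastive (InfoNCE) loss between video and text embeddings. It assumes L2-normalized embeddings, a fixed-size first-in-first-out queue of momentum-encoder keys, and a symmetric video-to-text plus text-to-video loss; the names `info_nce`, `cross_modal_loss`, `enqueue`, and the temperature value are hypothetical and are not taken from the paper.

```python
import math
from collections import deque
from typing import Deque, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def info_nce(query: Vector, positive_key: Vector,
             queue: Deque[List[float]], temperature: float = 0.07) -> float:
    """-log softmax of the positive pair against the negatives stored in the queue."""
    logits = [dot(query, positive_key) / temperature]
    logits += [dot(query, neg) / temperature for neg in queue]
    # Log-sum-exp with the max subtracted for numerical stability.
    max_logit = max(logits)
    log_sum = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum - logits[0]


def cross_modal_loss(video_q: Vector, text_q: Vector,
                     video_k: Vector, text_k: Vector,
                     video_queue: Deque[List[float]],
                     text_queue: Deque[List[float]],
                     temperature: float = 0.07) -> float:
    """Assumed symmetric form: video query vs. text keys, plus text query vs. video keys."""
    v2t = info_nce(video_q, text_k, text_queue, temperature)
    t2v = info_nce(text_q, video_k, video_queue, temperature)
    return v2t + t2v


def enqueue(queue: Deque[List[float]], keys: Sequence[Vector]) -> None:
    """Append new momentum-encoder keys; deque(maxlen=...) drops the oldest ones."""
    queue.extend(list(k) for k in keys)


if __name__ == "__main__":
    text_queue: Deque[List[float]] = deque(maxlen=4)
    video_queue: Deque[List[float]] = deque(maxlen=4)
    enqueue(text_queue, [[0.0, 1.0]])
    enqueue(video_queue, [[0.0, 1.0]])
    loss = cross_modal_loss([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0],
                            video_queue, text_queue, temperature=1.0)
    print(round(loss, 4))
```

In this formulation the keys and queue entries come from the momentum encoders, which is why keeping those encoders slowly changing matters for queue consistency.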

bidirectional momentum update, bmu-moco, continual video-language modeling

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.62)
Information Technology > Artificial Intelligence > Machine Learning (0.39)

BMU-MoCo: Bidirectional Momentum Update for Continual Video-Language Modeling - Supplementary Material - Yizhao Gao

Neural Information Processing Systems, Aug-16-2025, 23:20:57 GMT

We provide the pseudocode of our BMU-MoCo in Algorithm 1. The R@5 results and their corresponding FR/HM are reported. The memory data are simply used as training samples in the training process. The model architecture is exactly the same as Base-MoCo.
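The R@5 figure mentioned above is a Recall@K retrieval metric. As a minimal sketch, assuming a square similarity matrix in which query i is paired with candidate i (a common convention for video-text retrieval benchmarks, not something stated in this listing), Recall@K can be computed as follows. The function name `recall_at_k` and the toy scores are illustrative only; the FR/HM quantities are not covered here because the snippet does not define them.

```python
from typing import List, Sequence


def recall_at_k(similarity: Sequence[Sequence[float]], k: int) -> float:
    """Fraction of queries whose paired candidate (same index) ranks in the top k.

    similarity[i][j] is the score between query i and candidate j.
    A candidate's rank is the number of candidates scoring strictly higher.
    """
    hits = 0
    for i, row in enumerate(similarity):
        correct_score = row[i]
        rank = sum(1 for score in row if score > correct_score)
        if rank < k:
            hits += 1
    return hits / len(similarity)


if __name__ == "__main__":
    # Toy 3x3 text-to-video similarity matrix (made-up numbers).
    sim: List[List[float]] = [
        [0.9, 0.1, 0.0],
        [0.2, 0.1, 0.8],
        [0.1, 0.3, 0.7],
    ]
    for k in (1, 2, 3):
        print(k, recall_at_k(sim, k))
```

In the toy matrix, the second query never places its paired candidate above third position, so Recall@1 and Recall@2 are both 2/3 and only Recall@3 reaches 1.0.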

artificial intelligence, machine learning, natural language

Country:

Asia > China > Beijing > Beijing (0.05)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.40)

BMU-MoCo: Bidirectional Momentum Update for Continual Video-Language Modeling - Yizhao Gao

Neural Information Processing Systems, Aug-16-2025, 23:20:54 GMT

Video-language models suffer from forgetting old/learned knowledge when trained with streaming data.

artificial intelligence, machine learning, natural language

Country:

Asia > China > Beijing > Beijing (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.42)

BMU-MoCo: Bidirectional Momentum Update for Continual Video-Language Modeling

Neural Information Processing Systems, Jan-17-2025, 18:41:28 GMT

Video-language models suffer from forgetting old/learned knowledge when trained with streaming data. In this work, we thus propose a continual video-language modeling (CVLM) setting, where models are sequentially trained on five widely used video-text datasets with different data distributions. Although most existing continual learning methods have achieved great success by exploiting extra information (e.g., memory data of past tasks) or dynamically extended networks, they cause enormous resource consumption when transferred to our CVLM setting. To overcome the challenges (i.e., catastrophic forgetting and heavy resource consumption) in CVLM, we propose a novel cross-modal MoCo-based model with bidirectional momentum update (BMU), termed BMU-MoCo. Concretely, our BMU-MoCo has two core designs: (1) Unlike conventional MoCo, we apply the momentum update not only to the momentum encoders but also to the encoders (i.e., bidirectionally) at each training step, which enables the model to review the learned knowledge retained in the momentum encoders.

bidirectional momentum update, bmu-moco, continual video-language modeling

Technology: Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.64)
