Disentanglement Beyond Static vs. Dynamic: A Benchmark and Evaluation Framework for Multi-Factor Sequential Representations

Jun-22-2026, 07:07:39 GMT–Neural Information Processing Systems

Learning disentangled representations of sequential data is a key goal in deep learning, with broad applications in vision, audio, and time series. Real-world data involves multiple semantic factors that interact over time, yet prior work has mostly focused on the simpler two-factor setting of static versus dynamic components, primarily because such settings make data collection easier. We introduce the first standardized benchmark for evaluating multi-factor sequential disentanglement across six diverse datasets spanning video, audio, and time series. Our benchmark includes modular tools for dataset integration, model development, and evaluation metrics tailored to multi-factor analysis. We additionally propose a post-hoc Latent Exploration Stage to automatically align latent dimensions with semantic factors, and introduce a Koopman-inspired model that achieves state-of-the-art results. Moreover, we show that Vision-Language Models can automate dataset annotation and serve as zero-shot disentanglement evaluators, removing the need for manual labels and human intervention. Together, these contributions provide a robust and scalable foundation for advancing multi-factor sequential disentanglement. Our code is available on GitHub, and the datasets and trained models are available on Hugging Face.

artificial intelligence, machine learning, natural language

Country:
- North America > United States (0.45)

Genre:
- Research Report > Experimental Study (0.93)
- Overview (0.92)

Industry:
- Leisure & Entertainment (0.67)
- Media > Music (0.45)
- Information Technology (0.45)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Natural Language (1.00)
  - Representation & Reasoning (0.92)
  - Machine Learning > Neural Networks
    - Deep Learning (0.66)
