seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models

Jun-16-2026, 00:57:23 GMT – Neural Information Processing Systems

Joint-embedding self-supervised learning (SSL) commonly relies on transformations such as data augmentation and masking to learn visual representations, enforcing invariance or equivariance with respect to these transformations applied to two views of an image. This dominant two-view paradigm often limits the flexibility of learned representations for downstream adaptation by creating performance trade-offs between high-level invariance-demanding tasks, such as image classification, and more fine-grained equivariance-related tasks. In this work, we propose seq-JEPA, a world modeling framework that introduces architectural inductive biases into joint-embedding predictive architectures to resolve this trade-off. Without relying on dual equivariance predictors or loss terms, seq-JEPA simultaneously learns two architecturally separate representations for equivariance- and invariance-demanding tasks. To do so, our model processes short sequences of different views (observations) of inputs.
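To make the invariance/equivariance distinction in the abstract concrete, the toy sketch below may help. It is not the seq-JEPA architecture, whose components the abstract does not specify. It assumes a cyclic shift as the transformation, an identity map as a stand-in per-view encoder, and a sort-then-average step as a stand-in sequence aggregator. All function names are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def shift(x: Sequence[float], k: int) -> List[float]:
    """Assumed transformation: cyclically shift a 1-D signal right by k."""
    k %= len(x)
    if k == 0:
        return list(x)
    return list(x[-k:]) + list(x[:-k])


def encode_view(x: Sequence[float]) -> List[float]:
    """Toy per-view encoder (identity). It is shift-equivariant:
    encode_view(shift(x, k)) == shift(encode_view(x), k)."""
    return list(x)


def aggregate(view_reps: Sequence[Sequence[float]]) -> List[float]:
    """Toy aggregator over a short sequence of view representations.
    Sorting discards position, so the result is shift-invariant."""
    n = len(view_reps)
    sorted_reps = [sorted(r) for r in view_reps]
    # Element-wise mean across the views in the sequence.
    return [sum(col) / n for col in zip(*sorted_reps)]


signal = [3, 1, 4, 1, 5]

# Two different short sequences of views of the same input.
views_a = [encode_view(shift(signal, k)) for k in (0, 1, 2)]
views_b = [encode_view(shift(signal, k)) for k in (3, 4, 1)]

# Per-view representations change with the transformation (equivariant) ...
equivariant_ok = encode_view(shift(signal, 2)) == shift(encode_view(signal), 2)
# ... while the aggregate over the sequence does not (invariant).
invariant_ok = aggregate(views_a) == aggregate(views_b)
```

In this sketch the per-view output keeps track of how the input was transformed, while the aggregate depends only on the underlying signal. That mirrors, in a very simplified way, the idea of holding two separate representations for equivariance- and invariance-demanding tasks.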

artificial intelligence, machine learning, natural language


Country:
- North America > Canada (0.46)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Industry:
- Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (0.87)
  - Artificial Intelligence
    - Vision (1.00)
    - Representation & Reasoning (1.00)
    - Natural Language (1.00)
    - Machine Learning > Neural Networks (1.00)
    - Cognitive Science (1.00)
