AITopics | v-jepa 2

This AI Model Can Intuit How the Physical World Works

WIRED, Dec-7-2025

As the engineers who build self-driving cars know, it can be hard to get an AI system to reliably make sense of what it sees. Most systems designed to "understand" videos in order to either classify their content ("a person playing tennis," for example) or identify the contours of an object (say, a car up ahead) work in what's called "pixel space." The model essentially treats every pixel in a video as equal in importance. But these pixel-space models come with limitations. Imagine trying to make sense of a suburban street. If the scene has cars, traffic lights and trees, the model might focus too much on irrelevant details such as the motion of the leaves. It might miss the color of the traffic light, or the positions of nearby cars. "When you go to images or video, you don't want to work in [pixel] space because there are too many details you don't want to model," said Randall Balestriero, a computer scientist at Brown University. Yann LeCun, a computer scientist at New York University and the director of AI research at Meta, created JEPA, a predecessor to V-JEPA that works on still images, in 2022.

artificial intelligence, machine learning, video, (17 more...)

WIRED

Country:

North America > United States > New York (0.24)
North America > United States > California (0.04)
Europe > Slovakia (0.04)
(2 more...)

Industry: Transportation > Ground > Road (0.89)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)

Rethinking JEPA: Compute-Efficient Video SSL with Frozen Teachers

Li, Xianhang, Huang, Chen, Li, Chun-Liang, Malach, Eran, Susskind, Josh, Thilak, Vimal, Littwin, Etai

arXiv.org Artificial Intelligence, Sep-30-2025

Video Joint Embedding Predictive Architectures (V-JEPA) learn generalizable off-the-shelf video representations by predicting masked regions in latent space with an exponential moving average (EMA)-updated teacher. While EMA prevents representation collapse, it complicates scalable model selection and couples teacher and student architectures. We revisit masked-latent prediction and show that a frozen teacher suffices. Concretely, we (i) train a target encoder with a simple pixel-reconstruction objective under V-JEPA masking, then (ii) freeze it and train a student to predict the teacher's latents on masked regions. This leads to a two-stage, unregularized scheme that we refer to as SALT (Static-teacher Asymmetric Latent Training). SALT decouples optimization into pixel reconstruction (teacher) and masked latent prediction (student), increasing transparency, efficiency, and scalability while preserving the ability of the learned representations to generalize under frozen evaluation. Empirically, our student models outperform recently proposed V-JEPA 2 encoders under frozen backbone evaluation across diverse benchmarks. They are also more compute-optimal: at matched pretraining FLOPs, our method achieves higher probing accuracy, and its scaling curves dominate V-JEPA's accuracy-FLOPs Pareto frontier. Finally, we find that student quality is remarkably robust to teacher quality: high-performing students emerge even with small, sub-optimal teachers. This points to a compute budget allocation that should overwhelmingly favor the student. These results position SALT as a simple, scalable, and compute-efficient alternative to EMA-based self-distillation for video representation learning.
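The two-stage idea in this abstract can be illustrated with a toy sketch. This is not the authors' implementation: the scalar "encoder" and "decoder", the synthetic clips, and every function name below (`ema_update`, `make_clip`, `split`, `train_teacher`, `train_student`, `latent_loss`) are hypothetical stand-ins chosen for illustration. It only shows the control flow: an EMA teacher update for contrast, then a teacher trained by masked pixel reconstruction, frozen, and used to supply latent targets for a student.

```python
import random

def ema_update(teacher, student, decay=0.99):
    """EMA teacher update (shown for contrast; not used by the frozen scheme)."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]

def make_clip(rng, n_patches=8):
    """Toy 'clip': patches share one base value plus small noise (assumption)."""
    base = rng.uniform(-1.0, 1.0)
    return [base + rng.gauss(0.0, 0.05) for _ in range(n_patches)]

def split(clip, rng, n_masked=4):
    """Return (mean of visible patches, values of masked patches)."""
    masked_idx = set(rng.sample(range(len(clip)), n_masked))
    visible = [x for i, x in enumerate(clip) if i not in masked_idx]
    masked = [x for i, x in enumerate(clip) if i in masked_idx]
    return sum(visible) / len(visible), masked

def train_teacher(rng, steps=2000, lr=0.05):
    """Stage 1: fit scalar encoder/decoder by reconstructing masked pixels."""
    enc, dec = 0.5, 0.5
    for _ in range(steps):
        ctx, masked = split(make_clip(rng), rng)
        for x in masked:
            err = dec * enc * ctx - x          # pixel-space reconstruction error
            g_enc = 2.0 * err * dec * ctx
            g_dec = 2.0 * err * enc * ctx
            enc -= lr * g_enc
            dec -= lr * g_dec
    return enc, dec

def train_student(teacher_enc, rng, steps=2000, lr=0.05):
    """Stage 2: teacher is frozen; student predicts its latents for masked patches."""
    w = 0.0
    for _ in range(steps):
        ctx, masked = split(make_clip(rng), rng)
        for x in masked:
            target = teacher_enc * x           # frozen teacher latent, never updated
            err = w * ctx - target             # latent-space prediction error
            w -= lr * 2.0 * err * ctx
    return w

def latent_loss(student_w, teacher_enc, rng, n_clips=200):
    """Mean squared latent-prediction error on fresh toy clips."""
    total, count = 0.0, 0
    for _ in range(n_clips):
        ctx, masked = split(make_clip(rng), rng)
        for x in masked:
            total += (student_w * ctx - teacher_enc * x) ** 2
            count += 1
    return total / count

if __name__ == "__main__":
    rng = random.Random(0)
    enc, dec = train_teacher(rng)
    w = train_student(enc, rng)
    print("teacher encoder:", enc, "student weight:", w)
    print("student latent loss:", latent_loss(w, enc, random.Random(1)))
```

In this sketch the teacher's parameters are simply never touched during stage 2, which is the structural difference from an EMA scheme, where `ema_update` would be called after every student step.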

artificial intelligence, machine learning, v-jepa 2, (17 more...)

arXiv.org Artificial Intelligence

2509.24317

Genre: Research Report > New Finding (1.00)

Industry: Education > Educational Technology (0.49)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)
