Everything is a Video: Unifying Modalities through Next-Frame Prediction
