iVideoGPT: Interactive VideoGPTs are Scalable World Models
Neural Information Processing Systems
World models empower model-based agents to interactively explore, reason, and plan within imagined environments for real-world decision-making. However, the high demand for interactivity makes it difficult to harness recent advances in video generative models for developing world models at scale. This work introduces Interactive VideoGPT (iVideoGPT), a scalable autoregressive transformer framework that integrates multimodal signals (visual observations, actions, and rewards) into a sequence of tokens, enabling agents to interact with the model via next-token prediction.
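To make the idea of folding observations, actions, and rewards into one token sequence concrete, here is a minimal sketch. It is not the paper's tokenizer or architecture: the abstract does not specify the sequence layout, so the per-step ordering, the vocabulary sizes, the id offsets, and the function names `interleave` and `next_token_pairs` are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical vocabulary layout (an assumption, not from the paper):
# each modality gets its own disjoint range of token ids.
OBS_VOCAB = 1024                    # ids 0..1023: visual tokens
ACT_VOCAB = 16                      # ids 1024..1039: discretized actions
ACT_OFFSET = OBS_VOCAB
REW_OFFSET = OBS_VOCAB + ACT_VOCAB  # ids 1040 and up: discretized rewards


def interleave(
    obs_tokens: Sequence[Sequence[int]],
    actions: Sequence[int],
    rewards: Sequence[int],
) -> List[int]:
    """Flatten a trajectory into one sequence: [obs..., action, reward] per step."""
    if not (len(obs_tokens) == len(actions) == len(rewards)):
        raise ValueError("need one action and one reward per observation")
    seq: List[int] = []
    for obs, act, rew in zip(obs_tokens, actions, rewards):
        seq.extend(obs)                # visual tokens for this step
        seq.append(ACT_OFFSET + act)   # action shifted into its own id range
        seq.append(REW_OFFSET + rew)   # reward shifted into its own id range
    return seq


def next_token_pairs(seq: Sequence[int]) -> List[Tuple[List[int], int]]:
    """Build (context, target) pairs as used for next-token prediction."""
    return [(list(seq[:i]), seq[i]) for i in range(1, len(seq))]


if __name__ == "__main__":
    # Two steps, two visual tokens each (toy values).
    sequence = interleave([[1, 2], [3, 4]], actions=[0, 5], rewards=[1, 7])
    print(sequence)
    print(next_token_pairs(sequence)[0])
```

Under this layout, an autoregressive model trained on such pairs would predict the next observation and reward tokens given the history and the agent's chosen action, which is one way to read the interactivity the abstract describes.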
May-30-2025, 09:48:55 GMT
- Genre:
- Research Report > Experimental Study (1.00)
- Industry:
- Education (0.46)
- Information Technology (0.46)
- Technology: