Learning World Models for Interactive Video Generation