Large Language Models are Frame-level Directors for Zero-shot Text-to-Video Generation
