Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation
