Language Model Based Text-to-Audio Generation: Anti-Causally Aligned Collaborative Residual Transformers
