ETTA: Elucidating the Design Space of Text-to-Audio Models