Autoregressive Diffusion Transformer for Text-to-Speech Synthesis