Double Descent as a Lens for Sample Efficiency in Autoregressive vs. Discrete Diffusion Models