Diverse capability and scaling of diffusion and auto-regressive models when learning abstract rules