Transformers on Markov Data: Constant Depth Suffices
