Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers

Neural Information Processing Systems

A common lens to theoretically study neural net architectures is to analyze the functions they can approximate.