On the Computational Power of Decoder-Only Transformer Language Models
