Are Transformers universal approximators of sequence-to-sequence functions?
