How Do Transformers Learn Variable Binding in Symbolic Programs?
