Transformers with RL or SFT Provably Learn Sparse Boolean Functions, But Differently
