Transformer Learns Optimal Variable Selection in Group-Sparse Classification
