Data Mixture Inference Attack: BPE Tokenizers Reveal Training Data Compositions

Open in new window