CodeBPE: Investigating Subtokenization Options for Large Language Model Pretraining on Source Code
