Linguistic Laws Meet Protein Sequences: A Comparative Analysis of Subword Tokenization Methods
