Tokenisation is NP-Complete
Philip Whittington, Gregor Bachmann, Tiago Pimentel
In this work, we prove the NP-completeness of two variants of tokenisation, defined as the problem of compressing a dataset to at most $\delta$ symbols by either finding a vocabulary directly (direct tokenisation), or selecting a sequence of merge operations (bottom-up tokenisation).
arXiv.org Artificial Intelligence
Dec-19-2024
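To make the bottom-up variant concrete, here is a minimal sketch of what a tokenisation instance looks like: given a dataset and a candidate sequence of merge operations, apply the merges and check whether the result has at most $\delta$ symbols. This verification step runs in polynomial time, which is what places the problem in NP; the hardness proved in the paper lies in deciding whether such a merge sequence exists. The function names below are hypothetical, and this is not the paper's construction.

```python
def apply_merge(seq, pair):
    """Greedily replace left-to-right occurrences of `pair` with one merged symbol."""
    merged = pair[0] + pair[1]
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(merged)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def compresses_to(dataset, merges, delta):
    """Check a candidate solution: do these merges, applied in order,
    leave at most `delta` symbols across the whole dataset?"""
    for pair in merges:
        dataset = [apply_merge(seq, pair) for seq in dataset]
    return sum(len(seq) for seq in dataset) <= delta

dataset = [list("abab"), list("aba")]
# The single merge ("a", "b") turns 7 symbols into 4: [ab, ab] and [ab, a].
print(compresses_to(dataset, [("a", "b")], 4))  # True
print(compresses_to(dataset, [("a", "b")], 3))  # False
```

The greedy left-to-right application above is one common convention for applying merges; the NP-completeness result concerns choosing the vocabulary or merge sequence, not applying a given one.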