Tokenisation is NP-Complete
Whittington, Philip, Bachmann, Gregor, Pimentel, Tiago
–arXiv.org Artificial Intelligence
In this work, we prove the NP-completeness of two variants of tokenisation, defined as the problem of compressing a dataset to at most $\delta$ symbols by either finding a vocabulary directly (direct tokenisation), or selecting a sequence of merge operations (bottom-up tokenisation).
arXiv.org Artificial Intelligence
Dec-19-2024
- Country:
- Oceania > Australia
- North America
- United States
- New York > New York County
- New York City (0.04)
- Florida > Miami-Dade County
- Miami (0.04)
- New York > New York County
- Mexico > Mexico City
- Mexico City (0.04)
- Canada > Ontario
- Toronto (0.04)
- United States
- Europe
- Monaco (0.04)
- Germany > Berlin (0.04)
- Switzerland > Zürich
- Zürich (0.04)
- France > Provence-Alpes-Côte d'Azur
- Bouches-du-Rhône > Marseille (0.04)
- Asia
- Singapore (0.04)
- Middle East > Jordan (0.04)
- China (0.04)
- Thailand > Bangkok
- Bangkok (0.04)
- Afghanistan > Parwan Province
- Charikar (0.04)
- Genre:
- Research Report (0.40)
- Technology: