Tutorial: $φ$-Transductions in OpenFst via the Gallic Semiring
Cognetta, Marco, Allauzen, Cyril
arXiv.org Artificial Intelligence
OpenFst, a popular finite-state transducer library, supports $φ$-transitions, but an implementation constraint prevents them from being used with transducers in a straightforward way. In this short tutorial, we describe how other functionality provided by OpenFst (namely, the Gallic semiring) can be used to implement $φ$-transductions correctly, and we demonstrate the approach by implementing the MaxMatch (WordPiece) tokenization algorithm (Devlin et al., 2019; Song et al., 2021). Accompanying self-contained code examples are available at https://www.openfst.org/twiki/pub/Contrib/FstContrib/phi_transduction_tutorial_code.tgz
Jun-24-2025
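For readers unfamiliar with the MaxMatch (WordPiece) algorithm named in the abstract, the following is a minimal pure-Python sketch of greedy longest-match-first tokenization. It is not the tutorial's OpenFst implementation and does not use $φ$-transitions or the Gallic semiring; the function name `max_match`, the `##` continuation prefix, the `[UNK]` fallback token, and the toy vocabulary are all assumptions chosen for illustration. The inner loop's fallback to progressively shorter substrings is, roughly, the behavior that failure ($φ$) transitions encode when the same algorithm is expressed as an automaton.

```python
def max_match(word, vocab, unk="[UNK]", prefix="##"):
    """Greedy longest-match-first tokenization of a single word.

    Illustrative sketch only: `vocab` is a set of subword strings, and
    non-initial pieces are looked up with `prefix` prepended.
    """
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = prefix + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            # No vocabulary entry covers this position: give up on the word.
            return [unk]
        tokens.append(match)
        start = end
    return tokens


# Hypothetical toy vocabulary, for illustration only.
toy_vocab = {"un", "##aff", "##able", "##a", "##b"}
```

With this toy vocabulary, `max_match("unaffable", toy_vocab)` yields `["un", "##aff", "##able"]`, while a word with no matching pieces, such as `"xyz"`, maps to `["[UNK]"]`.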