Subword
-
Tokeniser Vocabulary
Fixed set of subword units, either learned from a corpus or predefined, used for tokenisation; typically 32k–128k entries, balancing sequence compression against model size and flexibility.
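As a sketch of how a fixed vocabulary is applied at tokenisation time, the following greedy longest-match tokeniser (WordPiece-style) splits a word into the longest known subword units; the toy vocabulary and the "##" continuation prefix are illustrative assumptions, not any specific library's behaviour:

```python
# Greedy longest-match subword tokenisation against a fixed vocabulary.
# A minimal sketch; the "##" prefix marks word-internal continuation pieces.

def tokenise(word: str, vocab: set, unk: str = "[UNK]") -> list:
    """Split a word into the longest matching vocabulary entries, left to right."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate span until it matches a vocabulary entry.
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry the prefix
            if piece in vocab:
                pieces.append(piece)
                break
            end -= 1
        else:
            return [unk]  # no entry matches even a single character
        start = end
    return pieces

vocab = {"token", "##iser", "##s", "un", "##break", "##able"}
print(tokenise("tokenisers", vocab))   # ['token', '##iser', '##s']
print(tokenise("unbreakable", vocab))  # ['un', '##break', '##able']
```

Because the vocabulary is fixed, any word outside it decomposes into smaller known pieces rather than mapping to an unknown token, which is the flexibility the size trade-off buys.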
-
fastText
Word embedding method that represents each word as the sum of its character n-gram vectors, letting it compose embeddings for out-of-vocabulary words and morphological variants; published by Bojanowski et al. in 2017.
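A minimal sketch of the core idea: hash each boundary-marked character n-gram into a fixed table of vectors and sum them. The n-gram range (3–6) mirrors the paper's default, but the bucket count, dimension, random initialisation, and use of Python's built-in hash are illustrative assumptions (the paper uses 2M buckets and an FNV-style hash, with vectors learned during training):

```python
# fastText-style composition: a word vector is the sum of its hashed
# character n-gram vectors, so unseen words still receive an embedding.

import numpy as np

BUCKETS, DIM = 100_000, 64  # toy sizes; fastText defaults to 2M buckets
rng = np.random.default_rng(0)
ngram_vectors = rng.normal(scale=0.1, size=(BUCKETS, DIM))  # stands in for trained vectors

def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> list:
    """Boundary-marked character n-grams, e.g. '<wh', 'whe', ..., 're>'."""
    marked = f"<{word}>"
    return [marked[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(marked) - n + 1)]

def word_vector(word: str) -> np.ndarray:
    """Sum the vectors of the word's n-grams, hashed into fixed buckets."""
    return sum(ngram_vectors[hash(g) % BUCKETS] for g in char_ngrams(word))

# An out-of-vocabulary word still composes a vector from its pieces:
print(word_vector("tokenisability").shape)  # (64,)
```

Sharing n-grams across words is what captures morphology: "tokenise" and "tokenisers" overlap in most of their n-grams, so their composed vectors land close together.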