Tokeniser Vocabulary
A fixed set of subword units, either learned (e.g. via BPE) or predefined, used for tokenisation; typically 32k–128k entries, balancing compression of common text against flexibility on rare words.
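As a toy illustration of the idea (not a real tokeniser), the sketch below uses a hypothetical hand-built vocabulary of a handful of subword units and greedy longest-match lookup; real learned vocabularies hold tens of thousands of entries, which is where the compression/flexibility trade-off comes in.

```python
# Hypothetical toy vocabulary mapping subword units to integer IDs.
# Real vocabularies (e.g. learned via BPE) contain 32k-128k entries.
VOCAB = {"token": 0, "tok": 1, "en": 2, "iser": 3, "is": 4,
         "er": 5, "t": 6, "o": 7, "k": 8, "e": 9, "n": 10,
         "i": 11, "s": 12, "r": 13}

def tokenise(text: str) -> list[int]:
    """Greedily match the longest vocabulary entry at each position."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest candidate substring first, shrinking until
        # one is found in the vocabulary.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary entry covers {text[i]!r}")
    return ids

print(tokenise("tokeniser"))  # "token" + "iser" -> [0, 3]
```

A larger vocabulary covers more words as single tokens (better compression); a smaller one falls back to more, shorter pieces but handles unseen words gracefully.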