Tokenization
Tokenisierung
Tokenization is the process of splitting text into tokens before a language model can process it. The tokenization method used affects how efficiently a model handles a given language — languages with many compound words, such as German, often need more tokens per sentence than English.