Tokenization

German: Tokenisierung

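Tokenization is the process of splitting text into tokens (words, subwords, or single characters) before a language model can process it. A tokenizer's vocabulary and splitting algorithm determine how efficiently a model handles a given language. Languages with many compound words, such as German, often need more tokens per sentence than English, because a long compound that is not in the vocabulary gets split into several subword pieces.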

Source: Hugging Face — Tokenizers documentation
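The compound-word effect can be illustrated with a toy subword tokenizer. The sketch below uses greedy longest-match-first splitting, the inference strategy WordPiece uses. It assumes a small, hypothetical `VOCAB` that contains whole English words but only fragments of the German compound. Real tokenizers learn much larger vocabularies from training data. Unlike BERT-style WordPiece, which marks continuation pieces with `##` and maps an unmatchable word to `[UNK]`, this sketch simply falls back to single characters. The helper names `tokenize_word` and `tokenize` are illustrative and are not part of any library's API.

```python
# Hypothetical vocabulary: whole English words, but only fragments of the
# German compound "Geschwindigkeitsbegrenzung" (speed limit).
VOCAB = {"speed", "limit", "gesch", "wind", "ig", "keit", "s", "be", "grenz", "ung"}


def tokenize_word(word: str, vocab: set[str]) -> list[str]:
    """Split one word into vocabulary pieces, greedy longest-match-first."""
    tokens: list[str] = []
    start = 0
    while start < len(word):
        # Try the longest remaining substring first, then shrink it.
        end = len(word)
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            # No vocabulary entry matches here: fall back to one character.
            end = start + 1
        tokens.append(word[start:end])
        start = end
    return tokens


def tokenize(text: str, vocab: set[str]) -> list[str]:
    """Lowercase, split on whitespace, then tokenize each word."""
    return [piece for word in text.lower().split() for piece in tokenize_word(word, vocab)]


if __name__ == "__main__":
    print(tokenize("speed limit", VOCAB))                     # ['speed', 'limit']
    print(len(tokenize("Geschwindigkeitsbegrenzung", VOCAB)))  # 8
```

In this sketch the same concept costs 2 tokens in English and 8 in German. The gap comes entirely from what the vocabulary happens to contain, which is why the tokenization method matters for efficiency across languages.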