Input & Controls
Smaller vocabulary = more splitting; larger vocabulary = less splitting
Character-Level
Split every character individually
Word-Level
Keep words intact (fails on rare words)
Subword (BPE)
Balanced: common words stay intact, rare words are split into smaller pieces
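The three modes above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind this page: the function names (`char_tokenize`, `word_tokenize`, `learn_bpe`, `apply_merge`, `bpe_tokenize`), the `<unk>` placeholder, and the whitespace splitting are all assumptions made for the example. The number of learned merges stands in for vocabulary size, so fewer merges means more splitting.

```python
from collections import Counter


def char_tokenize(text):
    """Character-level: every character (spaces included) is its own token."""
    return list(text)


def word_tokenize(text, vocab):
    """Word-level: keep whole words; words outside `vocab` become <unk>."""
    return [w if w in vocab else "<unk>" for w in text.split()]


def apply_merge(symbols, pair):
    """Replace each adjacent occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-split corpus.

    Each round merges the most frequent adjacent symbol pair, so frequent
    words collapse into single tokens first.
    """
    words = Counter(tuple(w) for w in corpus.split())
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in words.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:  # every word is already a single symbol
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = Counter()
        for symbols, freq in words.items():
            merged[tuple(apply_merge(list(symbols), best))] += freq
        words = merged
    return merges


def bpe_tokenize(word, merges):
    """Subword (BPE): start from characters, replay merges in learned order."""
    symbols = list(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return symbols


if __name__ == "__main__":
    corpus = "the the the the cat hat"  # toy corpus, hypothetical
    for n in (0, 2, 50):
        merges = learn_bpe(corpus, n)
        print(n, bpe_tokenize("the", merges), bpe_tokenize("that", merges))
```

In this sketch a word seen often in the training text ends up as one token once enough merges are learned, while a word that never appeared whole is still covered by smaller pieces rather than being replaced with `<unk>`.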
Tokens: 0
Vocab Used: 0
Efficiency: 0%
Tokenization Results
Tokenized Output:
Common words
Medium frequency
Rare/split tokens
Power Law Distribution
Key Insights
- Enter text to see tokenization analysis