Sparsity Explorer

Visualize how n-gram coverage collapses as the n-gram size increases: possible combinations grow exponentially while training data does not

Simulation Controls (interactive sliders)

  • Vocabulary Size: 1,000
  • N-gram Size: 3
  • Training Data Size: 100k

Possible vs. Seen N-grams (chart): exponential growth of possible combinations vs. linear growth of training data.

Coverage Percentage (chart): percentage of possible n-grams actually seen in training.

Sparsity Statistics (live counters)

  • Possible N-grams
  • Seen in Training
  • Coverage
  • Unseen N-grams
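
The four counters are presumably related as follows (V = vocabulary size, n = n-gram size; inferred from the chart captions, not stated by the page):

    possible = V^n
    unseen   = possible - seen
    coverage = 100 × seen / possible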

The Sparsity Problem

As the n-gram size increases, the number of possible combinations grows exponentially (V^n for a vocabulary of V words), but training data grows only linearly. This leaves massive "holes" in the model's knowledge: the vast majority of possible n-grams are never seen.
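
A minimal sketch of the arithmetic behind the explorer, assuming possible = V^n and that a corpus of T tokens contains at most T − n + 1 distinct n-grams (the function below is illustrative, not the page's actual code):

    def sparsity_stats(vocab_size: int, n: int, training_tokens: int):
        """Upper-bound estimate of n-gram coverage."""
        possible = vocab_size ** n                    # grows exponentially in n
        # A corpus of T tokens has at most T - n + 1 n-gram windows,
        # so it can contain at most that many distinct n-grams.
        seen_at_most = min(training_tokens - n + 1, possible)
        coverage = seen_at_most / possible
        return possible, seen_at_most, coverage

    # Using the slider values above: V = 1,000, n = 3, T = 100k
    possible, seen, coverage = sparsity_stats(1_000, 3, 100_000)
    print(f"{possible:,} possible, at most {seen:,} seen, {coverage:.4%} coverage")
    # -> 1,000,000,000 possible, at most 99,998 seen, 0.0100% coverage

At these settings, coverage is about one hundredth of a percent, which is why the Coverage counter rounds to 0%.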

Key Insights

  • Possible n-grams grow exponentially with n-gram size and vocabulary size
  • Training data grows only linearly, so most possible n-grams are never seen
  • Adjust the sliders to see how sparsity changes