⚙️ Scaling Factors

Parameters: 1M to 1T (set to 1B)

More parameters = more knowledge storage capacity

Training tokens: 1M to 100T (set to 1T)

More training data = better pattern learning

Compute: 1e18 to 1e24 FLOPs (set to 1e21)

More compute = longer training and larger models

Estimated Performance Score: 75%
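
To make the relationship between the three knobs concrete, here is a minimal Python sketch of how such a performance estimate could be computed. It assumes a Chinchilla-style loss curve, L(N, D) = E + A/N^α + B/D^β, with coefficients close to the published Chinchilla fit, plus the common rule of thumb that training compute is about 6·N·D FLOPs. The page's actual scoring formula isn't shown, so `performance_score` and its 0-100 mapping are illustrative assumptions.

```python
# Sketch: estimate a "performance score" from parameter count and token count.
# Loss model: Chinchilla-style L(N, D) = E + A/N**alpha + B/D**beta
# (coefficients close to the published Chinchilla fit; purely illustrative here).

def estimated_loss(params: float, tokens: float) -> float:
    E, A, B = 1.69, 406.4, 410.7
    alpha, beta = 0.34, 0.28
    return E + A / params**alpha + B / tokens**beta

def training_flops(params: float, tokens: float) -> float:
    # Common rule of thumb: training compute C ~ 6 * N * D FLOPs.
    return 6 * params * tokens

def performance_score(params: float, tokens: float) -> float:
    # Map loss onto a rough 0-100 scale spanning the slider ranges above.
    worst = estimated_loss(1e6, 1e6)    # slider minimums
    best = estimated_loss(1e12, 1e14)   # slider maximums
    loss = estimated_loss(params, tokens)
    return 100 * (worst - loss) / (worst - best)

if __name__ == "__main__":
    n, d = 1e9, 1e12                    # 1B parameters, 1T tokens
    print(f"compute ~ {training_flops(n, d):.1e} FLOPs")
    print(f"score   ~ {performance_score(n, d):.0f} / 100")
```

Note that compute is not an independent knob in this sketch: with C ≈ 6·N·D, the parameter and token settings largely fix the FLOP budget.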

Real Model Examples

GPT-1 117M params
GPT-2 1.5B params
GPT-3 175B params
GPT-4 ~1T params (estimated; never officially disclosed)

📈 Power Law Scaling

Why Progress Seems to "Slow Down"

We're climbing a power-law curve, not a linear one. Each doubling of resources yields a roughly constant relative improvement, which means ever-smaller absolute gains (see the sketch after the stages below).

Early Stage (low-hanging fruit):
The first 1000x increase in scale delivers huge gains: basic language understanding.
Current Stage (long tail):
The next 1000x increase delivers smaller gains: subtler reasoning, edge cases.
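
Here is a quick numerical sketch of this effect, assuming loss falls as a power law in compute, L(C) = a·C^(-k) with k = 0.05. Both the constant and the exponent are illustrative, chosen to be in the rough range reported for language-model scaling.

```python
# Diminishing absolute returns on a power-law curve.
# Assumption: loss(C) = a * C**(-k); constant and exponent are illustrative.

a, k = 10.0, 0.05

prev_loss = a                       # loss at C = 1 (arbitrary units)
for doublings in range(1, 11):
    compute = 2.0 ** doublings
    loss = a * compute ** (-k)
    gain = prev_loss - loss         # absolute improvement from this doubling
    print(f"doubling {doublings:2d}: loss = {loss:.3f}, gain = {gain:.3f}")
    prev_loss = loss
```

Every doubling trims loss by the same roughly 3.4% of its previous value, so the relative improvement is consistent while the absolute gain per doubling keeps shrinking. That is why progress can feel like it is slowing even though the scaling law still holds.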