Scaling Factors
More parameters = more knowledge storage capacity
More training data = better pattern learning
More compute = longer training runs and larger models (a rough sketch of how these combine follows below)
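To make the interaction between these factors concrete, here is a minimal sketch of a Chinchilla-style scaling law, L(N, D) = E + A / N^alpha + B / D^beta, where N is the parameter count and D is the number of training tokens. The constants and the function name estimated_loss are illustrative assumptions, not fitted values from this article.

```python
# A minimal sketch of how parameters (N) and training tokens (D) trade off in a
# Chinchilla-style scaling law: L(N, D) = E + A / N**alpha + B / D**beta.
# All constants below are illustrative assumptions, not fitted values.

def estimated_loss(n_params, n_tokens,
                   E=1.7, A=400.0, B=410.0, alpha=0.34, beta=0.28):
    """Predicted training loss for n_params parameters and n_tokens tokens."""
    return E + A / n_params**alpha + B / n_tokens**beta

if __name__ == "__main__":
    # Scaling parameters alone, data alone, or both together:
    print(estimated_loss(1e9, 300e9))    # baseline
    print(estimated_loss(10e9, 300e9))   # 10x parameters
    print(estimated_loss(1e9, 3000e9))   # 10x data
    print(estimated_loss(10e9, 3000e9))  # 10x both
```

Under this kind of formula, growing only one factor eventually stops helping: the other term dominates the loss, which is why parameters, data, and compute are usually scaled together.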
[Interactive chart: estimated performance score]
Real Model Examples
GPT-1: 117M parameters
GPT-2: 1.5B parameters
GPT-3: 175B parameters
GPT-4: ~1T parameters (estimated; not officially disclosed)
Power Law Scaling
Why Progress Seems to "Slow Down"
We're climbing a power-law curve, not a linear one: each doubling of resources buys a roughly constant relative improvement, which means smaller and smaller absolute gains.
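A short sketch of that effect, assuming an illustrative power law L(C) = a * C^(-b) for loss as a function of compute C (the values of a and b below are made up, not measurements):

```python
# Sketch of why gains per doubling shrink: on a power law L(C) = a * C**(-b),
# each doubling of compute C multiplies the loss by the same constant factor
# 2**(-b), so the relative improvement is constant but the absolute
# improvement keeps getting smaller. a and b are illustrative assumptions.

a, b = 10.0, 0.1
prev = a * 1.0 ** (-b)                 # loss at the baseline compute budget
for k in range(1, 11):                 # ten successive doublings
    loss = a * (2.0 ** k) ** (-b)
    print(f"compute x{2**k:>5}: loss {loss:.3f} "
          f"(absolute gain {prev - loss:.3f}, ratio {loss / prev:.3f})")
    prev = loss
```

Every row shows the same loss ratio (about 0.93 per doubling here), but the absolute gain column shrinks steadily.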
Early Stage (Low-hanging fruit):
The first 1000x increase in scale gave huge gains: basic language understanding.
Current Stage (Long tail):
The next 1000x increase gives smaller gains: subtle reasoning and edge cases (see the worked comparison below).
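Using the same illustrative power law as above (a = 10, b = 0.1, both assumed values), the two stages can be compared directly:

```python
# Worked comparison on the illustrative power law L(C) = a * C**(-b): the
# first 1000x of scale removes much more absolute loss than the next 1000x,
# which is why progress feels slower even though the relative improvement
# per 1000x is identical.

a, b = 10.0, 0.1
loss_base  = a * 1e0 ** (-b)   # starting point
loss_1000x = a * 1e3 ** (-b)   # after the first 1000x
loss_1e6x  = a * 1e6 ** (-b)   # after the next 1000x (1,000,000x total)

print(f"first 1000x removes  {loss_base  - loss_1000x:.2f} loss")   # ~4.99
print(f"second 1000x removes {loss_1000x - loss_1e6x:.2f} loss")    # ~2.50
```

The curve never flattens to zero slope, but each successive 1000x buys roughly half the absolute improvement of the previous one under these assumed constants.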