Training Configuration
- Simple Model: 1 context word, 50 hidden units
- Bengio Model: 3 context words, 100 hidden units
- Large Model: 5 context words, 200 hidden units
- N-gram Baseline: trigram with smoothing
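A minimal sketch of what a Bengio-style configuration means in practice, using the "Bengio Model" sizes above (3 context words, 100 hidden units). The vocabulary size and embedding dimension here are assumptions for illustration, not values from the configurations above:

```python
import numpy as np

# Assumed sizes: vocab_size and embed_dim are illustrative choices;
# context=3 and hidden=100 come from the Bengio Model configuration.
vocab_size, embed_dim, context, hidden = 1000, 30, 3, 100

rng = np.random.default_rng(0)
C = rng.normal(0, 0.1, (vocab_size, embed_dim))         # shared word embeddings
W1 = rng.normal(0, 0.1, (context * embed_dim, hidden))  # input -> hidden
W2 = rng.normal(0, 0.1, (hidden, vocab_size))           # hidden -> logits

def predict_next(word_ids):
    """Distribution over the next word given `context` word ids."""
    x = C[word_ids].reshape(-1)   # concatenate the 3 context embeddings
    h = np.tanh(x @ W1)           # 100-unit hidden layer
    logits = h @ W2
    p = np.exp(logits - logits.max())
    return p / p.sum()            # softmax over the vocabulary

probs = predict_next([1, 2, 3])
print(probs.shape)  # (1000,)
```

The n-gram baseline, by contrast, would simply look up smoothed trigram counts rather than compute a forward pass.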
Training Dashboard (initial state, before training begins)
- Status: ready to train
- Epoch: 0 / 100
- Current Loss: 0.00
- Accuracy: 0%
- Perplexity: ∞
- Training Time: 0s
Training Insights
- Select a model architecture to begin
- Neural models typically train for many epochs
- Watch loss decrease and accuracy increase over time
Loss & Accuracy Curves (chart)
Understanding Training Curves
- Loss: how wrong the model's predictions are (lower is better)
- Accuracy: the percentage of next-word predictions that are exactly correct
- Perplexity: the exponential of the cross-entropy loss (lower means the model is less "confused" about the next word)
- Convergence: when the curves flatten, further training yields little improvement and can stop
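The loss-to-perplexity relationship above can be shown in a couple of lines. This assumes the loss is the average cross-entropy in nats, which is the usual convention:

```python
import math

def perplexity(loss):
    """Perplexity is e raised to the average cross-entropy loss (in nats)."""
    return math.exp(loss)

print(perplexity(0.0))            # 1.0  -- perfect predictions
print(round(perplexity(4.6), 1))  # ~99.5 -- model is about as confused as a
                                  # uniform 1-in-100 guess
```

This is why the perplexity readout starts at ∞ before any training: with no model, the loss is effectively unbounded.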