Bengio Neural Language Model

Explore the complete neural network architecture from input words to probability distributions

Interactive Neural Language Model

[Interactive demo: select a three-word context, w₁ ∈ {The, A, My}, w₂ ∈ {cat, dog, bird}, w₃ ∈ {sat, ran, jumped}, and animate the flow from the input words to the next-word probability distribution.]
Softmax Details

How Softmax Works:
P(wᵢ) = exp(zᵢ) / Σⱼ exp(zⱼ)
1. Raw scores (logits): z = [2.1, 1.8, 0.9, 0.3, ...]
2. Exponentials: exp(z) = [8.17, 6.05, 2.46, 1.35, ...]
3. Sum of exponentials: 18.03
4. Probabilities (each exponential divided by the sum): [0.45, 0.34, 0.14, 0.07, ...]
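
The same computation in a few lines of NumPy, using the example logits above. Subtracting the maximum logit before exponentiating is a standard numerical-stability trick and does not change the resulting probabilities:

```python
import numpy as np

def softmax(z):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

z = np.array([2.1, 1.8, 0.9, 0.3])   # 1. raw scores (logits)
print(np.round(np.exp(z), 2))        # 2. exponentials: [8.17 6.05 2.46 1.35]
print(round(np.exp(z).sum(), 2))     # 3. sum: 18.03
print(np.round(softmax(z), 2))       # 4. probabilities: [0.45 0.34 0.14 0.07]
```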

Key Architecture Insights

  • Each context word is mapped to a dense vector (its embedding) via a shared lookup table
  • The embeddings are concatenated to form a single input vector for the neural network
  • A hidden layer with a nonlinearity (tanh in the original model) learns patterns from the concatenated embeddings
  • The output layer produces one raw score (logit) per word in the vocabulary
  • Softmax converts the raw scores into probabilities that sum to 1 (see the end-to-end sketch below)
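
Putting these steps together, here is a minimal sketch of the forward pass in NumPy. The toy vocabulary, the layer sizes, and the random weights are illustrative assumptions, not values from Bengio et al. (2003); the structure follows the core of that model, a tanh hidden layer feeding a softmax output (the paper's optional direct input-to-output connections are omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and layer sizes (illustrative assumptions, not values from the paper)
vocab = ["The", "A", "My", "cat", "dog", "bird", "sat", "ran", "jumped"]
V, emb_dim, hidden_dim, context = len(vocab), 4, 8, 3
word_to_idx = {w: i for i, w in enumerate(vocab)}

# Parameters (random here; in practice learned by training)
C = rng.normal(size=(V, emb_dim))                     # embedding lookup table
H = rng.normal(size=(hidden_dim, context * emb_dim))  # input-to-hidden weights
d = np.zeros(hidden_dim)                              # hidden bias
U = rng.normal(size=(V, hidden_dim))                  # hidden-to-output weights
b = np.zeros(V)                                       # output bias

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def next_word_distribution(words):
    """words -> embeddings -> concatenation -> hidden layer -> logits -> softmax."""
    x = np.concatenate([C[word_to_idx[w]] for w in words])  # concatenated embeddings
    h = np.tanh(d + H @ x)                                  # hidden layer activations
    logits = b + U @ h                                      # one raw score per vocab word
    return softmax(logits)                                  # probabilities summing to 1

p = next_word_distribution(["The", "cat", "sat"])
for w, prob in sorted(zip(vocab, p), key=lambda t: -t[1])[:3]:
    print(f"P({w} | 'The cat sat') = {prob:.3f}")
```

With random weights the printed distribution is of course meaningless; training adjusts C, H, d, U, and b so that observed next words receive high probability.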