Discover why attention needs position information to understand word order
The attention mechanism by itself is position-blind: it treats "John loves Mary" exactly the same as "Mary loves John". This is the fundamental problem that position embeddings solve.
Without position embeddings, these sentences look identical to the attention mechanism:
Attention sees: {John, loves, Mary}
Same as: {Mary, loves, John}
With position embeddings, each word carries its position, so the two orderings become distinguishable:
Attention sees: {John@pos1, loves@pos2, Mary@pos3}
Different from: {Mary@pos1, loves@pos2, John@pos3}
Try shuffling the words: without position embeddings, attention can't tell one arrangement from another!
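To see this concretely, here is a minimal sketch (not code from this page) of single-head self-attention with identity projections: shuffling the input rows only shuffles the output rows, so every arrangement of the same words produces the same set of results.

```python
import numpy as np

def self_attention(X):
    """Single-head self-attention with identity projections (illustration only)."""
    scores = X @ X.T / np.sqrt(X.shape[-1])         # how similar each token is to every other token
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ X                              # weighted mix of token vectors

rng = np.random.default_rng(0)
tokens = rng.normal(size=(3, 4))   # stand-ins for "John", "loves", "Mary"
perm = [2, 1, 0]                   # the reversed sentence: "Mary", "loves", "John"

out = self_attention(tokens)
out_reversed = self_attention(tokens[perm])

# "John" gets exactly the same result in both word orders:
print(np.allclose(out[0], out_reversed[2]))   # True
```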
Each position gets a unique "fingerprint" that helps attention understand word order:
Each position has a unique pattern of values. Hover over positions to see their encoding!
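One common way to build such fingerprints is the sinusoidal encoding from the Transformer paper ("Attention Is All You Need"); the sketch below assumes that scheme, since the page does not say which encoding its visualization uses.

```python
import numpy as np

def sinusoidal_encoding(num_positions, d_model):
    """Return a (num_positions, d_model) matrix: one fingerprint row per position."""
    positions = np.arange(num_positions)[:, None]          # 0, 1, 2, ...
    dims = np.arange(0, d_model, 2)[None, :]               # even embedding dimensions
    angle_rates = 1.0 / np.power(10000.0, dims / d_model)  # lower frequencies in later dims
    encoding = np.zeros((num_positions, d_model))
    encoding[:, 0::2] = np.sin(positions * angle_rates)    # sine on even dimensions
    encoding[:, 1::2] = np.cos(positions * angle_rates)    # cosine on odd dimensions
    return encoding

pe = sinusoidal_encoding(num_positions=8, d_model=16)
print(pe[2].round(2))             # the unique pattern of values for position 2
print(np.allclose(pe[2], pe[5]))  # False: no two positions share a fingerprint
```

Because each position mixes sine and cosine waves at different frequencies, no two positions end up with the same pattern of values, and nearby positions get similar (but never identical) fingerprints.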
"cat" → [0.2, -0.1, 0.8, ...]
pos2 → [0.0, 0.5, -0.3, ...]
[0.2, 0.4, 0.5, ...]
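A minimal sketch of how those two vectors combine, using the illustrative numbers above: the position fingerprint is simply added, element by element, to the token embedding before attention ever sees it.

```python
import numpy as np

token_embedding    = np.array([0.2, -0.1, 0.8])   # the word "cat"
position_embedding = np.array([0.0,  0.5, -0.3])  # position 2's fingerprint

# What the attention layers actually receive for "cat" at position 2:
combined = token_embedding + position_embedding
print(combined)   # [0.2 0.4 0.5]
```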