Explore how different attention heads specialize in different types of relationships
Instead of using a single attention mechanism, transformers use multiple "heads", each of which can learn to specialize in a different type of relationship. Select an example to see how the heads focus on different aspects of the input:
Each head learns to focus on a different pattern. Click on any head to see it enlarged:
This head specializes in connecting subjects with their verbs.
Darker colors = stronger attention. Click any cell to see the exact attention strength.
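If you want to see what the visualization is computing under the hood, here is a minimal NumPy sketch of multi-head scaled dot-product attention. The function name, toy dimensions, and random weights are illustrative assumptions, not the actual weights or implementation behind this page; the point is only that every head gets its own query/key/value projections and therefore its own attention map over the same tokens.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Scaled dot-product attention split across several heads.

    x: (seq_len, d_model) token embeddings
    w_q, w_k, w_v, w_o: (d_model, d_model) projection matrices
    Returns the attended output and the per-head attention maps.
    """
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    # Project the inputs, then split the feature dimension into heads:
    # (seq_len, d_model) -> (num_heads, seq_len, d_head)
    def split_heads(t):
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(x @ w_q)
    k = split_heads(x @ w_k)
    v = split_heads(x @ w_v)

    # Each head computes its own attention map over the same tokens,
    # so each head can track a different relationship.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, seq, seq)
    attn = softmax(scores, axis=-1)

    # Re-combine the heads and mix them with the output projection.
    out = (attn @ v).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ w_o, attn

# Toy example: 4 tokens, 8-dim embeddings, 2 heads (all values random)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v, w_o = (rng.normal(size=(8, 8)) for _ in range(4))
out, attn = multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads=2)
print(attn[0].round(2))  # head 0's attention map: rows = queries, cols = keys
```

Each row of `attn[h]` is what one head's square in the grid above shows: how strongly that head's query token attends to every other token. In a trained model the weights are learned rather than random, which is why individual heads end up with interpretable specializations like the subject-verb pattern highlighted here.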