Multi-Head Attention Dashboard

Explore how different attention heads specialize in different types of relationships

🎯 Multiple Perspectives on Language

Instead of using one attention mechanism, transformers use multiple "heads," each specializing in a different type of relationship. Select an example to see how different heads focus on different aspects:
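
For readers who want the mechanics behind the visualization, here is a minimal NumPy sketch of the idea: the model runs several small attention computations in parallel, one per head, then merges the results. All weight matrices below are random placeholders standing in for learned parameters, and the 8-head count simply mirrors this dashboard.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, num_heads=8, seed=0):
    seq_len, d_model = x.shape
    d_head = d_model // num_heads  # each head works in a smaller subspace
    rng = np.random.default_rng(seed)

    heads = []
    for _ in range(num_heads):
        # One Q/K/V projection per head (random stand-ins for learned weights).
        W_q, W_k, W_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
        Q, K, V = x @ W_q, x @ W_k, x @ W_v
        # Scaled dot-product attention: each row is one token's
        # attention distribution over every token in the sequence.
        weights = softmax(Q @ K.T / np.sqrt(d_head))
        heads.append(weights @ V)

    # Concatenate all heads and mix them with a final output projection.
    W_o = rng.normal(size=(d_model, d_model))
    return np.concatenate(heads, axis=-1) @ W_o

x = np.random.default_rng(1).normal(size=(5, 64))  # 5 tokens, d_model = 64
print(multi_head_attention(x).shape)  # (5, 64) -- same shape in, same shape out
```

Each head sees the full sequence but projects it into its own low-dimensional subspace, which is what lets different heads latch onto different patterns.
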

👥 8 Different Attention Heads

Each head learns to focus on different patterns. Click on any head to see it enlarged:

🔍 Head Focus Detail

Head 1: Subject-Verb Relationships

This head specializes in connecting subjects with their verbs

Darker colors = stronger attention. Click on cells to see attention strength.
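
To make the heatmap concrete, here is a small illustrative example. The sentence, tokens, and weights below are hand-made for demonstration, not taken from a real model; they show the kind of subject-verb pattern this head is described as learning. Rows are query tokens, columns are key tokens, and larger values mean stronger attention (darker cells).

```python
import numpy as np

tokens = ["The", "cat", "sat", "on", "the", "mat"]
# Hypothetical attention matrix for one head; each row sums to 1.
attn = np.array([
    [0.70, 0.20, 0.05, 0.02, 0.02, 0.01],  # "The" -> mostly itself
    [0.05, 0.15, 0.70, 0.04, 0.03, 0.03],  # "cat" -> its verb "sat"
    [0.03, 0.65, 0.20, 0.05, 0.04, 0.03],  # "sat" -> its subject "cat"
    [0.02, 0.03, 0.25, 0.55, 0.05, 0.10],
    [0.02, 0.02, 0.04, 0.05, 0.57, 0.30],
    [0.02, 0.03, 0.15, 0.20, 0.10, 0.50],
])

for tok, row in zip(tokens, attn):
    strongest = tokens[int(row.argmax())]
    print(f"{tok:>4} attends most to {strongest!r} ({row.max():.2f})")
```

Reading row by row reproduces what the detail view shows: the strongest off-diagonal links run between the subject and its verb.
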

❌ Single Attention Head
  • One pattern recognition system
  • Limited perspective on relationships
  • Must handle all linguistic patterns
  • Jack-of-all-trades, master of none
Problem: One attention head must capture too many distinct relationship types to model any of them well.
✅ Multi-Head Attention
  • 8 specialized pattern recognition systems
  • Each head focuses on specific relationships
  • Parallel processing of different aspects
  • Combined expertise from all heads
Advantage: Like having multiple experts, each analyzing a different aspect of language structure.
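
One reason this trade-off works: in the standard design the model dimension is split evenly across heads (d_head = d_model / num_heads), so eight narrow heads cost about the same as one full-width head. A quick back-of-the-envelope check, assuming the common d_model = 512 convention:

```python
d_model, num_heads = 512, 8
d_head = d_model // num_heads  # 64

single_head = 3 * d_model * d_model            # one full-width Q/K/V projection set
multi_head = num_heads * 3 * d_model * d_head  # eight narrow Q/K/V projection sets
print(single_head, multi_head)  # 786432 786432 -- identical projection cost
```

The specialization shown above comes essentially for free: the same parameter budget is spread across eight experts instead of one generalist.
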

Key Insights