Polysemy Problem with Static Embeddings

Explore how traditional word embeddings struggle with words that have multiple meanings

Visualization of the Polysemy Problem

The Polysemy Problem

Traditional word embeddings (Word2Vec, GloVe) assign a single vector to each word in the vocabulary. This causes problems for words with multiple meanings, as the embedding becomes an "average" of all senses.

For example, the word "bank" can refer to a financial institution or the side of a river. These meanings are quite different, but in static embeddings, they get merged into one representation.
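
As a quick illustration of this single-vector behaviour, the minimal sketch below looks up "bank" with a pretrained static model. It assumes the gensim package and its downloadable "glove-wiki-gigaword-50" vectors, which are illustrative choices rather than anything prescribed here; any Word2Vec or GloVe model behaves the same way.

```python
import gensim.downloader as api

# Assumed pretrained static vectors (an illustrative choice); any Word2Vec/GloVe
# KeyedVectors object would behave identically.
vectors = api.load("glove-wiki-gigaword-50")

sentences = [
    "I deposited money in the bank yesterday.",
    "We had a picnic on the bank of the river.",
]

# The lookup ignores the surrounding words entirely: "bank" maps to the same
# vector no matter which sentence it came from.
for sentence in sentences:
    print(sentence, "->", vectors["bank"][:4])
```

Both sentences print the identical slice, because a static model stores exactly one row per vocabulary word.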

More advanced models covered in Module 2 (such as BERT) use contextual embeddings: each occurrence of a word gets a different vector depending on its surrounding context, which addresses this limitation.

Words with Multiple Meanings

Financial Sense

I deposited money in the bank yesterday.

River Sense

We had a picnic on the bank of the river.
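
To make the contrast concrete, the sketch below extracts the contextual vector for "bank" from each of the two sentences above and compares them with cosine similarity. It assumes the Hugging Face transformers and torch packages and the bert-base-uncased checkpoint, which are illustrative assumptions rather than part of the original material; a contextual model yields noticeably different vectors for the two senses.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def bank_vector(sentence: str) -> torch.Tensor:
    """Return the contextual embedding of the token 'bank' in the sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, hidden_size)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return hidden[tokens.index("bank")]

v_financial = bank_vector("I deposited money in the bank yesterday.")
v_river = bank_vector("We had a picnic on the bank of the river.")

# Same word, different contexts -> different vectors; the similarity is noticeably below 1.0.
similarity = torch.cosine_similarity(v_financial, v_river, dim=0)
print(f"cosine similarity between the two 'bank' vectors: {similarity.item():.3f}")
```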

Key Insights

  • Static embeddings create a single vector for each word, regardless of context
  • Words with multiple meanings (polysemous words) are poorly represented by static embeddings
  • The embedding becomes an "average" of all meanings, blurring the individual senses (see the sketch after this list)
  • This limitation is addressed by contextual embeddings in more advanced models
  • Contextual models (covered in Module 2) generate different vectors based on context
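
One way to see the "average of all meanings" effect is sketched below, assuming the same gensim GloVe vectors as in the earlier snippet: the single static "bank" vector typically shows nontrivial similarity to words from both senses rather than aligning cleanly with either, though the financial sense tends to dominate. The exact numbers depend on the vectors used.

```python
import gensim.downloader as api

# Same assumed pretrained static vectors as in the earlier sketch.
vectors = api.load("glove-wiki-gigaword-50")

# Because one vector has to serve every sense of "bank", it ends up partway
# between the financial neighbourhood and the river neighbourhood.
for word in ["money", "deposit", "loan", "river", "shore", "water"]:
    print(f"similarity(bank, {word}) = {vectors.similarity('bank', word):.3f}")
```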

Static vs. Contextual Embeddings Comparison

| Feature | Static Embeddings | Contextual Embeddings |
| --- | --- | --- |
| Word Representation | One vector per word | Different vectors based on context |
| Handles Multiple Meanings | ❌ Poor (blends meanings) | ✅ Good (distinguishes meanings) |
| Computational Cost | Low (simple table lookup) | High (requires a full model forward pass) |
| Storage Requirements | Low (one vector per vocabulary word) | Higher (large model weights; vectors computed per occurrence) |
| Examples | Word2Vec, GloVe, FastText | BERT, GPT, ELMo, T5 |
| Out-of-Vocabulary Words | ❌ Limited (FastText's subword n-grams are the exception) | ✅ Handled via subword tokenization |
| Sentence-Level Understanding | Limited | Strong |
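
The out-of-vocabulary row above can be illustrated with a subword tokenizer. The sketch below assumes the Hugging Face transformers package and the bert-base-uncased checkpoint; the rare word is simply made up for the example.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

rare_word = "embeddingology"  # invented word, so no fixed vocabulary contains it whole

# A static lookup table has no row for an unseen word, whereas a subword tokenizer
# decomposes it into known pieces the model can still embed (the exact pieces
# depend on the vocabulary).
print(tokenizer.tokenize(rare_word))
```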

Bridge to Module 2: Transformer Architectures

  • In Module 2, we'll explore transformer architectures that use contextual embeddings
  • Transformers use attention mechanisms to generate context-dependent representations (a minimal sketch of this computation follows the list)
  • This addresses the polysemy problem by creating different vectors for different contexts
  • The same word can have very different representations based on its usage
  • This enables more sophisticated language understanding, especially for ambiguous text
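
As a preview of Module 2, the sketch below shows the core computation behind those attention mechanisms, scaled dot-product attention, written in plain NumPy. Shapes and names are illustrative assumptions, not tied to any particular library; the point is that each token's output vector is a context-dependent mixture of the other tokens' vectors.

```python
import numpy as np

def scaled_dot_product_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Each output row is a context-dependent mixture of the value vectors in V.

    Q, K, V have shape (sequence_length, d): one row per token.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # how strongly each token attends to every token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the attended positions
    return weights @ V                              # blended values -> context-dependent representation

# Toy run: 5 tokens with 8-dimensional vectors. Because the attention weights differ
# from token to token, two occurrences of the same word receive different outputs.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
print(scaled_dot_product_attention(x, x, x).shape)  # (5, 8)
```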