Word2Vec Architecture Comparison

Explore the differences between Skip-gram and CBOW architectures

Word2Vec Model Architectures

Skip-gram Architecture

Objective: Predict context words from center word

Given "cat" in "The cat sat on the mat", predict surrounding words: "The", "sat", "on", etc.

Better for rare words and larger datasets
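
As a concrete illustration, here is a minimal Python sketch (not the demo's actual code) of how Skip-gram generates (center, context) training pairs from the example sentence with a context window of 2:

```python
# Minimal sketch (not the demo's actual code): generate Skip-gram
# (center, context) training pairs from the example sentence.
sentence = "The cat sat on the mat".split()
window = 2  # context window size, as in the example above

pairs = []
for i, center in enumerate(sentence):
    # Every word within `window` positions of the center is a context word
    for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
        if j != i:
            pairs.append((center, sentence[j]))

# For the center word "cat" this yields ("cat", "The"), ("cat", "sat"), ("cat", "on")
print(pairs)
```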

CBOW Architecture

Objective: Predict center word from context words

Given "The", "sat", "on", etc. in "The ___ sat on the mat", predict the center word "cat"

Trains faster, better for frequent words
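
A matching sketch for CBOW, again a simplified illustration rather than the demo's code, where the context words form the input and the center word is the prediction target:

```python
# Minimal sketch (not the demo's actual code): build CBOW training examples,
# where the surrounding context words jointly predict the center word.
sentence = "The cat sat on the mat".split()
window = 2  # context window size, as in the example above

examples = []
for i, center in enumerate(sentence):
    context = [sentence[j]
               for j in range(max(0, i - window), min(len(sentence), i + window + 1))
               if j != i]
    examples.append((context, center))

# Filling the blank in "The ___ sat on the mat" corresponds to
# (["The", "sat", "on"], "cat")
print(examples[1])
```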

Key Insights

  • Skip-gram: predicts context from center words (better for rare words)
  • CBOW: predicts center from context words (faster training)
  • Both produce high-quality word vectors with less computation than full language models
  • Negative sampling dramatically increases training efficiency (see the sketch after this list)
  • Vectors capture semantic relationships through shared statistical patterns
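
The negative-sampling objective mentioned in the list can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions (random vectors, five uniformly sampled negatives), not a full training loop:

```python
# Toy sketch of the skip-gram negative-sampling (SGNS) loss for one
# (center, context) pair. Assumptions: random vectors and uniformly sampled
# negatives; real Word2Vec samples negatives from a smoothed unigram
# distribution and updates the vectors by gradient descent.
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 1000, 50
W_in = rng.normal(scale=0.1, size=(vocab_size, dim))   # center-word vectors
W_out = rng.normal(scale=0.1, size=(vocab_size, dim))  # context-word vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_loss(center_id, context_id, negative_ids):
    v_c = W_in[center_id]
    # True pair: push sigmoid(u_o . v_c) toward 1
    positive_term = -np.log(sigmoid(W_out[context_id] @ v_c))
    # Sampled negatives: push sigmoid(u_k . v_c) toward 0 for each negative word k
    negative_term = -np.sum(np.log(sigmoid(-(W_out[negative_ids] @ v_c))))
    return positive_term + negative_term

negatives = rng.integers(0, vocab_size, size=5)
print(sgns_loss(center_id=1, context_id=2, negative_ids=negatives))
```

Because only one true pair and a handful of negatives are scored per update, the cost no longer scales with the vocabulary size the way a full softmax does.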

Architecture Comparison

Feature                   | Skip-gram                    | CBOW
Prediction Task           | Center → Context             | Context → Center
Input                     | Single word                  | Multiple words
Output                    | Multiple predictions         | Single prediction
Training Speed            | Slower                       | Faster
Best For                  | Rare words, large datasets   | Frequent words, smaller datasets
Accuracy on Analogy Tasks | Higher                       | Lower
Mathematical Formulation  | Maximize P(context|center)   | Maximize P(center|context)
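
Written out, the formulations in the last row take the standard form from the Word2Vec papers; here v and u denote input (center) and output (context) vectors, c is the window size, T the corpus length, and V the vocabulary size:

```latex
% Skip-gram: maximize the average log-probability of each context word
% given the center word
J_{\text{Skip-gram}} = \frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-c \le j \le c \\ j \ne 0}} \log p\left(w_{t+j} \mid w_t\right)

% CBOW: maximize the log-probability of the center word given its context
J_{\text{CBOW}} = \frac{1}{T} \sum_{t=1}^{T} \log p\left(w_t \mid w_{t-c}, \ldots, w_{t+c}\right)

% Both use a softmax over the vocabulary, which negative sampling approximates
p\left(w_O \mid w_I\right) = \frac{\exp\left(u_{w_O}^{\top} v_{w_I}\right)}{\sum_{w=1}^{V} \exp\left(u_{w}^{\top} v_{w_I}\right)}
```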

The Word2Vec Innovation

  • Created by Tomas Mikolov and colleagues at Google in 2013
  • Uses prediction only as a training objective; the goal is the learned word vectors, not the predictions
  • Processes billions of words far faster than full neural language models
  • Negative sampling avoids the expensive softmax over the entire vocabulary
  • Produces embeddings that capture semantic relationships (see the usage sketch after this list)
  • Example of a specialized unsupervised embedding technique
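
As a practical usage sketch, both architectures can be trained with the gensim library (an assumption here, not something this page prescribes; gensim 4.x API). The sg flag selects the architecture and negative sets the number of negative samples per positive pair:

```python
# Usage sketch with the gensim library (an assumption, not prescribed by this
# page; gensim 4.x API). sg=1 trains Skip-gram, sg=0 trains CBOW, and
# `negative` enables negative sampling instead of the full softmax.
from gensim.models import Word2Vec

corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]

skipgram = Word2Vec(corpus, vector_size=50, window=2, min_count=1,
                    sg=1, negative=5, epochs=50)  # sg=1 -> Skip-gram
cbow = Word2Vec(corpus, vector_size=50, window=2, min_count=1,
                sg=0, negative=5, epochs=50)      # sg=0 -> CBOW

# Nearest neighbours by cosine similarity in the learned embedding space
print(skipgram.wv.most_similar("cat", topn=3))
print(cbow.wv.most_similar("cat", topn=3))
```

On this toy corpus the neighbours are not meaningful, but the same call pattern applies to a real corpus, where most_similar is also how analogy-style queries over the learned vectors are made.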