Word2Vec Architecture Comparison

Explore the differences between Skip-gram and CBOW architectures

Word2Vec Model Architectures

Skip-gram Architecture

Objective: Predict context words from center word

Given "cat" in "The cat sat on the mat", predict surrounding words: "The", "sat", "on", etc.

Better for rare words and larger datasets
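
As a concrete illustration, here is a minimal Python sketch (not the demo's actual code) of how Skip-gram generates (center, context) training pairs from the example sentence with a context window of 2:

```python
# Minimal sketch (not the demo's actual code): generate Skip-gram
# (center, context) training pairs from the example sentence.
sentence = "The cat sat on the mat".split()
window = 2  # context window size, as in the example above

pairs = []
for i, center in enumerate(sentence):
    # Every word within `window` positions of the center is a context word
    for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
        if j != i:
            pairs.append((center, sentence[j]))

# For the center word "cat" this yields ("cat", "The"), ("cat", "sat"), ("cat", "on")
print(pairs)
```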

CBOW Architecture

Objective: Predict center word from context words

Given "The", "sat", "on", etc. in "The ___ sat on the mat", predict the center word "cat"

Trains faster, better for frequent words
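
A matching sketch for CBOW, again a simplified illustration rather than the demo's code, where the context words form the input and the center word is the prediction target:

```python
# Minimal sketch (not the demo's actual code): build CBOW training examples,
# where the surrounding context words jointly predict the center word.
sentence = "The cat sat on the mat".split()
window = 2  # context window size, as in the example above

examples = []
for i, center in enumerate(sentence):
    context = [sentence[j]
               for j in range(max(0, i - window), min(len(sentence), i + window + 1))
               if j != i]
    examples.append((context, center))

# Filling the blank in "The ___ sat on the mat" corresponds to
# (["The", "sat", "on"], "cat")
print(examples[1])
```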

Key Insights

  • Skip-gram: predicts context from center words (better for rare words)
  • CBOW: predicts center from context words (faster training)
  • Both produce high-quality word vectors with less computation than full language models
  • Negative sampling dramatically increases training efficiency (see the sketch after this list)
  • Vectors capture semantic relationships through shared statistical patterns
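
The negative-sampling objective mentioned in the list can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions (random vectors, five uniformly sampled negatives), not a full training loop:

```python
# Toy sketch of the skip-gram negative-sampling (SGNS) loss for one
# (center, context) pair. Assumptions: random vectors and uniformly sampled
# negatives; real Word2Vec samples negatives from a smoothed unigram
# distribution and updates the vectors by gradient descent.
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 1000, 50
W_in = rng.normal(scale=0.1, size=(vocab_size, dim))   # center-word vectors
W_out = rng.normal(scale=0.1, size=(vocab_size, dim))  # context-word vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_loss(center_id, context_id, negative_ids):
    v_c = W_in[center_id]
    # True pair: push sigmoid(u_o . v_c) toward 1
    positive_term = -np.log(sigmoid(W_out[context_id] @ v_c))
    # Sampled negatives: push sigmoid(u_k . v_c) toward 0 for each negative word k
    negative_term = -np.sum(np.log(sigmoid(-(W_out[negative_ids] @ v_c))))
    return positive_term + negative_term

negatives = rng.integers(0, vocab_size, size=5)
print(sgns_loss(center_id=1, context_id=2, negative_ids=negatives))
```

Because only one true pair and a handful of negatives are scored per update, the cost no longer scales with the vocabulary size the way a full softmax does.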

Architecture Comparison

Feature                   | Skip-gram                    | CBOW
Prediction Task           | Center → Context             | Context → Center
Input                     | Single word                  | Multiple words
Output                    | Multiple predictions         | Single prediction
Training Speed            | Slower                       | Faster
Best For                  | Rare words, large datasets   | Frequent words, smaller datasets
Accuracy on Analogy Tasks | Higher                       | Lower
Mathematical Formulation  | Maximize P(context|center)   | Maximize P(center|context)
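
Written out, the formulations in the last row take the standard form from the Word2Vec papers; here v and u denote input (center) and output (context) vectors, c is the window size, T the corpus length, and V the vocabulary size:

```latex
% Skip-gram: maximize the average log-probability of each context word
% given the center word
J_{\text{Skip-gram}} = \frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-c \le j \le c \\ j \ne 0}} \log p\left(w_{t+j} \mid w_t\right)

% CBOW: maximize the log-probability of the center word given its context
J_{\text{CBOW}} = \frac{1}{T} \sum_{t=1}^{T} \log p\left(w_t \mid w_{t-c}, \ldots, w_{t+c}\right)

% Both use a softmax over the vocabulary, which negative sampling approximates
p\left(w_O \mid w_I\right) = \frac{\exp\left(u_{w_O}^{\top} v_{w_I}\right)}{\sum_{w=1}^{V} \exp\left(u_{w}^{\top} v_{w_I}\right)}
```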

The Word2Vec Innovation

  • Created by Tomas Mikolov and colleagues at Google in 2013
  • Uses prediction only as a training objective; the goal is the learned word vectors, not the predictions
  • Processes billions of words far faster than full neural language models
  • Negative sampling avoids the expensive softmax over the entire vocabulary
  • Produces embeddings that capture semantic relationships (see the usage sketch after this list)
  • Example of a specialized unsupervised embedding technique
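
As a practical usage sketch, both architectures can be trained with the gensim library (an assumption here, not something this page prescribes; gensim 4.x API). The sg flag selects the architecture and negative sets the number of negative samples per positive pair:

```python
# Usage sketch with the gensim library (an assumption, not prescribed by this
# page; gensim 4.x API). sg=1 trains Skip-gram, sg=0 trains CBOW, and
# `negative` enables negative sampling instead of the full softmax.
from gensim.models import Word2Vec

corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]

skipgram = Word2Vec(corpus, vector_size=50, window=2, min_count=1,
                    sg=1, negative=5, epochs=50)  # sg=1 -> Skip-gram
cbow = Word2Vec(corpus, vector_size=50, window=2, min_count=1,
                sg=0, negative=5, epochs=50)      # sg=0 -> CBOW

# Nearest neighbours by cosine similarity in the learned embedding space
print(skipgram.wv.most_similar("cat", topn=3))
print(cbow.wv.most_similar("cat", topn=3))
```

On this toy corpus the neighbours are not meaningful, but the same call pattern applies to a real corpus, where most_similar is also how analogy-style queries over the learned vectors are made.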