Explore the differences between Skip-gram and CBOW architectures
Skip-gram objective: Predict the context words from the center word
Given "cat" in "The cat sat on the mat", predict surrounding words: "The", "sat", "on", etc.
Better for rare words and larger datasets
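A minimal sketch of how Skip-gram training pairs could be built from the example sentence. The function name `skipgram_pairs` and the window size of 2 are assumptions made for illustration, not something the text specifies; real implementations typically add subsampling and negative sampling on top of this.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every position in `tokens`.

    `window` is a hypothetical hyperparameter: the number of words
    considered on each side of the center word.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # the center word is never its own context
                pairs.append((center, tokens[j]))
    return pairs


tokens = "The cat sat on the mat".split()

# Keep only the pairs whose center word is "cat".
cat_pairs = [pair for pair in skipgram_pairs(tokens) if pair[0] == "cat"]
print(cat_pairs)  # [('cat', 'The'), ('cat', 'sat'), ('cat', 'on')]
```

Each center word yields several separate (center, context) training pairs, so a single sentence produces many prediction targets.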
CBOW (Continuous Bag of Words) objective: Predict the center word from its context words
Given "The", "sat", "on", etc. in "The ___ sat on the mat", predict the center word "cat"
Trains faster, better for frequent words
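A comparable sketch for CBOW, again assuming a window of 2 words on each side. The helper name `cbow_examples` is hypothetical. Here the whole context window forms one input and the center word is the single target.

```python
def cbow_examples(tokens, window=2):
    """Return (context_words, center) examples, one per position.

    `window` is a hypothetical hyperparameter: the number of words
    taken from each side of the center word.
    """
    examples = []
    for i, center in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        examples.append((left + right, center))
    return examples


tokens = "The cat sat on the mat".split()

# The example whose target is "cat" (position 1 in the sentence).
context, target = cbow_examples(tokens)[1]
print(context, "->", target)  # ['The', 'sat', 'on'] -> cat
```

CBOW produces one example per position rather than one per (center, context) pair, which is one plausible reason it needs fewer updates per pass over the data.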
Feature | Skip-gram | CBOW |
---|---|---|
Prediction Task | Center → Context | Context → Center |
Input | Single word | Multiple words |
Output | Multiple predictions | Single prediction |
Training Speed | Slower | Faster |
Best For | Rare words, Large datasets | Frequent words, Smaller datasets |
Accuracy on Analogy Tasks | Often higher, especially on semantic analogies | Often lower |
Mathematical Formulation | Maximize P(context \| center) | Maximize P(center \| context) |
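
The two objectives in the last row can be written out more explicitly. The notation here is an assumption introduced for illustration: $T$ is the number of words in the corpus, $w_t$ is the word at position $t$, and $c$ is the context window size on each side.

Skip-gram maximizes the average log-probability of each context word given the center word:

$$
J_{\text{skip-gram}} = \frac{1}{T} \sum_{t=1}^{T} \; \sum_{\substack{-c \le j \le c \\ j \ne 0}} \log P(w_{t+j} \mid w_t)
$$

CBOW maximizes the average log-probability of the center word given its whole context window:

$$
J_{\text{CBOW}} = \frac{1}{T} \sum_{t=1}^{T} \log P(w_t \mid w_{t-c}, \dots, w_{t-1}, w_{t+1}, \dots, w_{t+c})
$$

The inner sum over $j$ in the Skip-gram objective corresponds to the "Multiple predictions" entry in the table, while CBOW has a single term per position.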