11 — Word2Vec Approaches: Continuous Bag of Words (CBOW) & Skip-Gram

Aysel Aydin
Jul 19, 2024

In this article, we will cover Continuous Bag of Words (CBOW) and Skip-Gram, the two Word2Vec training approaches. Before we start, I recommend reading my earlier article on Word2Vec.

Now that you have a basic understanding of Word2Vec from that article, let's look at its two approaches.

Continuous Bag of Words (CBOW)

The main idea behind CBOW is to predict a target word from the context of its surrounding words. It does this with a neural network that has a single hidden layer, learning the weights that map the context words to the target word.

The architecture of the CBOW model is simple, consisting of an input layer, a hidden layer, and an output layer. The input layer is used to represent the context words, the hidden layer is used to learn the word embeddings, and the output layer is used to predict the target word.
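
To make this more concrete, here is a minimal illustrative sketch (plain Python, not Gensim code) of how CBOW training pairs can be formed: for every position in a sentence, the words inside the window become the input context and the center word becomes the prediction target.

def cbow_pairs(tokens, window=2):
    # For each position, collect up to `window` words on each side as the
    # context and pair it with the center word as the prediction target.
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        pairs.append((context, target))
    return pairs

tokens = ['we', 'are', 'creating', 'a', 'word2vec', 'model']
for context, target in cbow_pairs(tokens):
    print(context, '->', target)
# e.g. ['we', 'are', 'a', 'word2vec'] -> creating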

Skip-Gram

The Skip-Gram model works in the opposite direction to CBOW: instead of using the surrounding words to predict the middle word, we pass a target word as input and predict the neighboring words.

The Skip-gram model has a simple design with an input layer, a hidden layer, and an output layer, just like the CBOW model. The input layer is used to represent the target word, the hidden layer is used to learn the word embeddings, and the output layer is used to predict the context words.
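
Mirroring the sketch above, here is a small illustrative snippet (again plain Python, not Gensim internals) showing the (target, context) pairs that Skip-gram trains on: each center word is paired with every word inside its window.

def skipgram_pairs(tokens, window=2):
    # Pair each center word with every word inside its window; the model
    # learns to predict the context word from the center word.
    pairs = []
    for i, target in enumerate(tokens):
        for context in tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]:
            pairs.append((target, context))
    return pairs

tokens = ['we', 'are', 'creating', 'a', 'word2vec', 'model']
for target, context in skipgram_pairs(tokens):
    print(target, '->', context)
# e.g. creating -> we, creating -> are, creating -> a, creating -> word2vec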

Differences between CBOW and Skip-Gram

In short, CBOW predicts the target word from its surrounding context, while Skip-Gram predicts the context words from the target. CBOW is generally faster to train and works well for frequent words, whereas Skip-Gram tends to represent rare words better and is often preferred for smaller datasets.

Now let's walk through a simple Word2Vec example in Python using both the CBOW and Skip-gram models. For this example we will use the Gensim library, which makes it easy to create and train Word2Vec models.

If the Gensim library is not installed, you must first install it with the “pip install gensim” command.

import gensim
from gensim.models import Word2Vec

sentences = [
'This is an example sentence for Word2vec.',
'We are creating a Word2vec model using the Gensim library.',
'We are working with CBOW and Skipgram models.',
'Python is a programming language for natural language processing.',
'Word2vec is one of the word embedding techniques.',
'The Word2vec model is used for word embeddings.',
'Gensim provides an easy way to train Word2vec models.',
'CBOW and Skipgram are two types of Word2vec models.',
'Word2vec is a technique for natural language processing.',
'This sentence is about Word2vec and its applications.',
'Word2vec is a popular word embedding method.',
'Many researchers use Word2vec for various NLP tasks.',
'The Skipgram model focuses on predicting context words.',
'CBOW model predicts the center word from context words.',
'Natural language processing involves working with large datasets.'
]

Before creating vectors from our sentences, we first need to preprocess them. I recommend reading my earlier article, Text Preprocessing Techniques for NLP.
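
The preprocessing step itself can be kept simple here. A minimal sketch that lowercases each sentence, strips punctuation and splits on whitespace (one straightforward way to produce the list below) would be:

import string

# Lowercase each sentence, remove punctuation and split on whitespace.
sentences = [
    sentence.lower().translate(str.maketrans('', '', string.punctuation)).split()
    for sentence in sentences
]
print(sentences[0])
# ['this', 'is', 'an', 'example', 'sentence', 'for', 'word2vec']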

After preprocessing, we obtain a list as follows.

sentences = [
['this', 'is', 'an', 'example', 'sentence', 'for', 'word2vec'],
['we', 'are', 'creating', 'a', 'word2vec', 'model', 'using', 'the', 'gensim', 'library'],
['we', 'are', 'working', 'with', 'cbow', 'and', 'skipgram', 'models'],
['python', 'is', 'a', 'programming', 'language', 'for', 'natural', 'language', 'processing'],
['word2vec', 'is', 'one', 'of', 'the', 'word', 'embedding', 'techniques'],
['the', 'word2vec', 'model', 'is', 'used', 'for', 'word', 'embeddings'],
['gensim', 'provides', 'an', 'easy', 'way', 'to', 'train', 'word2vec', 'models'],
['cbow', 'and', 'skipgram', 'are', 'two', 'types', 'of', 'word2vec', 'models'],
['word2vec', 'is', 'a', 'technique', 'for', 'natural', 'language', 'processing'],
['this', 'sentence', 'is', 'about', 'word2vec', 'and', 'its', 'applications'],
['word2vec', 'is', 'a', 'popular', 'word', 'embedding', 'method'],
['many', 'researchers', 'use', 'word2vec', 'for', 'various', 'nlp', 'tasks'],
['the', 'skipgram', 'model', 'focuses', 'on', 'predicting', 'context', 'words'],
['cbow', 'model', 'predicts', 'the', 'center', 'word', 'from', 'context', 'words'],
['natural', 'language', 'processing', 'involves', 'working', 'with', 'large', 'datasets']
]

Let’s create CBOW and Skip-gram models

# Passing the sentences to the constructor builds the vocabulary and trains
# the model for the given number of epochs, so a separate train() call is not needed.
cbow_model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=0, alpha=0.03, min_alpha=0.0007, epochs=100)
skipgram_model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1, alpha=0.03, min_alpha=0.0007, epochs=100)
  • vector_size: The dimensionality of the vector created for each word. Larger or smaller values may be chosen depending on the size of the corpus and the type of project, but values between 100 and 300 are typical.
  • window: The size of the context window around the target word. The window size affects how well the model captures the meaning and context of a word, so it should be chosen with the dataset and the needs of the project in mind. For small datasets, values between 2 and 5 are common; for large datasets, 5 to 10 or more can be used.
  • min_count: The minimum number of times a word must occur in the corpus. Words below this threshold are not included in Word2Vec's vocabulary. For very large corpora, a higher threshold usually improves the quality of the embeddings, while for small datasets it is better to keep it low.
  • sg: If set to 1, the Skip-gram algorithm is used; otherwise the CBOW algorithm is used.
  • epochs: The number of times the model passes over the entire training dataset. Small datasets often require more epochs because the model may need to see the data many times to learn enough, while for large datasets fewer epochs may be sufficient (a two-step build_vocab/train workflow that makes this explicit is sketched after this list).
  • alpha & min_alpha: alpha (the initial learning rate) is set to 0.03 and min_alpha (the minimum learning rate) to 0.0007. The learning rate decays gradually from alpha to min_alpha during training, which helps keep the training process stable.
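
If you prefer to separate vocabulary building from training (for example, to control the training loop more explicitly), Gensim also supports a two-step workflow. A minimal sketch, equivalent to the Skip-gram model created above:

# Create an untrained model, build the vocabulary, then train explicitly.
model = Word2Vec(vector_size=100, window=5, min_count=1, sg=1, alpha=0.03, min_alpha=0.0007)
model.build_vocab(sentences)
model.train(sentences, total_examples=model.corpus_count, epochs=100)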

Now that the parameters are explained, let's use the trained models to compare word vectors.

# Cosine similarity between 'word2vec' and 'gensim' using the CBOW vectors
word_vectors_cbow = cbow_model.wv
similarity_cbow = word_vectors_cbow.similarity('word2vec', 'gensim')
print(f"Similarity between 'word2vec' and 'gensim': {similarity_cbow} with CBOW")

# The same similarity using the Skip-gram vectors
word_vectors_skipgram = skipgram_model.wv
similarity_skip = word_vectors_skipgram.similarity('word2vec', 'gensim')
print(f"Similarity between 'word2vec' and 'gensim': {similarity_skip} with Skip-Gram")

By running this code, you can observe the differences and similarities between CBOW and Skip-gram models. You can try to get different results by changing the number of epochs, window size or other parameters.
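
Beyond a single similarity score, you can also list a word's nearest neighbors in each embedding space. A short example (the exact neighbors and scores will vary between runs and between the two models):

# Nearest neighbors of 'word2vec' in each model's embedding space.
print(cbow_model.wv.most_similar('word2vec', topn=3))
print(skipgram_model.wv.most_similar('word2vec', topn=3))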

Conclusion

In conclusion, we learned about Continuous Bag of Words (CBOW) and Skip-Gram, the two Word2Vec approaches. CBOW predicts a target word from the context of its surrounding words, while Skip-Gram does the opposite: it takes a target word as input and predicts its neighboring words.

As I said in my previous article, Word2Vec is still used in some cases, but today there are word embedding methods that capture context better and handle more complex NLP tasks. I will explain these techniques in future articles.

Follow for more upcoming articles about NLP, ML & DL ❤️
Contact Accounts: Twitter, LinkedIn
