13 — Understanding FastText: Efficient Word Representations for NLP

Aysel Aydin
4 min read · Aug 10, 2024


In this article, we will talk about FastText, one of the word embedding techniques. So far we have covered other word embedding techniques such as Word2Vec and GloVe. Before we start, I recommend reading my earlier article on word embeddings.

FastText is an open-source library developed by the Facebook AI Research (FAIR) lab that extends the Word2Vec model. Unlike Word2Vec, FastText considers not only whole words but also subword information in the form of character n-grams. In this way, it represents rare words and misspellings better. Let’s look at this in a little more detail.

In FastText, each word is represented as the sum of the vector representations of its character n-grams, along with a vector for the word itself.

Consider the word “horse” with n = 3. The word will then be represented by the following character n-grams:

  • “horse” n-grams: "<ho", "hor", "ors", "rse", "se>"

The < and > symbols added when preparing n-grams mark the beginning and end of the word. They are especially useful when we want to distinguish letter sequences at the start or end of a word from the same sequences in the middle. With these markers, FastText creates more accurate representations by taking word boundaries into account.
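
To see where those n-grams come from, here is a minimal Python sketch. Note that char_ngrams is a hypothetical helper written for this article, not a function from the FastText library:

    def char_ngrams(word, n=3):
        """Character n-grams of a word, with < and > as boundary markers."""
        wrapped = "<" + word + ">"
        return [wrapped[i:i + n] for i in range(len(wrapped) - n + 1)]

    print(char_ngrams("horse"))
    # ['<ho', 'hor', 'ors', 'rse', 'se>']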

So, the word embedding for the word “horse” is the sum of the vector representations of all of its character n-grams and of the word itself.
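
To make the summation concrete, here is a minimal numpy sketch under stated assumptions: the table sizes are toy values, Python’s built-in hash stands in for the fixed hash function the real library uses to map n-grams to rows, and ngram_table / word_table are names made up for this illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    dim, buckets = 100, 1000  # toy sizes; the real library uses far more n-gram buckets
    ngram_table = rng.normal(size=(buckets, dim)) * 0.01   # one row per hashed n-gram
    word_table = {"<horse>": rng.normal(size=dim) * 0.01}  # whole-word vectors

    def word_vector(word, n=3):
        """Sum the whole-word vector (if known) and all of its n-gram vectors."""
        wrapped = "<" + word + ">"
        grams = [wrapped[i:i + n] for i in range(len(wrapped) - n + 1)]
        vec = word_table.get(wrapped, np.zeros(dim)).copy()
        for g in grams:
            vec += ngram_table[hash(g) % buckets]  # hashing trick: n-gram -> table row
        return vec

    v = word_vector("horse")  # a 100-dimensional vector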

Why should FastText embeddings be used?

Word2Vec and GloVe are word-level embedding techniques, so the words they can represent are limited to the vocabulary of the training set. If a model is trained on 10,000 distinct words, every word outside that vocabulary enters the model as unknown. In other words, the training set has to be kept broad, which naturally comes at a performance cost. Likewise, a misspelled word will not match its counterpart in the training set, so it will also be treated as unknown.

FastText, on the other hand, is trained at the subword level. Each word is learned through its subwords, so the model can cover many more words than a word-level model. And because training happens at the subword level, even a misspelled word can receive a representation very close to that of its correct form, thanks to the n-grams they share.

Let’s explain with an example.

  • The word “book” is a common word and is likely to appear in the training dataset.
  • The word “bookshelf” may be a less common word and not included in the dataset.

When FastText breaks the word “bookshelf” into n-grams, it finds some n-grams that overlap with the word “book”:

  • 3-grams: "<bo", "boo", "ook", "oks", "ksh", "she", "hel", "elf", "lf>"

These n-grams overlap with the n-grams found in the word “book” (“<bo”, “boo”, “ook”). Therefore, FastText can represent the word “bookshelf” with a vector close to the word “book”.

This enables FastText to handle out-of-vocabulary words effectively: it breaks terms into subword units and generates embeddings for those units, even for unseen words. This capability makes FastText more robust with rare or morphologically complex words.
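
Here is a minimal sketch of this behavior using gensim’s FastText implementation; the toy corpus and parameter values are assumptions made up for the demo:

    from gensim.models import FastText

    # A toy corpus of tokenized sentences, made up for this demo.
    sentences = [
        ["i", "read", "a", "book", "about", "horses"],
        ["she", "put", "the", "book", "on", "a", "shelf"],
        ["the", "library", "has", "many", "books"],
    ]

    # min_n / max_n set the character n-gram lengths (fixed to 3 here, as above).
    model = FastText(sentences, vector_size=50, window=3, min_count=1,
                     min_n=3, max_n=3, epochs=50)

    # "bookshelf" never occurs in the corpus, yet FastText still builds a
    # vector for it from the n-grams it shares with "book" and "shelf".
    print(model.wv["bookshelf"].shape)               # (50,)
    print(model.wv.similarity("book", "bookshelf"))  # a nonzero similarity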

Advantages of FastText

  • FastText can represent even words it has never seen before using n-grams. This is especially valuable for models with limited vocabulary.
  • Because words in agglutinative languages take many different suffixes, FastText is better at capturing the shared roots of words in such languages.
  • FastText can be trained quickly on large datasets and performs competitively compared to similar models.

Disadvantages of FastText

  • FastText breaks words into n-grams, which can sometimes lead to confusions. For example, “house” and “horse” share boundary n-grams such as "<ho" and "se>", so the model may place their vectors closer together than their meanings warrant (see the short check after this list).
  • FastText can use more memory because it is n-gram based. Especially when working with large datasets and large vocabularies, the memory usage can increase significantly.
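
A quick way to check that claim is to intersect the two words’ n-gram sets, reusing the hypothetical char_ngrams helper idea from earlier (again illustration code, not part of any library):

    def char_ngrams(word, n=3):
        wrapped = "<" + word + ">"
        return {wrapped[i:i + n] for i in range(len(wrapped) - n + 1)}

    print(char_ngrams("house") & char_ngrams("horse"))
    # {'<ho', 'se>'} -- shared boundary n-grams pull the two vectors together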

Conclusion

In conclusion, FastText is a flexible and powerful model for word representation. It successfully represents rare and derived words, especially in agglutinative languages such as Turkish. The model’s n-gram approach captures similarities between words better and thus provides a more powerful word representation.

I hope you found this article useful. If you stayed with me until the end, thank you for reading! Happy coding 🤞

Follow for more upcoming articles about NLP, ML & DL ❤️
Contact Accounts:
Twitter, LinkedIn
