Calculating Document Similarities using BERT, word2vec, and other models

By Medium - 2020-12-03

Description

Document similarity is one of the most crucial problems in NLP. Finding similarity across documents is used in several domains, such as recommending similar books and articles, identifying…

Summary

  • Document similarity is one of the most crucial problems in NLP.
  • We can train our own embeddings if we have enough data and computation available, or we can use pre-trained embeddings.
  • Using these embeddings, we can convert every word of our document corpus into a 300-dimensional vector (see the sketches after this list).
  • Documents can be ranked against a reference document by the cosine similarity and Euclidean distance between their vectors.
  • BERT (Bidirectional Encoder Representations from Transformers) is a state-of-the-art technique for natural language processing pre-training developed by Google.
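
A minimal sketch of the word2vec route described above, assuming the gensim and scikit-learn packages, the pre-trained "word2vec-google-news-300" model (the 300-dimensional Google News vectors), and a few made-up example documents. Averaging word vectors is one common way to get a document vector; the documents are then ranked against the first one, as in the bullets above.

    import numpy as np
    import gensim.downloader as api
    from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances

    # Load pre-trained 300-dimensional word2vec embeddings (large download).
    model = api.load("word2vec-google-news-300")

    def document_vector(doc):
        # Average the word vectors of all in-vocabulary tokens in the document.
        tokens = [t for t in doc.lower().split() if t in model.key_to_index]
        return np.mean([model[t] for t in tokens], axis=0)

    # Hypothetical example documents, not from the article.
    docs = [
        "machine learning helps recommend similar books and articles",
        "deep learning models classify images of cats and dogs",
        "recommending articles with natural language processing",
    ]
    vectors = np.stack([document_vector(d) for d in docs])

    # Rank all documents by similarity to the first document.
    cos = cosine_similarity(vectors[0:1], vectors)[0]
    euc = euclidean_distances(vectors[0:1], vectors)[0]
    print("cosine similarities:", np.round(cos, 3))
    print("euclidean distances:", np.round(euc, 3))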
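
For the BERT route, a sentence-level encoder is the usual shortcut. The sketch below assumes the sentence-transformers package and its "all-MiniLM-L6-v2" checkpoint as a stand-in for a BERT-style encoder; the article itself may use a different model.

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")

    # Same hypothetical example documents as above.
    docs = [
        "machine learning helps recommend similar books and articles",
        "deep learning models classify images of cats and dogs",
        "recommending articles with natural language processing",
    ]
    # Encode each document into a single fixed-size vector.
    embeddings = model.encode(docs, convert_to_tensor=True)

    # Cosine similarity of every document against the first one.
    scores = util.cos_sim(embeddings[0], embeddings)
    print(scores)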


Topics

  1. NLP (0.27)
  2. Machine_Learning (0.06)
  3. Backend (0.04)

Similar Articles

What’s in a word?

By Medium - 2020-12-28

Why tf-idf sometimes fails to accurately capture word importance, and what we can use instead

A Beginner’s Guide to the CLIP Model

By KDnuggets - 2021-03-11

CLIP is a bridge between computer vision and natural language processing. I'm here to break CLIP down for you in an accessible and fun read! In this post, I'll cover what CLIP is, how CLIP works, and ...