Description
Document similarity is one of the most crucial problems in NLP. Measuring similarity across documents is used in several domains, such as recommending similar books and articles, identifying…
Summary
- Document similarity is one of the most crucial problems in NLP.
- We can train our own embeddings if we have enough data and computation available, or we can use pre-trained embeddings.
- Using these pre-trained embeddings, we can convert every word of our document corpus into a 300-dimensional vector.
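One common way to turn per-word vectors into a single document vector is to average them. A minimal sketch of that idea follows, using a tiny hypothetical 3-dimensional embedding lookup in place of a real 300-dimensional pre-trained model (e.g. Word2Vec or GloVe), which would require downloading weights:

```python
import numpy as np

# Hypothetical stand-in for a pre-trained embedding lookup.
# A real model would map each word to a 300-dimensional vector;
# 3 dimensions are used here only to keep the example small.
embeddings = {
    "cat": np.array([0.9, 0.1, 0.0]),
    "dog": np.array([0.8, 0.2, 0.1]),
    "car": np.array([0.0, 0.1, 0.9]),
}

def document_vector(doc, emb):
    """Average the vectors of the in-vocabulary words in a document."""
    vectors = [emb[word] for word in doc.lower().split() if word in emb]
    return np.mean(vectors, axis=0)

doc_vec = document_vector("cat dog", embeddings)
```

Averaging is the simplest pooling choice; weighting words by TF-IDF before averaging is a common refinement.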
- Documents similar to the first document can then be ranked by cosine similarity and Euclidean distance.
- Bidirectional Encoder Representations from Transformers (BERT) is a state-of-the-art technique for natural language processing pre-training developed by Google.
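The two ranking measures mentioned above can be sketched directly with NumPy; the two short document vectors below are illustrative values, not taken from the article:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a, b):
    """Straight-line distance between two vectors (0.0 = identical)."""
    return float(np.linalg.norm(a - b))

# Illustrative 2-dimensional document vectors.
doc1 = np.array([1.0, 0.0])
doc2 = np.array([1.0, 1.0])

cos_sim = cosine_similarity(doc1, doc2)   # ~0.707
euc_dist = euclidean_distance(doc1, doc2)  # 1.0
```

Note the two measures can disagree: cosine similarity ignores vector magnitude (document length), while Euclidean distance does not.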