What’s in a word?

By Medium - 2020-12-28

Description

Why tf-idf sometimes fails to accurately capture word importance, and what we can use instead

Summary

  • tf-idf is a simple yet powerful tool that generally gives good estimates of which words define the documents in a corpus.
  • To see this in action, consider the same setup as above, this time with apple making up 10% of the words in document A.
  • If apple did not appear at all in document B, the tf-idf value would be relatively high.
  • However, as we can see in the table, the tf-idf value is actually zero, suggesting the word is not at all unique or important to the album.
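The zero in the last bullet comes from the idf factor, not from term frequency. The sketch below is an illustration under stated assumptions, not the article's own code. It assumes a hypothetical corpus of only two documents (A and B), tf as the word's share of a document, and the unsmoothed idf = ln(N / df), where df is the number of documents containing the word. All names (`tf`, `idf`, `tfidf`, `doc_a`, `doc_b_without`, `doc_b_with`) are made up for this example.

```python
import math

def tf(term, doc):
    """Term frequency: share of the document's words that are `term`."""
    return doc.count(term) / len(doc)

def idf(term, corpus):
    """Unsmoothed inverse document frequency: ln(N / df)."""
    df = sum(1 for doc in corpus if term in doc)  # documents containing the term
    return math.log(len(corpus) / df)

def tfidf(term, doc, corpus):
    return tf(term, doc) * idf(term, corpus)

# Hypothetical two-document corpus: "apple" is 10% of the 20 words in document A.
doc_a = ["apple"] * 2 + ["banana"] * 18
doc_b_without = ["cherry"] * 20            # B never mentions "apple"
doc_b_with = ["apple"] + ["cherry"] * 19   # B mentions "apple" once

high = tfidf("apple", doc_a, [doc_a, doc_b_without])  # 0.1 * ln(2/1)
zero = tfidf("apple", doc_a, [doc_a, doc_b_with])     # 0.1 * ln(2/2)
print(f"apple absent from B: {high:.4f}")  # prints 0.0693
print(f"apple present in B:  {zero:.4f}")  # prints 0.0000
```

Because idf only records whether a document contains the word, not how often, a single mention of apple in B cancels its 10% share of A. Library implementations may differ: scikit-learn's `TfidfVectorizer`, for instance, smooths idf by default, so its values will not drop to exactly zero.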


Topics

  1. NLP (0.16)
  2. Security (0.07)
  3. Backend (0.04)

Similar Articles

QuickGraph#17 The English WordNet in Neo4j (part 2)

By Jesús Barrasa - 2021-02-05

In this second post on WordNet on Neo4j I will be focusing on querying and analysing the graph that we created in the previous post. I'll leave for a third one more advanced analysis and integrations ...

Finding the Narrative with Natural Language Processing

By Medium - 2021-01-01

When I first started studying data science, one of the areas I was most excited to learn was natural language processing. “Unsupervised machine learning” certainly has a mystical ring to it, and…