What’s in a word?

By Medium - 2020-12-28


Why tf-idf sometimes fails to accurately capture word importance, and what we can use instead


  • tf-idf is a simple yet powerful tool, and it generally gives good estimates of which words define the documents in a corpus.
  • To see where it can fall short, consider a setup in which apple makes up 10% of the words in document A.
  • If apple did not appear at all in document B, its tf-idf value for document A would be relatively high.
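  • However, as we can see in the table, once apple also appears in document B the tf-idf value is actually zero, because a word present in every document of the corpus gets an idf of log(N/N) = 0. This suggests the word is not at all unique or important to the album, even though it makes up a tenth of its words.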
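The effect described in these bullets can be reproduced with a minimal sketch. It assumes the textbook definitions tf = (count of the term) / (document length) and idf = ln(N / df) with the natural logarithm, and a hypothetical two-document corpus whose filler words ("pear", "banana") are made up for illustration; the article's own setup and formula variant may differ.

```python
import math


def tf(term, doc):
    """Term frequency: share of the document's tokens that are `term`."""
    return doc.count(term) / len(doc)


def idf(term, corpus):
    """Inverse document frequency: ln(N / number of documents containing `term`)."""
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / df)


def tf_idf(term, doc, corpus):
    return tf(term, doc) * idf(term, corpus)


# Hypothetical toy corpus: "apple" is 2 of document A's 20 tokens (10%).
doc_a = ["apple"] * 2 + ["pear"] * 18
doc_b_without = ["banana"] * 20            # apple absent from document B
doc_b_with = ["apple"] + ["banana"] * 19   # apple appears once in document B

score_without = tf_idf("apple", doc_a, [doc_a, doc_b_without])
score_with = tf_idf("apple", doc_a, [doc_a, doc_b_with])

print(round(score_without, 4))  # 0.1 * ln(2/1), prints 0.0693
print(score_with)               # 0.1 * ln(2/2), prints 0.0
```

A single occurrence of apple in document B is enough to drive its idf, and therefore its tf-idf in document A, to zero, no matter how dominant the word is in A. Some libraries use smoothed idf variants that never reach exactly zero, so their numbers would differ from this sketch.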




Similar Articles

QuickGraph#17 The English WordNet in Neo4j (part 2)

By Jesús Barrasa - 2021-02-05

In this second post on WordNet on Neo4j I will be focusing on querying and analysing the graph that we created in the previous post. I'll leave for a third one more advanced analysis and integrations ...

Finding the Narrative with Natural Language Processing

By Medium - 2021-01-01

When I first started studying data science, one of the areas I was most excited to learn was natural language processing. “Unsupervised machine learning” certainly has a mystical ring to it, and…