Description
Why tf-idf sometimes fails to accurately capture word importance, and what we can use instead
Summary
- Getting started: what’s in a word?
- Tf-idf is a simple yet powerful tool that generally gives good estimates of which words define each document in a corpus.
- To see this in action, consider the same setup from above, this time with apple making up 10% of words in document A.
- If apple did not appear at all in document B, the tf-idf value would be relatively high.
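- However, as we can see in the table, the tf-idf value is actually zero: because apple appears in every document, its inverse document frequency under the standard definition is log(1) = 0, which suggests the word is not at all unique or important to the album.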
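
The apple example can be sketched in a few lines of Python. This is a minimal illustration, not the article's own code: the token lists, the `tf_idf` helper, and the choice of an unsmoothed idf with a natural logarithm are all assumptions, since the original setup and table are not reproduced here. Document A is built so that apple makes up 10% of its words, and document B is tried both with and without the word.

```python
import math

def tf_idf(term, doc, corpus):
    """Unsmoothed tf-idf: (share of doc's tokens equal to term) * ln(N / df)."""
    tf = doc.count(term) / len(doc)
    # Document frequency: number of documents containing the term at least once.
    df = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

# Hypothetical documents: A has 100 tokens, 10 of which are "apple" (10%).
doc_a = ["apple"] * 10 + ["filler"] * 90
doc_b_with_apple = ["apple"] * 1 + ["other"] * 99
doc_b_without_apple = ["other"] * 100

# apple occurs in both documents, so idf = ln(2/2) = 0 and the score vanishes.
score_shared = tf_idf("apple", doc_a, [doc_a, doc_b_with_apple])

# apple occurs only in A, so idf = ln(2/1) and the score is 0.1 * ln(2).
score_unique = tf_idf("apple", doc_a, [doc_a, doc_b_without_apple])

print(score_shared)  # 0.0
print(score_unique)  # roughly 0.069
```

A single occurrence of apple in document B is enough to drive the score in document A to zero, no matter how often the word appears in A. Library implementations often use smoothed idf variants, so they may not return exactly zero for the same inputs.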