Tired of Topic Models? Clusters of Pretrained Word Embeddings Make for Fast and Good Topics too!

By arXiv.org - 2020-10-13

Description

Topic models are a useful analysis tool to uncover the underlying themes within document collections. The dominant approach is to use probabilistic topic models that posit a generative story, but in t ...

Summary

  • Topic models are a useful analysis tool to uncover the underlying themes within document collections.
  • We provide benchmarks for the combination of different word embeddings and clustering algorithms, and analyse their performance under dimensionality reduction with PCA.
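
As a rough illustration of the idea in the summary, the sketch below clusters word vectors with a plain k-means loop and reads each cluster's words closest to the centroid as a topic. It is a minimal sketch under stated assumptions, not the paper's implementation: the 2-D vectors are made-up stand-ins for pretrained embeddings, the names `EMBEDDINGS`, `kmeans`, and `top_words` are hypothetical, the initial centroids are fixed by hand for reproducibility, ranking by distance to the centroid is just one simple choice, and the PCA step mentioned above is omitted because the toy vectors are already low-dimensional.

```python
import math

# Hypothetical 2-D vectors standing in for pretrained word embeddings.
EMBEDDINGS = {
    "goal": (0.9, 0.1),
    "match": (0.8, 0.2),
    "team": (0.85, 0.15),
    "vote": (0.1, 0.9),
    "senate": (0.2, 0.8),
    "election": (0.15, 0.95),
}


def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))


def kmeans(words, init_words, iterations=10):
    """Plain k-means; one cluster per word in init_words (used as seeds)."""
    centroids = [EMBEDDINGS[w] for w in init_words]
    k = len(centroids)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each word joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for w in words:
            nearest = min(range(k), key=lambda c: dist(EMBEDDINGS[w], centroids[c]))
            clusters[nearest].append(w)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            mean([EMBEDDINGS[w] for w in cluster]) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


def top_words(centroid, cluster, n=2):
    """The n words in a cluster closest to its centroid."""
    return sorted(cluster, key=lambda w: dist(EMBEDDINGS[w], centroid))[:n]


centroids, clusters = kmeans(list(EMBEDDINGS), init_words=["goal", "vote"])
topics = [top_words(c, cl) for c, cl in zip(centroids, clusters)]
print(topics)
```

With real pretrained embeddings, the dictionary would hold one high-dimensional vector per vocabulary word, and a dimensionality-reduction step such as PCA could be applied to the vectors before clustering.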

Topics

  1. NLP (0.19)
  2. UX (0.08)
  3. Backend (0.05)

Similar Articles

Finding the Narrative with Natural Language Processing

By Medium - 2021-01-01

When I first started studying data science, one of the areas I was most excited to learn was natural language processing. “Unsupervised machine learning” certainly has a mystical ring to it, and…