Natural Language Processing: Text Preprocessing and Vectorizing at Rocking Speed with RAPIDS cuML

By Medium - 2021-01-26

Description

Text preprocessing on GPUs is coming to RAPIDS cuML! This is very exciting, as efficient string operations are known to be a difficult problem on GPUs. Based on the work by the RAPIDS cuDF team…

Summary

  • Text preprocessing on GPUs is coming to RAPIDS cuML!
  • Train and Evaluate: The final steps of a typical NLP pipeline are to train an estimator on the vectorized documents for a particular task and then evaluate the results.
  • You’ll notice an additional section at the end containing clustering workflows (k-means and t-SNE), where we try to find clusters in our tweets to see if we can discover general topics related to COVID-19.
  • Other vectorizers are in the works, starting with HashingVectorizer, which will help a great deal with distributed pipelines.
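
To make the vectorizing step in the summary above concrete, here is a minimal pure-Python sketch of what a count vectorizer and a hashing vectorizer compute. It is not cuML code and runs on the CPU: the function names (`preprocess`, `count_vectorize`, `hashing_vectorize`), the tokenization rule, the sample documents, and the use of `zlib.crc32` as the hash are all assumptions chosen for illustration.

```python
import re
import zlib
from collections import Counter


def preprocess(doc):
    """Lowercase the text and split it into alphanumeric tokens (assumed rule)."""
    return re.findall(r"[a-z0-9]+", doc.lower())


def count_vectorize(docs):
    """Build a sorted vocabulary, then one term-count row per document."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    rows = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for tok, n in Counter(toks).items():
            row[index[tok]] = n
        rows.append(row)
    return vocab, rows


def hashing_vectorize(docs, n_features=16):
    """Map each token to a column by hashing, so no vocabulary is stored."""
    rows = []
    for doc in docs:
        row = [0] * n_features
        for tok in preprocess(doc):
            # crc32 is a stand-in hash for this sketch only.
            row[zlib.crc32(tok.encode("utf-8")) % n_features] += 1
        rows.append(row)
    return rows


# Hypothetical sample documents.
docs = ["Stay home, stay safe", "Vaccines are safe"]
vocab, counts = count_vectorize(docs)
hashed = hashing_vectorize(docs)
```

The count-based version has to see every document before it can fix its vocabulary, while the hashing version needs no fitted state, so each worker can vectorize its own share of the data independently. That is the usual reason a hashing vectorizer suits distributed pipelines; the trade-off is that different tokens can collide in the same column.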

 

Topics

  1. NLP (0.27)
  2. Backend (0.24)
  3. Database (0.07)
