Description
Text preprocessing on GPUs is coming to RAPIDS cuML! This is exciting because efficient string operations are known to be a difficult problem on GPUs. Based on the work by the RAPIDS cuDF team…
Summary
- Text preprocessing on GPUs is coming to RAPIDS cuML!
- Train and evaluate: the final steps of a typical NLP pipeline are to train an estimator on the vectorized documents for a particular task and then evaluate the results.
- You’ll notice an additional section at the end containing clustering workflows (k-means and t-SNE), where we try to find clusters in our tweets to see if we can discover general topics related to COVID-19.
- Other vectorizers are in the works, starting with HashingVectorizer, which will help a great deal with distributed pipelines.
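
To illustrate why a hashing-based vectorizer suits distributed pipelines, here is a minimal pure-Python sketch of the hashing trick. It is not the cuML or scikit-learn implementation: the function names `tokenize` and `hash_vectorize`, the `n_features` default, the sample tweets, and the use of CRC32 as the hash are all assumptions made for illustration. The point it shows is that each token maps directly to a column index, so no shared vocabulary has to be fitted or synchronized between workers.

```python
import re
import zlib
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def hash_vectorize(doc, n_features=16):
    """Map a document to sparse {column_index: count} using the hashing trick.

    CRC32 is a stand-in hash chosen for this sketch (an assumption);
    it is deterministic across processes, which is what matters here.
    """
    counts = Counter()
    for token in tokenize(doc):
        index = zlib.crc32(token.encode("utf-8")) % n_features
        counts[index] += 1
    return dict(counts)


# Two hypothetical workers vectorize disjoint batches independently.
docs_a = ["Stay home, stay safe", "Wash your hands"]
docs_b = ["Stay safe and wash your hands"]

worker_a = [hash_vectorize(d) for d in docs_a]
worker_b = [hash_vectorize(d) for d in docs_b]

# The same token lands in the same column on both workers,
# without either worker having seen the other's documents.
shared_column = zlib.crc32(b"stay") % 16
print(shared_column in worker_a[0], shared_column in worker_b[0])
```

The trade-off is that distinct tokens can collide in the same column, and the mapping cannot be inverted to recover feature names; a larger `n_features` reduces collisions at the cost of wider vectors.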