Natural Language Processing: Text Preprocessing and Vectorizing at Rocking Speed with RAPIDS cuML

By Medium - 2021-01-26

Description

Text preprocessing on GPUs is coming to RAPIDS cuML! This is exciting because efficient string operations are known to be difficult on GPUs. Based on the work by the RAPIDS cuDF team…

Summary

Text preprocessing on GPUs is coming to RAPIDS cuML!
Train and Evaluate: The final steps of a typical NLP pipeline are to train an estimator on the vectorized documents for a particular task and then evaluate the results.
You’ll notice an additional section at the end containing clustering workflows (k-means and t-SNE), where we try to find clusters in our tweets to see if we can discover general topics related to COVID-19.
Other vectorizers are in the works, starting with HashingVectorizer, which will help a great deal with distributed pipelines.
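
The usual reason a hashing vectorizer suits distributed pipelines is that it is stateless: each token is mapped to a column by a hash function, so no fitted vocabulary has to be built or shared between workers. The sketch below illustrates that idea in plain Python. It is not the cuML or scikit-learn implementation; the function names, the use of crc32 as the hash, and the tiny n_features value are all assumptions chosen for illustration.

```python
import zlib
from collections import Counter


def tokenize(text):
    # Hypothetical minimal tokenizer: lowercase, then split on whitespace.
    return text.lower().split()


def hash_vectorize(doc, n_features=16):
    """Return a sparse {column: count} mapping for one document.

    Illustrative stand-in for the hashing trick: crc32 is used because it
    is deterministic across processes (Python's built-in hash() for str
    is salted per process). Different tokens may collide in one column.
    """
    counts = Counter()
    for token in tokenize(doc):
        col = zlib.crc32(token.encode("utf-8")) % n_features
        counts[col] += 1
    return dict(counts)


# Each document is vectorized independently, so different workers could
# process different documents and still agree on the column layout.
docs = ["stay home stay safe", "wash your hands", "stay safe"]
vectors = [hash_vectorize(d) for d in docs]
print(vectors)
```

The trade-off is that columns cannot be mapped back to tokens and collisions are possible, which is why a much larger n_features is normally used in practice.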

Topics

NLP (0.27)
Backend (0.24)
Database (0.07)

Similar Articles

Calculating Document Similarities using BERT, word2vec, and other models

By Medium - 2020-12-03

Document similarity is one of the most crucial problems of NLP. Finding similarity across documents is used in several domains such as recommending similar books and articles, identifying…

What’s in a word?

By Medium - 2020-12-28

Why tf-idf sometimes fails to accurately capture word importance, and what we can use instead

Google Cloud announces Document AI Platform

By Google Cloud Blog - 2020-12-27

Document AI Platform is a unified console for document processing in the cloud.

Parsing and Mapping a Docx file with Java

By hackernoon - 2021-02-19

First, we will extract the docx archive. Next, we will read and map the file word/document.xml to a Java object.

What Is the Best Input Pipeline to Train Image Classification Models with tf.keras

By Medium - 2021-02-04

When we start learning how to build deep neural networks with Keras, the first method we use to input data is simply loading it into NumPy arrays. At some point, especially when working with images…

QuickGraph#17 The English WordNet in Neo4j (part 2)

By Jesús Barrasa - 2021-02-05

In this second post on WordNet on Neo4j I will be focusing on querying and analysing the graph that we created in the previous post. I'll leave for a third one more advanced analysis and integrations ...