Speech to Text with Wav2Vec

By KDnuggets - 2021-03-02

Description

Facebook recently introduced and open-sourced Wav2Vec 2.0, a new framework for self-supervised learning of representations from raw audio data. Learn more about it and how to use it here.

Summary

  • In my previous blog post, I explained how to convert speech into text using the Speech Recognition library with the help of the Google speech recognition API.
  • Reading the audio file: this example uses an audio clip of Liam Neeson's famous dialogue from the movie “Taken”, which says “I will look for you, I will find you and I will kill you”. Note that the Wav2Vec model is pre-trained on audio sampled at 16 kHz, so the raw audio file must also be resampled to a 16 kHz sampling rate.
  • In this blog post, we have seen how to convert speech into text using a pretrained Wav2Vec model with the Transformers library.
  • Dhilip Subramanian is a Mechanical Engineer and has completed his Master's in Analytics.
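The resampling requirement mentioned in the summary can be sketched as follows. This is a minimal illustration, not the article's own code: `resample_linear` and `transcribe` are hypothetical helper names, the checkpoint name `facebook/wav2vec2-base-960h` is an assumption, and the linear interpolation is a simplification with no anti-aliasing filter (a dedicated audio library would normally handle the resampling). The `transformers` and `torch` imports sit inside `transcribe` so the resampling helper can be tried on its own.

```python
TARGET_RATE = 16_000  # the summary states Wav2Vec is pre-trained on 16 kHz audio


def resample_linear(samples, orig_rate, target_rate=TARGET_RATE):
    """Resample a mono signal by linear interpolation (illustrative only)."""
    if orig_rate == target_rate or len(samples) == 0:
        return list(samples)
    n_out = int(round(len(samples) * target_rate / orig_rate))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * orig_rate / target_rate   # position in the original signal
        lo = min(int(pos), last)            # sample at or before that position
        hi = min(lo + 1, last)              # next sample, clamped at the end
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def transcribe(samples, model_name="facebook/wav2vec2-base-960h"):
    """Transcribe 16 kHz mono audio; the checkpoint name is an assumption."""
    # Imported lazily: these packages are only needed for the actual inference.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_name)
    model = Wav2Vec2ForCTC.from_pretrained(model_name)
    inputs = processor(samples, sampling_rate=TARGET_RATE, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    predicted_ids = torch.argmax(logits, dim=-1)  # greedy CTC decoding
    return processor.batch_decode(predicted_ids)[0]
```

A typical use would be to load the clip as a list of float samples with its native sampling rate, call `resample_linear(samples, native_rate)`, and pass the result to `transcribe`.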

 

Topics

  1. NLP (0.42)
  2. Backend (0.13)
  3. Machine_Learning (0.12)
