Description
Facebook recently introduced and open-sourced Wav2Vec 2.0, a new framework for self-supervised learning of representations from raw audio data. Learn more about it and how to use it here.
Summary
- In my previous blog post, I explained how to convert speech into text using the SpeechRecognition library with the help of the Google Speech Recognition API.
- Reading the audio file: in this example I have used Liam Neeson's famous dialogue audio clip from the movie “Taken”, which says “I will look for you, I will find you and I will kill you.” Please note that the Wav2Vec model is pre-trained on audio sampled at 16 kHz, so we make sure our raw audio file is also resampled to a 16 kHz sampling rate.
- In this blog post, we have seen how to convert speech into text using the pre-trained Wav2Vec model with the Transformers library.
- Dhilip Subramanian is a Mechanical Engineer and has completed his Master's in Analytics.
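
The resampling and transcription steps summarized above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not necessarily the exact code from the post: the checkpoint name `facebook/wav2vec2-base-960h` and the use of `librosa` for loading are assumptions, and the helper names `wav_sample_rate` and `transcribe` are hypothetical. It requires `librosa`, `torch`, and `transformers` to be installed, and the model is downloaded on first use.

```python
import wave

TARGET_SR = 16_000  # sampling rate the Wav2Vec model expects, per the post


def wav_sample_rate(path):
    """Return the sampling rate (in Hz) stored in a WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def transcribe(path, model_name="facebook/wav2vec2-base-960h"):
    """Transcribe a speech clip with a pre-trained Wav2Vec 2.0 checkpoint.

    The default checkpoint name is an assumption; substitute the one you use.
    """
    # Third-party imports live inside the function so that the header check
    # above still works where these packages are not installed.
    import librosa
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    # librosa resamples to the requested rate while loading the file.
    audio, _ = librosa.load(path, sr=TARGET_SR)

    processor = Wav2Vec2Processor.from_pretrained(model_name)
    model = Wav2Vec2ForCTC.from_pretrained(model_name)

    # Turn the raw waveform into model inputs, then run inference.
    inputs = processor(audio, sampling_rate=TARGET_SR, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits

    # Greedy CTC decoding: take the most likely token at every time step.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

With a clip saved under a hypothetical name such as `taken_clip.wav`, `wav_sample_rate("taken_clip.wav")` shows whether the file is already at 16 kHz, and `transcribe("taken_clip.wav")` returns the recognized text as a string.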