Easy Speech Recognition with Machine Learning and HuggingFace Transformers

By MachineCurve - 2021-02-17

Description

Learn how to use the Transformer architecture to create an easy Speech Recognition / Speech-to-Text pipeline with Python. Includes examples.
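The pipeline the article builds can be sketched in a few lines with HuggingFace Transformers. This is a minimal sketch, not the article's exact code: `facebook/wav2vec2-base-960h` is a real checkpoint fine-tuned for English speech recognition, but the `.wav` file name passed in is a hypothetical placeholder.

```python
# A minimal sketch of a speech-to-text pipeline with HuggingFace Transformers.
# Constructing the pipeline downloads the model checkpoint on first use.
from transformers import pipeline

def transcribe(wav_path: str) -> str:
    # "automatic-speech-recognition" is a built-in Transformers pipeline task;
    # it expects audio input such as a 16 kHz mono .wav file.
    asr = pipeline("automatic-speech-recognition",
                   model="facebook/wav2vec2-base-960h")
    return asr(wav_path)["text"]

# Usage (assumes "speech.wav" exists as a 16 kHz mono .wav file):
# print(transcribe("speech.wav"))
```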

Summary

  • Transformer architectures have gained a lot of attention in the field of Natural Language Processing.
  • Combined with the benefits resulting from their architecture (attention is all you need: no sequential processing is necessary), very large models like BERT and the GPT series have been trained that achieve state-of-the-art performance on a variety of language tasks.
  • A feature encoder in the form of a seven-layer 1D (temporal) ConvNet takes the raw waveform and converts it into T time steps of latent speech representations.
  • The pipeline that we will be creating today requires .wav files, and more specifically .wav files with a sampling rate of 16,000 Hz (16 kHz); an .mp3 file must therefore first be converted into .wav.
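The feature encoder mentioned above can be sketched as a stack of seven 1D convolutions. This is an illustrative sketch, not the official implementation: the kernel sizes, strides, and 512-channel width follow the Wav2Vec 2.0 paper, but details such as normalization are omitted.

```python
# A sketch of a Wav2Vec2-style feature encoder: a 7-layer temporal (1D)
# ConvNet that maps a raw 16 kHz waveform to T time steps of latent features.
import torch
import torch.nn as nn

class FeatureEncoder(nn.Module):
    def __init__(self, channels: int = 512):
        super().__init__()
        kernels = (10, 3, 3, 3, 3, 2, 2)  # receptive field per layer
        strides = (5, 2, 2, 2, 2, 2, 2)   # temporal downsampling per layer
        layers = []
        in_ch = 1  # mono waveform
        for k, s in zip(kernels, strides):
            layers += [nn.Conv1d(in_ch, channels, kernel_size=k, stride=s),
                       nn.GELU()]
            in_ch = channels
        self.conv = nn.Sequential(*layers)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, 1, samples) -> features: (batch, channels, T)
        return self.conv(waveform)

encoder = FeatureEncoder()
one_second = torch.randn(1, 1, 16000)  # 1 s of audio at 16 kHz
features = encoder(one_second)
print(features.shape)  # torch.Size([1, 512, 49]): ~49 time steps per second
```

With these strides the total downsampling factor yields roughly one time step every 20 ms, i.e. T = 49 for one second of 16 kHz audio.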

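To make the 16 kHz .wav requirement concrete, the following self-contained sketch writes one second of a 440 Hz sine tone as a 16 kHz mono .wav file using only the Python standard library; the file name is a hypothetical placeholder.

```python
# Writing a 16 kHz mono 16-bit PCM .wav file with the standard library,
# to illustrate the format the speech-to-text pipeline expects.
import math
import struct
import wave

SAMPLE_RATE = 16000  # 16 kHz, as the pipeline requires

def write_sine_wav(path: str, freq_hz: float = 440.0, seconds: float = 1.0) -> None:
    n_samples = int(SAMPLE_RATE * seconds)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)           # mono
        wav.setsampwidth(2)           # 16-bit PCM
        wav.setframerate(SAMPLE_RATE)
        frames = bytearray()
        for i in range(n_samples):
            # Half-amplitude sine sample, packed as little-endian int16.
            sample = int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
            frames += struct.pack("<h", sample)
        wav.writeframes(bytes(frames))

write_sine_wav("tone.wav")
with wave.open("tone.wav", "rb") as check:
    print(check.getframerate(), check.getnchannels())  # 16000 1
```

For a real .mp3, you would instead convert and resample with an external tool such as ffmpeg, or a library like pydub, before feeding the resulting 16 kHz .wav file to the pipeline.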
 

Topics

  1. NLP (0.39)
  2. Machine_Learning (0.21)
  3. Backend (0.16)

Similar Articles

K-fold Cross Validation with PyTorch

By MachineCurve - 2021-02-02

Explanations and code examples showing you how to use K-fold Cross Validation for Machine Learning model evaluation/testing with PyTorch.

How to Use AutoKeras for Classification and Regression

By Machine Learning Mastery - 2020-09-01

AutoML refers to techniques for automatically discovering the best-performing model for a given dataset. When applied to neural networks, this involves both discovering the model architecture and the ...

Visualizing Keras neural networks with Net2Vis and Docker

By MachineCurve - 2020-01-07

Visualizing the structure of your neural network is quite useful for publications, such as papers and blogs. Today, various tools exist for generating these visualizations – allowing engineers and res ...