facebook/wav2vec2-base-960h · Hugging Face

By huggingface - 2021-02-08

Description

We’re on a journey to solve and democratize artificial intelligence through natural language.

Summary

  • We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler.
  • wav2vec 2.0 masks the speech input in the latent space and solves a contrastive task defined over a quantization of the latent representations which are jointly learned.
  • Experiments using all the labeled data of LibriSpeech achieve 1.8/3.3 WER on the clean/other test sets.
  • Lowering the labeled data to as little as ten minutes (with pre-training on 53k hours of unlabeled audio) still yields 4.8/8.2 WER, demonstrating the feasibility of speech recognition with limited amounts of labeled data (a usage sketch for this checkpoint follows this list).
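
The bullets above summarize the pre-training objective; the checkpoint hosted on this page is the base model fine-tuned on 960 hours of LibriSpeech for CTC transcription. The snippet below is a minimal usage sketch following the standard transformers pattern for such a checkpoint; it assumes torch, transformers, and soundfile are installed, and "sample.flac" is a hypothetical placeholder for a local 16 kHz mono recording.

```python
# Minimal sketch: transcribing a 16 kHz audio file with facebook/wav2vec2-base-960h.
# Assumes `torch`, `transformers`, and `soundfile` are installed;
# "sample.flac" is a hypothetical local file containing 16 kHz mono speech.
import soundfile as sf
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

speech, sample_rate = sf.read("sample.flac")  # model expects 16 kHz mono input

inputs = processor(speech, sampling_rate=sample_rate, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(inputs.input_values).logits  # frame-level character logits

# Greedy CTC decoding: argmax per frame, then collapse repeats and blanks.
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
```

Greedy argmax decoding is the simplest option; batch_decode collapses repeated tokens and blanks into text, and a language-model-backed decoder can lower WER further but is not required for basic use.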

 

Topics

  1. Backend (0.3)
  2. Database (0.15)
  3. Machine Learning (0.14)
