FastFormers: 233x Faster Transformers inference on CPU

By Medium - 2020-11-04

Description

Since the birth of BERT, Transformer models have dominated NLP in nearly every language-related task, whether it is Question Answering, Sentiment Analysis, Text Classification or Text…

Summary

  • 233x faster Transformers inference on CPU. Yes, 233x on CPU with the multi-head self-attention Transformer architecture.
  • Transformers achieve much better accuracy on all these tasks and, unlike RNNs and LSTMs, do not suffer from vanishing gradients, which hamper learning over long data sequences.
  • As smaller models are less expensive to evaluate, they can be deployed on less powerful hardware like a smartphone.
  • In the task-specific distillation, the authors distill fine-tuned teacher models into smaller student architectures following the procedure proposed by TinyBERT. In the task-agnostic distillation approach, the authors directly fine-tune generally distilled models for a specific task. A minimal sketch of a distillation loss appears after this list.
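To make the distillation step concrete, here is a minimal sketch of the standard soft-label knowledge-distillation loss in PyTorch. This is an illustration of the general technique, not the FastFormers or TinyBERT code; the function name, `temperature`, and `alpha` values are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Weighted mix of KL divergence against the teacher's softened
    predictions and cross-entropy against the gold labels.
    Hyperparameters here are illustrative, not from the paper."""
    # Soften both output distributions with the temperature.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale the KL term by T^2 to keep gradient magnitudes comparable.
    kd = F.kl_div(log_soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```

During training, the student's logits on each batch are compared against the frozen teacher's logits with this loss, so the smaller model learns to mimic the teacher's output distribution while still fitting the task labels.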

Topics

  1. Machine_Learning (0.21)
  2. NLP (0.2)
  3. Backend (0.07)
