Description
Since the birth of BERT, Transformers have dominated NLP in nearly every language-related task, whether it is Question Answering, Sentiment Analysis, Text Classification, or Text…
Summary
- 233x faster Transformer inference on CPU. Yes, 233x on CPU, with the multi-head self-attentive Transformer architecture (see the attention sketch after this list).
- Transformers achieve much better accuracy on all these tasks; unlike RNNs and LSTMs, they do not suffer from vanishing gradients, a problem that hampers learning over long data sequences.
- As smaller models are less expensive to evaluate, they can be deployed on less powerful hardware such as a smartphone.
- In task-specific distillation, the authors distill fine-tuned teacher models into smaller student architectures, following the procedure proposed by TinyBERT. In the task-agnostic approach, they instead fine-tune a general distilled model directly on the target task (see the distillation sketch after this list).
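To make the architecture claim concrete, here is a minimal sketch of multi-head self-attention, assuming PyTorch; the model width, head count, and class name are illustrative choices, not the configuration used in the paper.

```python
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # One projection produces queries, keys, and values in a single matmul.
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split the model dimension into heads: (batch, heads, seq_len, d_head).
        split = lambda z: z.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        q, k, v = split(q), split(k), split(v)
        # Scaled dot-product attention, computed for all heads at once.
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        attn = scores.softmax(dim=-1)
        # Merge the heads back into the model dimension.
        ctx = (attn @ v).transpose(1, 2).reshape(b, t, d)
        return self.out(ctx)

x = torch.randn(2, 10, 256)               # (batch, seq_len, d_model)
print(MultiHeadSelfAttention()(x).shape)  # torch.Size([2, 10, 256])
```

Because every position attends to every other in a single matrix multiply, this layer parallelizes well and avoids the step-by-step recurrence that makes RNNs and LSTMs prone to vanishing gradients.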
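And here is a minimal sketch of the knowledge-distillation objective behind both approaches, again assuming PyTorch; the temperature, weighting, and variable names are hypothetical placeholders, not the exact recipe from TinyBERT or this paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,  # assumed value, not from the paper
                      alpha: float = 0.5) -> torch.Tensor:
    """Blend the hard-label loss with a soft-target loss against the teacher."""
    # Soft targets: match the student's temperature-smoothed distribution to
    # the teacher's via KL divergence, scaled by T^2 as in Hinton et al.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Hard targets: ordinary cross-entropy against the gold labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage: a 3-class batch of 4 examples with random logits.
student = torch.randn(4, 3, requires_grad=True)
teacher = torch.randn(4, 3)
labels = torch.randint(0, 3, (4,))
loss = distillation_loss(student, teacher, labels)
loss.backward()
print(loss.item())
```

In the task-specific setting this loss is applied with an already fine-tuned teacher; in the task-agnostic setting the student is distilled once from a general teacher and then fine-tuned on the target task with the hard-label term alone.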