Reducing the High Cost of Training NLP Models With SRU++

By ASAPP - 2021-02-24

Description

SRU++ makes it possible to design highly expressive and efficient neural models that require very little attention computation.

Summary

  • Natural language models have achieved various groundbreaking results in NLP and related fields [1, 2, 3, 4].
  • Our model obtains better perplexity and bits-per-character (bpc) while using 2.5x-10x less training time and cost compared to top-performing Transformer models.
  • In addition, not every SRU++ layer needs attention; a minimal sketch of this idea follows this list.
  • For these metrics, lower numbers are better.
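
The sketch below is illustrative only and is not ASAPP's implementation: it shows a recurrent language-model trunk in which self-attention is enabled in only a subset of layers, mirroring the claim that not every SRU++ layer needs attention. The class names, the GRU stand-in for the fast recurrence, and parameters such as `attn_every` are all hypothetical.

```python
# Minimal sketch (assumptions flagged above): attention only in every
# `attn_every`-th layer of an otherwise recurrent stack.
import torch
import torch.nn as nn


class RecurrentBlock(nn.Module):
    """One layer: optional self-attention feeding a simple recurrence."""

    def __init__(self, dim: int, use_attention: bool, num_heads: int = 4):
        super().__init__()
        self.use_attention = use_attention
        if use_attention:
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # GRU used here as a generic stand-in for the fast recurrent unit.
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.use_attention:
            attn_out, _ = self.attn(x, x, x, need_weights=False)
            x = self.norm(x + attn_out)
        out, _ = self.rnn(x)
        return self.norm(x + out)


class SparseAttentionLM(nn.Module):
    """Trunk where only every `attn_every`-th layer computes attention."""

    def __init__(self, vocab: int, dim: int = 256, layers: int = 8, attn_every: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.blocks = nn.ModuleList(
            RecurrentBlock(dim, use_attention=((i + 1) % attn_every == 0))
            for i in range(layers)
        )
        self.head = nn.Linear(dim, vocab)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        x = self.embed(tokens)
        for block in self.blocks:
            x = block(x)
        return self.head(x)


# Usage: logits for a toy batch of token ids.
model = SparseAttentionLM(vocab=1000)
logits = model(torch.randint(0, 1000, (2, 16)))  # shape: (2, 16, 1000)
```

With `attn_every=4`, only 2 of the 8 layers pay the quadratic attention cost, which is the kind of trade-off the article credits for the reduced training time.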

 

Topics

  1. NLP (0.31)
  2. Machine_Learning (0.15)
  3. UX (0.1)

Similar Articles

Rethinking Attention with Performers

By Google AI Blog - 2020-10-23

Posted by Krzysztof Choromanski and Lucy Colwell, Research Scientists, Google Research. Transformer models have achieved state-of-the-art...

FastFormers: 233x Faster Transformers inference on CPU

By Medium - 2020-11-04

Since the birth of BERT, followed by that of Transformers, these models have dominated NLP in nearly every language-related task, whether it is Question-Answering, Sentiment Analysis, Text Classification, or Text…