Description
SRU++ makes it possible to design highly expressive and efficient neural models that require very little attention computation.
Summary
- Natural language models have achieved groundbreaking results in NLP and related fields [1, 2, 3, 4].
- Our model obtains better perplexity and bits-per-character (bpc) while using 2.5x-10x less training time and cost compared to top-performing Transformer models.
- In addition, not every SRU++ layer needs attention; attention can be enabled in only a subset of layers (see the sketch after this list).
- Lower numbers are better for all reported metrics.
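
The idea that only some layers need attention can be illustrated with a small sketch. The block below is a minimal PyTorch example, not the official SRU++ implementation: `RecurrentBlock`, `SparseAttentionStack`, and the `attn_every` parameter are hypothetical names, and `nn.GRU` / `nn.MultiheadAttention` are stand-ins for the SRU++ recurrence and its attention component. It only shows how attention can be restricted to every k-th layer of a deep stack.

```python
# Minimal sketch (assumptions noted above): a stack of recurrent layers where
# only every k-th layer adds an attention sub-module, illustrating the idea
# that not every layer needs attention.
import torch
import torch.nn as nn


class RecurrentBlock(nn.Module):
    """One layer: optional self-attention followed by a recurrent update."""

    def __init__(self, d_model: int, n_heads: int, use_attention: bool):
        super().__init__()
        self.use_attention = use_attention
        if use_attention:
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.norm = nn.LayerNorm(d_model)
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)  # stand-in recurrence

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.use_attention:
            # Attention output feeds the recurrence via a residual connection.
            a, _ = self.attn(x, x, x, need_weights=False)
            x = self.norm(x + a)
        out, _ = self.rnn(x)
        return out


class SparseAttentionStack(nn.Module):
    """Stack of `depth` layers; attention is enabled only every `attn_every` layers."""

    def __init__(self, d_model: int = 256, n_heads: int = 4,
                 depth: int = 10, attn_every: int = 5):
        super().__init__()
        self.layers = nn.ModuleList(
            RecurrentBlock(d_model, n_heads,
                           use_attention=((i + 1) % attn_every == 0))
            for i in range(depth)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x)
        return x


if __name__ == "__main__":
    model = SparseAttentionStack()
    x = torch.randn(2, 32, 256)   # (batch, sequence length, d_model)
    print(model(x).shape)         # torch.Size([2, 32, 256])
```

With `depth=10` and `attn_every=5`, only two of the ten layers compute attention, so the attention cost is a small fraction of the total compute while every layer still performs a recurrent update.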