How Transformers work in deep learning and NLP: an intuitive introduction

By AI Summer - 2020-12-24

Description

An intuitive understanding of Transformers and how they are used in Machine Translation. After analyzing all subcomponents one by one, such as self-attention and positional encodings, we explain the p ...

Summary

  • Masked multi-head attention: in the decoding stage, we predict one word (token) after another (see the sketch after this list).
  • Where the magic happens: this is where the decoder processes the encoded representation.
  • The values of the self-attention weights are computed on the fly.
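
As a rough illustration of what those bullets describe, here is a minimal numpy sketch of masked (causal) self-attention, where attention weights are computed on the fly and each position can only attend to earlier tokens. The function name, shapes, and weight matrices are illustrative assumptions, not code from the article.

```python
import numpy as np

def masked_self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention with a causal mask (illustrative sketch).

    x: (seq_len, d_model) token embeddings
    w_q, w_k, w_v: (d_model, d_k) projection matrices
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]
    # Attention weights are computed on the fly from the current inputs.
    scores = q @ k.T / np.sqrt(d_k)                      # (seq_len, seq_len)
    # Causal mask: position i may only attend to positions <= i,
    # so the decoder cannot peek at future tokens while predicting one at a time.
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -1e9, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over each row
    return weights @ v

# Example usage with random embeddings for a 5-token sequence.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))
w_q, w_k, w_v = (rng.normal(size=(16, 8)) for _ in range(3))
out = masked_self_attention(x, w_q, w_k, w_v)            # shape (5, 8)
```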

 

Topics

  1. NLP (0.35)
  2. Machine_Learning (0.16)
  3. Backend (0.11)

Similar Articles

Attention mechanism in Deep Learning, Explained

By KDnuggets - 2021-02-09

Attention is a powerful mechanism developed to enhance the performance of the Encoder-Decoder architecture on neural network-based machine translation tasks. Learn more about how this process works an ...

Rethinking Attention with Performers

By Google AI Blog - 2020-10-23

Posted by Krzysztof Choromanski and Lucy Colwell, Research Scientists, Google Research. Transformer models have achieved state-of-the-art...

pytorch-widedeep: deep learning for tabular data

By Medium - 2021-02-22

This is the third in a series of posts introducing pytorch-widedeep, a flexible package to combine tabular data with text and images (that could also be used for "standard" tabular data alone). The…

Sentiment Analysis With Long Sequences

By Medium - 2021-03-10

Sentiment analysis is typically limited by the length of text that can be processed by transformer models like BERT. We will learn how to work around this.