Description
An intuitive understanding of Transformers and how they are used in Machine Translation. After analyzing all subcomponents one by one, such as self-attention and positional encodings, we explain the p ...
Summary
- Masked multi-head attention: in case you haven’t realized, in the decoding stage we predict one word (token) after another, so each position must be prevented from attending to future tokens (see the first sketch after this list).
- Encoder-decoder attention, where the magic happens: this is where the decoder processes the encoded representation (see the second sketch below).
- The values of the self-attention weights are computed on the fly from the input; they are not fixed, learned parameters.
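
To make the masked-attention point concrete, below is a minimal NumPy sketch (not the article's code; all names and shapes are illustrative) of scaled dot-product attention with a causal mask. Each decoder position can only attend to itself and earlier positions, and the attention weights are produced on the fly from the inputs rather than stored as learned parameters.

```python
# Minimal sketch of masked (causal) scaled dot-product attention.
# Names and shapes are illustrative assumptions, not taken from the article.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, mask=None):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = q.shape[-1]
    scores = q @ k.swapaxes(-2, -1) / np.sqrt(d_k)   # (seq_q, seq_k) similarity scores
    if mask is not None:
        scores = np.where(mask, scores, -1e9)        # block masked (future) positions
    weights = softmax(scores, axis=-1)               # computed on the fly from the inputs
    return weights @ v, weights

seq_len, d_model = 4, 8
x = np.random.randn(seq_len, d_model)                # stand-in for embedded decoder tokens

# Causal mask: position i may only attend to positions <= i,
# so the decoder predicts one token after another without peeking ahead.
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))

out, w = attention(x, x, x, mask=causal_mask)        # self-attention: Q, K, V from the same sequence
print(w.round(2))  # upper triangle is ~0: future tokens carry no weight
```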
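
For the encoder-decoder attention point, a similarly hedged sketch: the queries come from the decoder while the keys and values come from the encoder's output, which is how the decoder processes the encoded representation. It reuses the `attention` helper from the sketch above; `encoder_output` and `decoder_state` are made-up stand-ins, and in a real Transformer all three inputs would first pass through learned linear projections.

```python
# Cross-attention sketch (assumes the attention() helper defined above).
enc_len, dec_len, d_model = 6, 4, 8
encoder_output = np.random.randn(enc_len, d_model)   # stand-in for the encoder's final representation
decoder_state  = np.random.randn(dec_len, d_model)   # stand-in for the decoder's current hidden states

# Queries from the decoder, keys/values from the encoder:
# every target position may look at every source position, so no causal mask here.
cross_out, cross_w = attention(decoder_state, encoder_output, encoder_output)
print(cross_w.shape)  # (dec_len, enc_len): one weight per (target, source) pair
```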