Description
Why do we hear so little about transformer models applied to computer vision tasks? What about attention in computer vision networks?
Summary
- Transformers Are for Natural Language Processing (NLP), Right?
- Transformers are the most visible and impactful application of attention in machine learning; although they have mostly been used in NLP, the biological inspiration for attention is loosely based on the visual systems of animals.
- Instead of adopting attention, deep learning researchers and engineers working in computer vision pursued the arguably lower-hanging fruit of increasingly deep convolutional neural networks and other architectural tweaks.
- Earlier attention-based image-captioning models used recurrent connections (in the form of an LSTM head) in their caption generation task, and their attention mechanism was a little different from the dot-product attention used by Vaswani et al.; a sketch of the latter follows below.
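For reference, here is a minimal NumPy sketch of the scaled dot-product attention described by Vaswani et al. in "Attention Is All You Need". The function name, shapes, and toy inputs are illustrative assumptions, not code from the article:

```python
# Minimal sketch of scaled dot-product attention (Vaswani et al., 2017).
# Names and shapes are illustrative assumptions, not from the article.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v) -> (n_q, d_v)."""
    d_k = Q.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(d_k)
    # so the softmax stays in a well-conditioned range.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the key axis turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value vectors.
    return weights @ V

# Toy usage: 4 queries attending over 6 key/value pairs.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(6, 8))
V = rng.normal(size=(6, 16))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 16)
```

Unlike the LSTM-coupled attention in the captioning work above, this mechanism involves no recurrence: the attention weights come purely from query-key dot products.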