Vision Transformers: Natural Language Processing (NLP) Increases Efficiency and Model Generality

By KDnuggets - 2021-03-23

Description

Why do we hear so little about transformer models applied to computer vision tasks? What about attention in computer vision networks?

Summary

  • Transformers Are for Natural Language Processing (NLP), Right?
  • Transformers are the most visible and impactful application of attention in machine learning; while they have mostly been used in NLP, the biological inspiration for attention is loosely based on the vision systems of animals.
  • Instead, deep learning researchers and engineers working in computer vision scrambled to collect the arguably lower-hanging fruit of increasingly deep convolutional neural networks and other architectural tweaks.
  • The image-captioning work discussed in the article used recurrent connections (in the form of an LSTM head), and its attention mechanism was a little different from the dot-product attention used by Vaswani et al.; a minimal sketch of the latter follows this list.
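For reference, here is a minimal sketch of the scaled dot-product attention from Vaswani et al. (2017), written in PyTorch. The function name and tensor shapes are illustrative choices, not something taken from the article:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    """Scaled dot-product attention (Vaswani et al., 2017).

    q, k, v: tensors of shape (batch, seq_len, d_k).
    Returns a tensor of shape (batch, seq_len, d_k).
    """
    d_k = q.size(-1)
    # Similarity of every query against every key, scaled by sqrt(d_k)
    # to keep the softmax from saturating at large d_k.
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    # Normalize scores into attention weights that sum to 1 per query.
    weights = torch.softmax(scores, dim=-1)
    # Each output is a weighted sum of the values.
    return weights @ v

# Example: a batch of 2 sequences, 5 tokens each, 64-dim heads.
x = torch.randn(2, 5, 64)
out = scaled_dot_product_attention(x, x, x)  # self-attention; shape (2, 5, 64)
```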


Topics

  1. Machine_Learning (0.33)
  2. NLP (0.22)
  3. Backend (0.11)

Similar Articles

Rethinking Attention with Performers

By Google AI Blog - 2020-10-23

Posted by Krzysztof Choromanski and Lucy Colwell, Research Scientists, Google Research. Transformer models have achieved state-of-the-art...

lucidrains/vit-pytorch

By GitHub - 2020-10-05

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch.
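A basic usage sketch, following the pattern in the repository's README (assumes `pip install vit-pytorch`; the hyperparameter values below are illustrative, not prescriptions):

```python
import torch
from vit_pytorch import ViT

# Configure a small ViT; all hyperparameters here are illustrative.
model = ViT(
    image_size=256,   # input images are 256x256
    patch_size=32,    # split into (256/32)^2 = 64 patches of 32x32 pixels
    num_classes=1000,
    dim=1024,         # patch-embedding dimension
    depth=6,          # number of transformer encoder blocks
    heads=16,         # attention heads per block
    mlp_dim=2048,     # hidden size of each block's feed-forward layer
)

images = torch.randn(1, 3, 256, 256)  # a dummy batch of one RGB image
logits = model(images)                # class logits, shape (1, 1000)
```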

CLIP: Connecting Text and Images

By OpenAI - 2021-01-05

We’re introducing a neural network called CLIP which efficiently learns visual concepts from natural language supervision.
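As a usage sketch, CLIP scores an image against candidate captions via the openai/CLIP package (the checkpoint name, captions, and image path below are placeholders):

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
# Load the ViT-B/32 checkpoint and its matching image preprocessor.
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("example.png")).unsqueeze(0).to(device)
text = clip.tokenize(["a diagram", "a dog", "a cat"]).to(device)

with torch.no_grad():
    # Similarity logits between the image and each candidate caption.
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1)  # shape (1, 3)
```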