Description
Posted by Krzysztof Choromanski and Lucy Colwell, Research Scientists, Google Research
Transformer models have achieved state-of-the-art results across a diverse range of domains, including natural language, conversation, images, and even music.
Summary
- Transformer models have achieved state-of-the-art results across a diverse range of domains, including natural language, conversation, images, and even music.
- To the best of our knowledge, we are the first to show that any attention matrix can be effectively approximated in downstream Transformer applications using random features (see the sketch after this list).
- We first benchmark the space- and time-complexity of the Performer and show that the attention speedups and memory reductions are empirically nearly optimal, i.e., very close to simply not using an attention mechanism at all in the model.
- The Performer model nearly reaches this optimal performance in the attention component.
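To make the random-feature idea concrete, below is a minimal NumPy sketch in the spirit of the Performer's FAVOR+ mechanism, not the authors' implementation. It uses positive random features phi(x) = exp(w^T x - ||x||^2 / 2) / sqrt(m), with w drawn from a standard Gaussian, whose inner products are unbiased estimates of the softmax kernel exp(q^T k / sqrt(d)). The function and parameter names (`performer_attention`, `num_features`) are illustrative, and the sketch omits refinements from the paper such as orthogonal random features and causal masking.

```python
import numpy as np

def positive_random_features(X, W):
    """phi(x) = exp(w^T x - ||x||^2 / 2) / sqrt(m): positive features whose
    inner products are unbiased estimates of the softmax kernel exp(x^T y)."""
    m = W.shape[0]
    return np.exp(X @ W.T - 0.5 * np.sum(X**2, axis=-1, keepdims=True)) / np.sqrt(m)

def performer_attention(Q, K, V, num_features=256, seed=0):
    """Illustrative linear-complexity approximation of softmax attention."""
    d = Q.shape[-1]
    # exp(q^T k / sqrt(d)) = exp((q / d**0.25)^T (k / d**0.25)), so rescale inputs.
    W = np.random.default_rng(seed).standard_normal((num_features, d))
    Qp = positive_random_features(Q / d**0.25, W)  # (L, m)
    Kp = positive_random_features(K / d**0.25, W)  # (L, m)
    # Reordered matmuls: phi(Q) @ (phi(K)^T V) costs O(L * m * d),
    # never materializing the L x L attention matrix.
    numerator = Qp @ (Kp.T @ V)        # (L, d)
    normalizer = Qp @ Kp.sum(axis=0)   # (L,) row sums of the implicit matrix
    return numerator / normalizer[:, None]

def exact_attention(Q, K, V):
    """Quadratic-cost reference: softmax(Q K^T / sqrt(d)) V."""
    logits = Q @ K.T / np.sqrt(Q.shape[-1])
    A = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return (A / A.sum(axis=-1, keepdims=True)) @ V

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    L, d = 512, 64
    Q, K, V = rng.standard_normal((3, L, d)) * 0.1
    approx, exact = performer_attention(Q, K, V), exact_attention(Q, K, V)
    print("max abs error:", np.abs(approx - exact).max())
```

The reordering is what underlies the near-optimal speed and memory behavior summarized above: cost grows linearly in the sequence length L rather than quadratically, at the price of a stochastic approximation whose variance shrinks as `num_features` grows.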