Vaswani et al., 2017: “Attention Is All You Need”
The Transformer is a deep learning model introduced in 2017 by Google researchers. It was primarily designed for natural language processing (NLP) tasks, such as translation and text summarization. Unlike previous models (RNNs and LSTMs), which processed data sequentially, the Transformer uses an attention mechanism to process the entire input at once. This enables parallelization and captures long-range dependencies more effectively.
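To make the attention idea concrete, here is a minimal NumPy sketch of scaled dot-product attention, the core operation of the Transformer (covered in detail later in this series). The function name and toy dimensions are illustrative, not taken from any reference implementation.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attend over all positions at once: each query scores every key,
    so a dependency of any range is one matrix multiply away."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (seq_len, seq_len) pairwise scores
    # Softmax over the key dimension turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output is a weighted sum of all values

# Toy example: a sequence of 4 tokens with 8-dimensional representations.
x = np.random.randn(4, 8)
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V
print(out.shape)  # (4, 8)
```

Note that the whole sequence is processed in a single batch of matrix multiplications rather than step by step, which is what makes the architecture so amenable to parallel hardware.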
Many state-of-the-art architectures, such as BERT, GPT-2, and GPT-3, are based on this architecture. Transformers have also been applied to fields outside NLP, e.g. computer vision (Vision Transformers).
The goal of this tutorial series is to give a detailed description of the components of the Transformer architecture.
The tutorial series is divided into the following parts: