Publication of "Attention Is All You Need"
Eight researchers from Google Brain, Google Research, and the University of Toronto publish "Attention Is All You Need" (arXiv:1706.03762), introducing the Transformer architecture: a neural network built entirely on self-attention, dispensing with recurrence and convolution. The paper proposes scaled dot-product attention, multi-head attention, and sinusoidal positional encodings, and achieves state-of-the-art results on the WMT 2014 English-to-German and English-to-French translation benchmarks while training significantly faster than prior architectures. Presented at NIPS 2017 (since renamed NeurIPS) in Long Beach, California, the paper would become one of the most cited works in deep learning history (over 140,000 citations by 2024) and win the NeurIPS 2023 Test of Time Award. The Transformer directly spawned BERT, GPT, T5, PaLM, LLaMA, and virtually every large language model that followed, making it arguably the most consequential architecture paper in the history of artificial intelligence.
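The central operation, as defined in the paper, computes attention weights from query, key, and value matrices Q, K, and V:

Attention(Q, K, V) = softmax(QKᵀ / √d_k) V

Here d_k is the dimensionality of the keys; the √d_k scaling keeps the dot products from growing large enough to push the softmax into regions with vanishingly small gradients. Multi-head attention runs this operation several times in parallel over learned linear projections of Q, K, and V, then concatenates the outputs, letting the model attend to information from different representation subspaces at once.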