1 min read

Why Transformers Use Multi-Head Attention