Do transformers need three projections? Systematic study of QKV variants
transformers, QKV attention, projection sharing, machine learning, natural language processing
Author: Anon84
Date: 6/4/2026
Article Summary:
This paper examines whether transformer attention actually needs three separate projections for queries, keys, and values (QKV). It proposes variants in which these projections are shared and evaluates them on a range of tasks.
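The summary does not spell out the exact variants, so the following is a minimal sketch under stated assumptions, not the paper's implementation. It uses single-head scaled dot-product attention with no biases and no output projection. It defines three hypothetical sharing schemes, named `separate`, `shared_qk`, and `shared_all` for illustration only, and shows how sharing reduces the number of projection parameters.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row max for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def make_projections(d_model: int, variant: str, rng: np.random.Generator) -> dict:
    """Return projection matrices for a hypothetical QKV sharing scheme.

    Variant names are illustrative assumptions, not the paper's terminology:
      "separate"   : independent W_q, W_k, W_v (standard transformer)
      "shared_qk"  : W_q and W_k are the same matrix, W_v is separate
      "shared_all" : a single matrix is used for Q, K and V
    """
    def init() -> np.ndarray:
        return rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)

    if variant == "separate":
        return {"q": init(), "k": init(), "v": init()}
    if variant == "shared_qk":
        w = init()
        return {"q": w, "k": w, "v": init()}
    if variant == "shared_all":
        w = init()
        return {"q": w, "k": w, "v": w}
    raise ValueError(f"unknown variant: {variant}")


def attention(x: np.ndarray, w: dict) -> np.ndarray:
    # x has shape (seq_len, d_model); the output has the same shape.
    q, k, v = x @ w["q"], x @ w["k"], x @ w["v"]
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def num_params(w: dict) -> int:
    # Count each distinct matrix once so shared weights are not double-counted.
    unique = {id(m): m for m in w.values()}
    return sum(m.size for m in unique.values())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((5, 8))  # 5 tokens, d_model = 8
    for variant in ("separate", "shared_qk", "shared_all"):
        w = make_projections(8, variant, rng)
        print(variant, attention(x, w).shape, num_params(w))
```

With d_model = 8, the three variants use 192, 128, and 64 projection parameters respectively. One structural consequence is worth noting: when W_q and W_k are the same matrix, the pre-softmax score matrix (X W)(X W)^T is symmetric, so token i scores token j exactly as j scores i before row-wise normalization.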