Multi-Head Attention
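Several attention mechanisms ("heads") run in parallel, each on its own learned projection of the queries, keys, and values into a lower-dimensional subspace. Their outputs are concatenated and projected back to the model dimension, so the model can learn several different interaction patterns at the same time.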
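A minimal, dependency-free Python sketch of the idea, assuming the common Transformer formulation: each head projects the input with its own query, key, and value matrices into a subspace of size d_model / num_heads, applies scaled dot-product attention there, and the head outputs are concatenated and mixed by an output matrix W_o. The helper names (init_params, multi_head_attention, and so on) are hypothetical, the random weights stand in for learned parameters, and masking, biases, and batching are left out.

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # (n x m) @ (m x p) -> (n x p); zip(*b) iterates over the columns of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(r) for r in zip(*m)]


def softmax(row: list[float]) -> list[float]:
    # Subtract the max for numerical stability before exponentiating.
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    # Scores are q @ k^T scaled by 1/sqrt(d_k); each row is softmax-normalized.
    d_k = len(q[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    # Assumption: simple Gaussian init scaled by 1/sqrt(rows), for illustration only.
    return [[rng.gauss(0.0, 1.0 / math.sqrt(rows)) for _ in range(cols)] for _ in range(rows)]


def init_params(d_model: int, num_heads: int, seed: int = 0):
    # Hypothetical helper: one (W_q, W_k, W_v) triple per head, plus a shared W_o.
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    rng = random.Random(seed)
    d_head = d_model // num_heads
    heads = [
        {name: random_matrix(d_model, d_head, rng) for name in ("W_q", "W_k", "W_v")}
        for _ in range(num_heads)
    ]
    # Concatenated heads have width num_heads * d_head == d_model.
    w_o = random_matrix(d_model, d_model, rng)
    return heads, w_o


def multi_head_attention(x: Matrix, heads, w_o: Matrix) -> Matrix:
    head_outputs = []
    for p in heads:
        # Each head works in its own d_head-dimensional subspace.
        q = matmul(x, p["W_q"])
        k = matmul(x, p["W_k"])
        v = matmul(x, p["W_v"])
        head_outputs.append(scaled_dot_product_attention(q, k, v))
    # Concatenate head outputs token by token, then mix them with W_o.
    concat = [sum((h[i] for h in head_outputs), []) for i in range(len(x))]
    return matmul(concat, w_o)


if __name__ == "__main__":
    heads, w_o = init_params(d_model=8, num_heads=2)
    x = random_matrix(4, 8, random.Random(1))  # 4 tokens, 8 features each
    out = multi_head_attention(x, heads, w_o)
    print(len(out), len(out[0]))  # 4 8
```

Because each head has its own projections, different heads can assign different attention weights to the same tokens; the final projection W_o combines those views back into a single d_model-wide representation per token.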