
Multi-head attention heatmaps

by wuyoscar

Landscape 16:9 figure of 4 attention heatmaps (2×2 grid), shared 12-token input.

Token labels across X and Y (rotated 45° on X): "The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog", "near", "the", "river".

Four 12×12 cell panels with individual titles:
"Layer 6, Head 3 — subject-verb" (highlighted cells between "fox"/"jumped")
"Layer 9, Head 7 — coreference" (highlighted cells between the two lowercase "the" tokens and "river")
"Layer 11, Head 2 — prepositional" (highlighted cells between "over"/"dog", "near"/"river")
"Layer 14, Head 1 — sentence-final" (activity concentrated in rightmost column)

Cells: dusty-teal gradient, darker = higher weight. Peak cells outlined in 1px soft-terracotta. Shared vertical color bar on far right with ticks "0.0", "0.25", "0.5", "0.75", "1.0" and label "attention weight".

Title: "Representative multi-head attention patterns in a 16-layer Transformer". Subtitle: "four of 256 heads, hand-picked for illustrative head-role diversity; inspired by Clark et al., 2019."
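For anyone who wants to mock up the data behind one panel before rendering it, the following is a minimal sketch in standard-library Python. It is not part of the original prompt. The helper names `synthetic_attention` and `peak_cells`, the boost value of 4.0, and the 0.5 outline threshold are all illustrative assumptions. It builds a synthetic 12×12 row-normalized matrix for the "subject-verb" panel and lists the cells that would receive the terracotta outline.

```python
import math

# The 12 shared input tokens, in the order given in the prompt.
TOKENS = ["The", "quick", "brown", "fox", "jumped", "over",
          "the", "lazy", "dog", "near", "the", "river"]


def synthetic_attention(n, boosted, boost=4.0):
    """Build an n x n matrix whose rows sum to 1 (softmax over toy logits).

    `boosted` is a set of (row, col) cells that get a logit of `boost`;
    every other cell gets a logit of 0. Hypothetical helper, for mock data only.
    """
    matrix = []
    for i in range(n):
        logits = [boost if (i, j) in boosted else 0.0 for j in range(n)]
        exps = [math.exp(x) for x in logits]
        total = sum(exps)
        matrix.append([e / total for e in exps])
    return matrix


def peak_cells(matrix, threshold=0.5):
    """Return (row, col) cells whose weight exceeds `threshold` (assumed cutoff)."""
    return [(i, j)
            for i, row in enumerate(matrix)
            for j, w in enumerate(row)
            if w > threshold]


# "Layer 6, Head 3" panel: mutual attention between "fox" (index 3) and "jumped" (index 4).
fox, jumped = TOKENS.index("fox"), TOKENS.index("jumped")
attn = synthetic_attention(len(TOKENS), {(fox, jumped), (jumped, fox)})
peaks = peak_cells(attn)
print([(TOKENS[i], TOKENS[j]) for i, j in peaks])
```

The other three panels could be mocked the same way by changing the set of boosted cells. The resulting matrices can then be passed to any plotting library, with darker cells mapped to higher weights as the prompt specifies.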