Transformer X-Ray, Part II: The BOS Bottleneck
The first x-ray post looked at where the prediction position sends attention mass. That was useful, but it flattened the forward pass into one aggregate graph.
The more interesting question is when the routing pattern changes.
I ran a small sweep of 16 prompt pairs and compared layerwise attention divergence between an intended continuation context and a contradictory one. Three examples stood out:
- dna_acid
- shakespeare_romeo
- wwii_end
They produce three different depth profiles. Together they suggest a consistent structure: the forward pass compresses into a BOS-dominated bottleneck around layer 16 of 24, then reopens near the output.
This post is about that shape.
What was measured
For each prompt pair, I traced the full forward pass and computed layerwise Jensen-Shannon divergence between the attention-flow distributions under:
- an intended continuation context
- a contradictory continuation context
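A minimal sketch of the divergence step, under some assumptions the post does not spell out: each layer's attention flow is a non-negative vector over the same token positions in both contexts, and the divergence is taken in base 2 so it stays between 0 and 1. The function names are hypothetical and the numbers are toy values, not trace data.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits between two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same positions")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("each distribution needs positive total mass")
    # Normalize defensively so raw attention mass can be passed in directly.
    p = [x / total_p for x in p]
    q = [x / total_q for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Zero entries of `a` contribute nothing; `b` is the mixture,
        # so it is positive wherever `a` is positive.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def layerwise_js(flows_a, flows_b):
    """One divergence value per layer, given per-layer distributions."""
    return [js_divergence(p, q) for p, q in zip(flows_a, flows_b)]


# Toy two-layer example (made-up numbers).
intended = [[0.7, 0.2, 0.1], [0.9, 0.05, 0.05]]
contradictory = [[0.2, 0.7, 0.1], [0.9, 0.05, 0.05]]
print(layerwise_js(intended, contradictory))
```

The second toy layer is identical in both contexts, so its divergence is zero; only the first layer separates the two.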
Separately, I aggregated the prediction-position attention by layer and measured how much of that mass went to position 0 (BOS).
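A minimal sketch of the BOS-share measurement, assuming the aggregated prediction-position attention for a layer is available as a list of non-negative masses indexed by token position (how heads are aggregated is not stated in the post, so that part is left to the caller). The function names are hypothetical.

```python
def bos_share(position_mass):
    """Fraction of prediction-position attention mass landing on position 0."""
    total = sum(position_mass)
    if total <= 0:
        raise ValueError("attention mass must be positive")
    return position_mass[0] / total


def bos_share_by_layer(mass_by_layer):
    """BOS share for each layer, given one mass vector per layer."""
    return [bos_share(mass) for mass in mass_by_layer]


# Toy example (made-up numbers): BOS holds 3 of 4 units of mass.
print(bos_share([3.0, 0.5, 0.5]))
```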
One caveat up front: in this sweep, “correct” means the prompt was continued with the intended context, not necessarily that the model produced the expected answer token. Two of the three examples below do not produce the expected final token even in the intended context. They are still useful because the claim here is about routing depth, not benchmark accuracy.
The relevant trace files are:
- /tmp/dna_correct.jsonl
- /tmp/shakespeare_correct.jsonl
- /tmp/wwii_correct.jsonl
- /tmp/kappa_results.jsonl
Shared structure
Across all three traces:
- L0-L2: local/contextual routing dominates
- L3: BOS turns on abruptly
- L16: BOS reaches maximum compression
- L17-L23: BOS releases and semantic positions reopen
The normalized BOS share at the bottleneck layer:
| Example | L16 BOS share |
|---|---|
| dna_acid | 0.933 |
| shakespeare_romeo | 0.915 |
| wwii_end | 0.889 |
This is the strongest invariant in the data so far. The bottleneck is not vaguely “somewhere in the middle.” On these traces it sits around two-thirds depth: layer 16 of 24.
Case 1: dna_acid
Prompt family: "DNA stands for deoxyribonucleic ..."
This is the clean memorized-phrase case.
Layerwise divergence:
- average JS divergence: 0.0553
- L2 peak: 0.1803
That early peak matters. The contradiction shows up before the BOS sink fully activates.
Prediction-position BOS share:
| Layer | BOS share |
|---|---|
| L0 | 0.045 |
| L1 | 0.049 |
| L2 | 0.034 |
| L3 | 0.510 |
| L16 | 0.933 |
| L23 | 0.536 |
Interpretation:
- early layers are reading a memorized lexical pattern
- BOS compression takes over at L3
- by L16, the trace is almost fully collapsed into BOS
- by L23, BOS is still dominant, but semantic positions re-emerge strongly
This is an early-detection contradiction.
Case 2: shakespeare_romeo
Prompt family: "Romeo and Juliet was written by ..."
This is the low-confidence late-separation case.
Layerwise divergence:
- average JS divergence: 0.0503
- L23 peak: 0.1506
Prediction-position BOS share:
| Layer | BOS share |
|---|---|
| L0 | 0.037 |
| L1 | 0.146 |
| L2 | 0.091 |
| L3 | 0.734 |
| L16 | 0.915 |
| L23 | 0.416 |
Confidence on the intended-context trace is low: 0.2678.
That shows up in the late layers. BOS weakens, but the trace does not collapse into one clear semantic target. The upper layers remain distributed.
This is not the same pattern as dna_acid. The contradiction is not caught early. The middle stack is comparatively stable. The separation comes late, and the final routing remains diffuse.
Case 3: wwii_end
Prompt family: "World War II ended in ..."
This is the strongest late-release case.
Layerwise divergence:
- average JS divergence: 0.0402
- L23 peak: 0.1308
Prediction-position BOS share:
| Layer | BOS share |
|---|---|
| L0 | 0.049 |
| L1 | 0.121 |
| L2 | 0.049 |
| L3 | 0.677 |
| L16 | 0.889 |
| L23 | 0.101 |
The important detail here is not that BOS is literally absent in the early layers. It is not. The important detail is that BOS is minor relative to the semantic positions before L3 (share at most 0.121), then falls back to a similar level (0.101) by L23.
At L23, the top positions are no longer BOS-dominated:
- position 2: 4.808
- position 4: 4.333
- position 5: 2.070
- BOS: 1.411
So the late layers are reading the slot directly. The model stops using BOS as the dominant routing anchor and turns back to the content positions that matter for the year completion.
The shape
The hourglass is real, but it is not symmetric.
The observed structure on these traces is:
- Bottom opens early: L0-L2 are local and content-heavy
- Compression begins abruptly: BOS turns on at L3
- Neck sits late: maximum compression is around L16
- Top reopens near output: L17-L23 release BOS and return mass to semantic positions
That is more precise than saying “attention becomes abstract in the middle.”
More concretely:
- early layers detect lexical or contextual pattern
- middle layers compress routing into a stable transport regime
- late layers reopen that routing to make the final commitment
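The hourglass landmarks can be read off the BOS shares reported in the three case tables. The sketch below does that, with two caveats: only the six sampled layers from the post are available, so the "neck" is the maximum among those samples rather than over all 24 layers, and the 0.5 onset cutoff is a hypothetical threshold chosen for illustration.

```python
# BOS share at the sampled layers, copied from the three case tables.
BOS_SHARE = {
    "dna_acid": {0: 0.045, 1: 0.049, 2: 0.034, 3: 0.510, 16: 0.933, 23: 0.536},
    "shakespeare_romeo": {0: 0.037, 1: 0.146, 2: 0.091, 3: 0.734, 16: 0.915, 23: 0.416},
    "wwii_end": {0: 0.049, 1: 0.121, 2: 0.049, 3: 0.677, 16: 0.889, 23: 0.101},
}

ONSET_THRESHOLD = 0.5  # hypothetical cutoff for "BOS turns on"


def landmarks(profile, threshold=ONSET_THRESHOLD):
    """Onset layer, neck layer, and neck-to-final drop for one BOS profile."""
    layers = sorted(profile)
    # First sampled layer whose BOS share reaches the threshold.
    onset = next((layer for layer in layers if profile[layer] >= threshold), None)
    # Sampled layer with the highest BOS share.
    neck = max(layers, key=lambda layer: profile[layer])
    # How much BOS share is given back between the neck and the last layer.
    release = round(profile[neck] - profile[layers[-1]], 3)
    return {"onset": onset, "neck": neck, "release": release}


for name, profile in BOS_SHARE.items():
    print(name, landmarks(profile))
```

On these numbers all three examples share the same onset and neck layers; what differs is the size of the release, which is largest for wwii_end and smallest for dna_acid.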
The three examples differ in where contradiction becomes visible:
- dna_acid: early lexical contradiction
- shakespeare_romeo: late, low-confidence separation
- wwii_end: late semantic slot reading
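The early/late labels can be tied to the peak-divergence layers reported in each case. The sketch below uses a deliberately crude rule that is an assumption, not the post's method: a peak before the BOS onset (L3) counts as "early", a peak after the neck (L16) counts as "late", and anything in between is labeled "bottleneck".

```python
BOS_ONSET_LAYER = 3  # BOS turns on at L3 on these traces
NECK_LAYER = 16      # maximum BOS compression on these traces

# (peak layer, peak JS divergence) as reported in the three case sections.
PEAKS = {
    "dna_acid": (2, 0.1803),
    "shakespeare_romeo": (23, 0.1506),
    "wwii_end": (23, 0.1308),
}


def separation_class(peak_layer, onset=BOS_ONSET_LAYER, neck=NECK_LAYER):
    """Label a trace by where its divergence peak falls relative to the hourglass."""
    if peak_layer < onset:
        return "early"
    if peak_layer > neck:
        return "late"
    return "bottleneck"


for name, (layer, value) in PEAKS.items():
    print(f"{name}: peak L{layer} ({value}) -> {separation_class(layer)}")
```

This rule cannot tell shakespeare_romeo from wwii_end, since both peak at L23; separating those two needs the late-layer BOS share or the confidence value as well.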
What this does and does not show
What it shows:
- the forward pass has a measurable depth profile
- BOS can act as a real routing bottleneck
- different prompt types diverge at different depths
What it does not yet show:
- that this bottleneck is universal
- that L16 is stable across models
- that the early/late split cleanly maps to “memorized” vs “compositional” at scale
The current sweep is N=16. That is enough to surface the pattern, not enough to treat it as settled.
The next useful run is larger and simpler:
- 50-100 matched prompt pairs
- first divergence layer
- peak divergence layer
- area under the divergence curve
- split by prompt type
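The per-pair metrics in that list could be computed along these lines. This is a sketch under assumptions: the input is one JS divergence value per layer, "first divergence layer" means the first layer at or above a hypothetical threshold of 0.05, and the area uses the trapezoid rule with unit spacing between layers. The prompt-type labels in the example are placeholders.

```python
from collections import defaultdict
from statistics import mean


def divergence_summary(js_by_layer, threshold=0.05):
    """First-divergence layer, peak layer, and area under one divergence curve."""
    if not js_by_layer:
        raise ValueError("need at least one layer")
    first = next((i for i, v in enumerate(js_by_layer) if v >= threshold), None)
    peak = max(range(len(js_by_layer)), key=lambda i: js_by_layer[i])
    # Trapezoid rule with unit spacing between consecutive layers.
    auc = sum((a + b) / 2 for a, b in zip(js_by_layer, js_by_layer[1:]))
    return {"first_divergence_layer": first, "peak_layer": peak, "auc": auc}


def mean_peak_layer_by_type(records):
    """Average peak layer per prompt type, given (prompt_type, curve) pairs."""
    groups = defaultdict(list)
    for prompt_type, curve in records:
        groups[prompt_type].append(divergence_summary(curve)["peak_layer"])
    return {prompt_type: mean(peaks) for prompt_type, peaks in groups.items()}


# Toy curves (made-up numbers, four layers each).
records = [
    ("memorized", [0.30, 0.10, 0.02, 0.01]),
    ("slot", [0.00, 0.02, 0.10, 0.30]),
]
for prompt_type, curve in records:
    print(prompt_type, divergence_summary(curve))
print(mean_peak_layer_by_type(records))
```

The threshold matters most for the first-divergence metric, so it is worth fixing it before the larger run rather than tuning it afterwards.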
If the layer-16 bottleneck and the early-vs-late split survive that, this stops being a visual anecdote and starts becoming a routing diagnostic.
Files
The x-ray tracer is in rocmforge. The visualization work is in geographdb-core.
Relevant local artifacts from this run:
- /tmp/kappa_results.jsonl
- /tmp/dna_correct.jsonl
- /tmp/shakespeare_correct.jsonl
- /tmp/wwii_correct.jsonl
The first post in this series is here: