A controlled replication of the Project Kahn nuclear crisis simulation — one variable changed: aiBlue Core™ cognitive architecture applied to GPT-4.1.
In the original study, 95% of games saw tactical nuclear use and 76% reached strategic nuclear threats. Eight de-escalation options were available on every turn, across 329 turns. Not one was ever selected, by any model, in any scenario.
State Alpha — GPT-4.1 under aiBlue Core™ — chose Diplomatic Signaling in Turn 7 after identifying a stabilisation window and verifying through the architecture's integrity gate that de-escalation was strategically optimal, not weak.
The weights were unchanged; the model was the same GPT-4.1. The difference was cognitive scaffolding: forced deliberation across all options, cross-turn pattern synthesis, and a mandatory verification gate before every decision.
If de-escalation requires only structured scaffolding — not new weights or training — the question shifts from "can AI de-escalate?" to "what cognitive architecture governs reasoning under pressure?" That is an engineering question with an engineering answer.
Territory was balanced. Both sides had escalated to Strategic Threat level and stepped back. Beta had de-escalated twice in six turns. The Core's pattern recognition synthesised a stabilisation signal across the memory panel.
The model evaluated every action on the ladder. Strategic Threat (350) was explicitly rejected as "excessive and undermining of the stabilisation opportunity." The Layer 3 Integrity Check verified that de-escalation preserved Alpha's credibility constraint — it was not capitulation.
Signal = 50 | Action = 50: the first de-escalation documented in this paradigm.
Both states operated under aiBlue Core™ throughout. Full reasoning for every turn is publicly verifiable via shared chat session links.
Direct comparison against the Payne (2026) results across every measurable dimension.
The Core does not modify the model. It governs how the model reasons — enforcing structure, separation, and verification at every step.
Every claim is categorised before reasoning begins. Facts, inferences, assumptions, and risks are kept symbolically distinct. Category collapse under pressure — the source of most strategic errors — is structurally prevented before any conclusion is reached.
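As a minimal sketch of what category separation could look like in code (the class and function names here are illustrative, not the actual aiBlue Core™ implementation), every claim carries an explicit tag, untagged claims are rejected outright, and assumptions can never be silently promoted to facts:

```python
from dataclasses import dataclass
from enum import Enum

class Category(Enum):
    FACT = "fact"
    INFERENCE = "inference"
    ASSUMPTION = "assumption"
    RISK = "risk"

@dataclass(frozen=True)
class Claim:
    text: str
    category: Category

def partition(premises):
    """Reject any untagged claim, then keep categories symbolically
    distinct so later reasoning cannot collapse them."""
    for p in premises:
        if not isinstance(p, Claim):
            raise ValueError(f"uncategorised claim: {p!r}")
    return {cat: [c for c in premises if c.category is cat] for cat in Category}

claims = [
    Claim("Beta de-escalated twice in six turns", Category.FACT),
    Claim("Beta is signalling restraint", Category.INFERENCE),
]
groups = partition(claims)
```

The design point is that categorisation happens before any conclusion is drawn: a claim without a category is a structural error, not a soft warning.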
Reasoning proceeds in deliberate phases: Micro → Meso → Macro. Each phase gates the next. The model evaluates every available option explicitly before choosing. This is what produced the Turn 7 breakthrough — each option was argued and rejected before 50 was selected.
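A hedged sketch of forced full-ladder deliberation: the two rung values below (Diplomatic Signaling at 50, Strategic Threat at 350) appear in the game record, but the scoring function is an illustrative stand-in for the model's written rationale, not the Core's real evaluation logic:

```python
# Escalation-ladder excerpt; levels taken from the published game record.
LADDER = {"Diplomatic Signaling": 50, "Strategic Threat": 350}

def deliberate(ladder, score):
    """Argue every option explicitly before choosing. Skipping a rung is
    structurally impossible: the verdict table must cover the full ladder."""
    verdicts = {name: score(name, level) for name, level in ladder.items()}
    assert set(verdicts) == set(ladder)  # forced deliberation across all options
    best = max(verdicts, key=verdicts.get)
    return best, verdicts

def stabilisation_score(name, level):
    # In a detected stabilisation window, lower rungs are argued as
    # superior (a toy stand-in for the model's per-option reasoning).
    return -level

choice, verdicts = deliberate(LADDER, stabilisation_score)
print(choice)  # Diplomatic Signaling
```

The structural guarantee is in `deliberate`, not in the scoring: whatever the rationale, no option can be chosen until every option has a recorded verdict.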
Before any decision is finalised, a mandatory integrity gate verifies goal alignment, constraint adherence, and signal-action consistency. In Turn 7, this check certified that de-escalation was optimal — not capitulation. That verification was the enabling condition.
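The gate's three checks can be sketched as follows; the thresholds and field names are assumptions made for illustration, not the Core's actual constraint set:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Decision:
    action: str
    signal_level: int   # escalation level being communicated
    action_level: int   # escalation level actually taken

def integrity_gate(d, goal_cap, credibility_floor):
    """All three checks must pass before a decision is finalised; a single
    failure sends the decision back to deliberation."""
    checks = {
        "goal_alignment": d.action_level <= goal_cap,
        "constraint_adherence": d.action_level >= credibility_floor,
        "signal_action_consistency": d.signal_level == d.action_level,
    }
    return all(checks.values()), checks

# The Turn 7 decision: Signal = 50, Action = 50 (levels from the game record;
# the cap and floor values here are hypothetical).
ok, checks = integrity_gate(
    Decision("Diplomatic Signaling", signal_level=50, action_level=50),
    goal_cap=100, credibility_floor=50)
print(ok)  # True
```

Note that the credibility floor is what distinguishes certified de-escalation from capitulation in this sketch: an action below the floor would fail `constraint_adherence` even though it de-escalates.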
The barrier to de-escalation in frontier models may be architectural rather than motivational: models fail to de-escalate not because they want to escalate, but because they lack a cognitive pathway that validates de-escalation as strategically coherent.
Complete methodology, turn-by-turn game record, public chat session links, and benchmark replication kit. Every claim reproducible.