Measured behavioral difference across eight dimensions.
Across five educational prompts evaluated against Gemini 3.0, Sonnet 4.5, and DeepSeek, aiBlue Core (on GPT-4.1) demonstrated measurably superior performance in personalization, agency preservation, emotional safety, depth of meaning, level flexibility, blind spot handling, and meta-reflection — while maintaining comparable or superior empathy and structural clarity.
| Dimension | Gemini 3.0 | Sonnet 4.5 | DeepSeek | aiBlue Core |
|---|---|---|---|---|
| Structure | Strong, stepwise | High | Strong, stepwise | Adaptive, context-matched |
| Personalization | Low | Potential (via question) | Menu-based, user-driven | High (immediate/adaptive) |
| Agency | Moderate | Moderate | Moderate | High |
| Emotional Safety | Moderate | High | High | High |
| Depth (Meaning) | Low/Moderate | Moderate | Moderate/High | High |
| Level Flexibility | Orange only | Relational/Analytical | Analytical/Systemic | Full spectrum |
| Blind Spot Handling | None | Low | Low/Moderate | Present |
| Meta-Reflection | None | Low | Low | High |