OPHAELIS INDEX
Compare leading models on real benchmark tasks, see tradeoffs instantly, and move from top-line verdicts to deep evidence without leaving one product surface.
Best Overall: Strongest tradeoff across quality, speed, cost, and reliability under the selected ranking profile.
Final Score: Composite ranking score out of 100; it reflects weighted tradeoffs, not raw capability alone.
Estimated Cost: Normalized token-based estimate using published provider pricing, for apples-to-apples comparison.
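To make the "Estimated Cost" definition concrete, here is a minimal sketch of one way such a normalization could work. The page does not publish its formula, so everything below is an assumption: a fixed hypothetical workload mix, invented per-million-token prices, and simple min-max inversion so the cheapest model scores 100.

```python
def workload_cost(input_price, output_price,
                  input_tokens=1_000_000, output_tokens=250_000):
    """Dollar cost of a fixed token workload, given per-million-token prices.
    The workload mix is held constant so models compare apples to apples."""
    return (input_price * input_tokens + output_price * output_tokens) / 1e6

def cost_scores(dollar_costs):
    """Min-max invert dollar costs into 0-100 scores (cheapest model = 100).
    Assumes at least two distinct costs, otherwise the range is zero."""
    lo, hi = min(dollar_costs.values()), max(dollar_costs.values())
    return {m: round(100 * (hi - c) / (hi - lo), 1)
            for m, c in dollar_costs.items()}

# Hypothetical prices for illustration only; not any provider's real pricing.
costs = {"model_a": workload_cost(0.10, 0.40),
         "model_b": workload_cost(1.25, 10.00),
         "model_c": workload_cost(0.30, 2.50)}
print(cost_scores(costs))
```

Under this assumed scheme, a model's score depends on the whole field being compared: adding or removing a very cheap or very expensive model rescales every other score.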
Current System Verdict (Balanced): GPT 5.4 Mini (OpenAI), 90.54
Composite score calculated from quality, speed, cost, and reliability under the Balanced ranking profile. GPT 5.4 Mini wins 7 of the 10 decision areas evaluated and holds a typical 1.18-point margin over the runner-up in those areas.
Decision Areas Evaluated: 10
Limited Coverage Areas: 6
Last Evaluator Run: Apr 2, 2026, 1:46 PM
Public quick views for different operating goals
Best Overall Balance: GPT 5.4 Mini (OpenAI), 90.54
Composite score calculated from quality, speed, cost, and reliability under the Balanced ranking profile. Best combined tradeoff across quality, speed, cost, and reliability.

Highest Output Quality: 93.00
Composite score calculated from quality, speed, cost, and reliability under the Quality First ranking profile. Leads when answer quality and completeness are prioritized.

Fastest Response: 90.95
Composite score calculated from quality, speed, cost, and reliability under the Speed First ranking profile. Leads for latency-sensitive use cases and faster turnaround.

Most Cost Efficient: 93.90
Composite score calculated from quality, speed, cost, and reliability under the Cost First ranking profile. Leads when normalized spend efficiency is weighted highest.

Most Consistent: 93.99
Composite score calculated from quality, speed, cost, and reliability under the Reliability First ranking profile. Leads under reliability-first weighting for dependable outputs.
[Trend chart: tracks the top Balanced-profile score across evaluator runs; sparse history is handled automatically.]
Sortable free view for quick model comparison
All scores below are calculated from quality, speed, cost, and reliability under the Balanced ranking profile.
| Model | Provider | Final Score | Quality | Speed | Cost | Reliability | Wins |
|---|---|---|---|---|---|---|---|
| 1. GPT 5.4 Mini | OpenAI | 90.54 | 90.5 | 86.9 | 85.4 | 97.8 | 7 |
| 2. Gemini 3.1 Pro Preview | Google | 89.84 | 98.0 | 84.8 | 64.2 | 95.0 | 0 |
| 3. Gemini 2.5 Flash Lite | Google | 89.64 | 83.3 | 81.5 | 100.0 | 100.0 | 0 |
| 4. Gemini 3 Flash | Google | 89.64 | 88.5 | 95.8 | 83.2 | 91.7 | 0 |
| 5. Claude Haiku 4.5 | Anthropic | 88.14 | 84.3 | 87.6 | 91.7 | 94.4 | 0 |
| 6. Mistral Medium | Other frontier | 87.96 | 84.1 | 73.8 | 97.8 | 100.0 | 2 |
| 7. Mistral Large | Other frontier | 87.94 | 89.5 | 88.8 | 79.2 | 90.3 | 0 |
| 8. GPT 5.4 Nano | OpenAI | 87.61 | 82.2 | 86.8 | 96.9 | 93.8 | 1 |
| 9. Mistral Fast | Other frontier | 87.29 | 82.5 | 96.8 | 91.2 | 86.7 | 0 |
| 10. GPT 5.4 | OpenAI | 86.72 | 93.8 | 73.6 | 67.1 | 99.4 | 0 |
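As an illustration of how a weighted composite like the Final Score column could be produced, here is a minimal sketch. The actual OPHAELIS INDEX weights are not published on this page, so the equal Balanced weights below are an assumption; note they yield 90.15 for the top row rather than the published 90.54, which suggests the real Balanced profile is not a uniform average of these four sub-scores.

```python
def final_score(metrics, weights):
    """Weighted mean of 0-100 sub-scores under a ranking profile."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9  # weights must sum to 1
    return round(sum(weights[k] * metrics[k] for k in weights), 2)

# Assumed (not published) uniform weights for the Balanced profile.
balanced = {"quality": 0.25, "speed": 0.25, "cost": 0.25, "reliability": 0.25}

# GPT 5.4 Mini's sub-scores from the table above.
row = {"quality": 90.5, "speed": 86.9, "cost": 85.4, "reliability": 97.8}
print(final_score(row, balanced))  # → 90.15
```

The profile-specific verdicts above (Quality First, Speed First, and so on) would then correspond to different weight dictionaries passed to the same function.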
Deeper Access
Open the deeper view for full tables, decision-area breakdowns, per-model details, and cost transparency diagnostics.
Access Member View | Methodology