OPHAELIS INDEX

Decision intelligence for production model selection

Compare leading models on real benchmark tasks, see tradeoffs instantly, and move from top-line verdicts to deep evidence without leaving a single product surface.

Free Decision Surface · Latest Update: Apr 2, 2026, 1:46 PM · Live Lanes: 4 · Limited Coverage: 6

How to read this

Best Overall

Strongest tradeoff across quality, speed, cost, and reliability under the selected ranking profile.

Final Score

Composite ranking score out of 100. It reflects weighted tradeoffs, not raw capability alone.
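One way to picture the composite is a weighted average of the four per-dimension scores. The sketch below is illustrative only: the profile weights are assumptions, not the index's published formula, and equal weights do not exactly reproduce the published Balanced score, which implies the real weights differ.

```python
# Hypothetical composite-score sketch. The weight values are assumptions;
# Ophaelis Index does not publish its exact weighting in this summary.
def composite_score(metrics, weights):
    """Weighted average of per-dimension scores (each on a 0-100 scale)."""
    total_weight = sum(weights.values())
    return round(
        sum(metrics[dim] * w for dim, w in weights.items()) / total_weight, 2
    )

# Assumed "Balanced" profile: equal weight on all four dimensions.
balanced = {"quality": 1, "speed": 1, "cost": 1, "reliability": 1}

# Per-dimension scores for GPT 5.4 Mini, taken from the leaderboard below.
gpt_54_mini = {"quality": 90.5, "speed": 86.9, "cost": 85.4, "reliability": 97.8}

# Equal weights yield 90.15; the published Balanced score is 90.54,
# so the actual profile is not uniformly weighted.
print(composite_score(gpt_54_mini, balanced))  # → 90.15
```

Swapping in a different profile (e.g. a heavier quality weight for Quality First) only changes the `weights` dict, which is why one evaluator run can feed several rankings.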

Estimated Cost

Normalized token-based estimate using published provider pricing for apples-to-apples comparison.
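A normalized token-based estimate can be sketched as: fix one workload (input and output token counts) and price it against each provider's published per-token rates. The prices and token counts below are made up for illustration; only the normalization idea comes from the text above.

```python
# Hypothetical normalized-cost sketch. Prices and token counts here are
# invented; the index uses each provider's published pricing.
def estimated_cost(input_tokens, output_tokens, price_in_per_m, price_out_per_m):
    """Cost in USD for one workload, given per-million-token prices."""
    raw = (input_tokens * price_in_per_m + output_tokens * price_out_per_m) / 1_000_000
    return round(raw, 2)  # round to cents

# The same fixed workload for every model is what makes the
# comparison apples-to-apples.
workload = (2_000_000, 500_000)  # input tokens, output tokens

# Illustrative (not real) per-million-token prices for two models.
print(estimated_cost(*workload, 0.40, 1.60))  # → 1.6
print(estimated_cost(*workload, 1.25, 5.00))  # → 5.0
```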

Current System Verdict (Balanced)

GPT 5.4 Mini

OpenAI

90.54

Composite score calculated from quality, speed, cost, and reliability under the Balanced ranking profile.

GPT 5.4 Mini wins 7 decision areas and sustains a typical 1.18-point margin over the runner-up in those lanes.

Decision Areas Evaluated

10

Limited Coverage Areas

6

Last Evaluator Run

Apr 2, 2026, 1:46 PM

Top Models by Decision Priority

Public quick views for different operating goals

Best Overall Balance

GPT 5.4 Mini

OpenAI

90.54

Composite score calculated from quality, speed, cost, and reliability under the Balanced ranking profile.

Best combined tradeoff across quality, speed, cost, and reliability.

Highest Output Quality

Gemini 3.1 Pro Preview

Google

93.00

Composite score calculated from quality, speed, cost, and reliability under the Quality First ranking profile.

Leads when answer quality and completeness are prioritized.

Fastest Response

Gemini 3 Flash

Google

90.95

Composite score calculated from quality, speed, cost, and reliability under the Speed First ranking profile.

Leads for latency-sensitive use cases and faster turnaround.

Most Cost Efficient

Gemini 2.5 Flash Lite

Google

93.90

Composite score calculated from quality, speed, cost, and reliability under the Cost First ranking profile.

Leads when normalized spend efficiency is weighted highest.

Most Consistent

Gemini 2.5 Flash Lite

Google

93.99

Composite score calculated from quality, speed, cost, and reliability under the Reliability First ranking profile.

Leads under reliability-first weighting for dependable outputs.

Performance Trend

Tracks top balanced-system score across evaluator runs. Automatically handles sparse history.

Trend history is building. Run additional evaluator cycles to unlock charted movement.

Compact Leaderboard

Sortable free view for quick model comparison

Composite score calculated from quality, speed, cost, and reliability under the Balanced ranking profile.

#    Model                    Provider        Final Score   Q      S      C      R     Wins
1.   GPT 5.4 Mini             OpenAI          90.54         90.5   86.9   85.4   97.8   7
2.   Gemini 3.1 Pro Preview   Google          89.84         98.0   84.8   64.2   95.0   0
3.   Gemini 2.5 Flash Lite    Google          89.64         83.3   81.5  100.0  100.0   0
4.   Gemini 3 Flash           Google          89.64         88.5   95.8   83.2   91.7   0
5.   Claude Haiku 4.5         Anthropic       88.14         84.3   87.6   91.7   94.4   0
6.   Mistral Medium           Other frontier  87.96         84.1   73.8   97.8  100.0   2
7.   Mistral Large            Other frontier  87.94         89.5   88.8   79.2   90.3   0
8.   GPT 5.4 Nano             OpenAI          87.61         82.2   86.8   96.9   93.8   1
9.   Mistral Fast             Other frontier  87.29         82.5   96.8   91.2   86.7   0
10.  GPT 5.4                  OpenAI          86.72         93.8   73.6   67.1   99.4   0

Q = quality, S = speed, C = cost, R = reliability (each scored 0–100). Wins counts decision areas led.

Why teams trust this ranking

  • Models are evaluated on real benchmark tasks that reflect production work.
  • Rankings combine quality, speed, cost, and reliability into a composite score.
  • Estimated cost uses normalized token-based pricing for apples-to-apples comparison.
  • Scores expose measurable tradeoffs across quality, speed, cost, and reliability.
  • Current results are single-pass benchmark runs, with trend confidence improving as run history accumulates.

Deeper Access

Member Data Layer

Open the deeper view for full tables, decision-area breakdowns, per-model details, and cost transparency diagnostics.

Access Member View · Methodology