Ophaelis Index

Methodology

This page explains how Ophaelis Index evaluates models, builds scores, and presents rankings in a consistent, auditable way.

What this measures

Ophaelis Index evaluates AI models on real-world tasks and ranks them using measurable performance dimensions:

  • Quality
  • Speed
  • Cost
  • Reliability

How models are evaluated

  • Each model receives the same prompts for each benchmark item.
  • The same scoring logic is applied to every model output.
  • Outputs are structured and scored programmatically for consistency.
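
To make that loop concrete, here is a minimal sketch of how such an evaluation pass could be structured. The function and field names (evaluate, run_model, score_output, item["prompt"]) are illustrative assumptions, not the actual harness.

    from typing import Callable

    def evaluate(
        models: list[str],
        benchmark_items: list[dict],
        run_model: Callable[[str, str], str],       # (model, prompt) -> raw output
        score_output: Callable[[str, dict], dict],  # (output, item) -> component scores
    ) -> dict[str, list[dict]]:
        """Run every model on every benchmark item with identical prompts,
        then apply identical scoring logic to every output."""
        results: dict[str, list[dict]] = {}
        for model in models:
            per_item_scores = []
            for item in benchmark_items:
                output = run_model(model, item["prompt"])  # same prompt for every model
                scores = score_output(output, item)        # same scorer for every output
                per_item_scores.append(scores)
            results[model] = per_item_scores
        return results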

Benchmark structure

Models are evaluated across multiple task categories that map to real operational work:

  • Structured extraction
  • Classification
  • Summarization
  • Coding

Each benchmark item represents a concrete scenario rather than an abstract toy problem.
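
As a rough illustration only, a benchmark item can be pictured as a small record that ties a category to a concrete prompt and its reference data. The field names below are hypothetical, not the actual schema.

    from dataclasses import dataclass

    @dataclass
    class BenchmarkItem:
        """Hypothetical shape of a benchmark item."""
        item_id: str    # stable identifier for the scenario
        category: str   # e.g. "structured_extraction", "classification", "summarization", "coding"
        prompt: str     # the exact prompt shown to every model
        expected: dict  # reference data used by the programmatic scorer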

Scoring system

Each model receives four component scores on each benchmark item:

  • Quality score
  • Speed score
  • Cost score
  • Reliability score

Final Score is a weighted combination of those components under the selected weighting profile. Best Overall therefore reflects the strongest balance across all four dimensions, not the highest raw capability on any single one.
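
A minimal sketch of that combination is shown below. The profile name and weights are placeholder assumptions for the example; the actual profile weights are not listed on this page.

    # Placeholder weights for illustration; the real profiles define their own.
    PROFILES = {
        "balanced": {"quality": 0.40, "speed": 0.20, "cost": 0.20, "reliability": 0.20},
    }

    def final_score(components: dict[str, float], profile: str = "balanced") -> float:
        """Weighted combination of the four component scores under a profile."""
        weights = PROFILES[profile]
        return sum(weights[name] * components[name] for name in weights)

    # A model with strong quality but middling cost efficiency:
    print(final_score({"quality": 0.92, "speed": 0.70, "cost": 0.55, "reliability": 0.88}))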

Cost model (Level 2.5)

  • Cost is estimated using token-normalized pricing.
  • Tokens are approximated from text length.
  • Pricing uses published provider rates.

All models are evaluated using the same cost estimation method for fairness.
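
For illustration, a token-normalized estimate of this kind could look like the sketch below. The characters-per-token ratio and the way rates are passed in are assumptions made for the example; the index uses whatever per-million-token rates each provider publishes.

    # Rough approximation used only for this sketch; actual tokenizers vary by model.
    CHARS_PER_TOKEN = 4

    def estimate_tokens(text: str) -> int:
        """Approximate token count from text length."""
        return max(1, len(text) // CHARS_PER_TOKEN)

    def estimate_cost(prompt: str, completion: str,
                      input_rate_per_m: float, output_rate_per_m: float) -> float:
        """Token-normalized cost in dollars, given published per-million-token rates."""
        return (estimate_tokens(prompt) * input_rate_per_m
                + estimate_tokens(completion) * output_rate_per_m) / 1_000_000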

Current limitations

  • Current runs are single-pass per benchmark item.
  • Token counts are estimated for consistency.
  • Result confidence improves as historical runs accumulate.

What's next

  • Repeated-run evaluation (x3 sampling).
  • Historical performance tracking.
  • Consistency and variance metrics.
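
As a sketch only, the planned repeated-run metrics could summarize the samples for each item as a mean plus a spread figure. The function below illustrates one possible shape of such a consistency metric, not the committed design.

    import statistics

    def summarize_runs(scores: list[float]) -> dict[str, float]:
        """Collapse repeated-run scores (e.g. 3 samples per item) into a mean
        and a simple consistency figure based on sample standard deviation."""
        mean = statistics.mean(scores)
        spread = statistics.stdev(scores) if len(scores) > 1 else 0.0
        return {"mean": mean, "stdev": spread}

    print(summarize_runs([0.86, 0.91, 0.88]))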