

Nicolas Maquet

21st January 2026

Codex CLI (GPT-5.2 Codex) is the most accurate agent we've tested and ranks #3 on Sigmabench

GPT-5.2 Codex takes over as the new Codex CLI flagship. We benchmark it against GPT-5.1 Codex Max and the now-legacy GPT-5.1 Codex to see what’s changed.

Codex CLI (GPT-5.2 Codex)

Sigmascore: 32.4%
Accuracy: 45.9%
Consistency: 50.3%
Speed: 14.8%

What we measured:

Sigmascore: the overall measure of an agent’s real-world coding performance
Accuracy: how often outputs meet quality thresholds
Consistency: how often outputs remain useful even when the agent does not fully complete a task
Speed: how quickly tasks are completed

Each score is assigned a tier based on how close it is to the other agents' scores, relative to the margin of error. The top-scoring group of agents is in Tier 1, the next-best group is in Tier 2, and so on.
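The exact tiering rule lives in the methodology and is not reproduced here. As a minimal sketch, assuming a greedy rule in which a new tier starts whenever a score falls more than a fixed margin of error below the best score in the current tier, the grouping could look like this. The function name `assign_tiers` and the `margin` value are hypothetical.

```python
from typing import Dict, Optional


def assign_tiers(scores: Dict[str, float], margin: float) -> Dict[str, int]:
    """Group agents into tiers using a hypothetical greedy rule.

    Agents are sorted by score, highest first. A new tier starts whenever an
    agent's score is more than `margin` below the top score of the current tier.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    tiers: Dict[str, int] = {}
    tier = 0
    tier_top: Optional[float] = None
    for name, score in ranked:
        if tier_top is None or tier_top - score > margin:
            tier += 1         # open a new tier
            tier_top = score  # its best score anchors later comparisons
        tiers[name] = tier
    return tiers


# Accuracy scores from this article; the 2.0-point margin is illustrative only.
accuracy = {"GPT-5.2 Codex": 45.9, "GPT-5.1 Codex Max": 44.3, "GPT-5.1 Codex": 40.2}
print(assign_tiers(accuracy, margin=2.0))
# {'GPT-5.2 Codex': 1, 'GPT-5.1 Codex Max': 1, 'GPT-5.1 Codex': 2}
```

With only these three agents, GPT-5.1 Codex lands in Tier 2. The leaderboard places it in Tier 3, presumably because the full ranking includes other agents and uses its own margin of error.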

Codex CLI (GPT-5.2 Codex) ranks #3 on the Sigmabench benchmark:

It has the highest Accuracy score, sharing Tier 1 with Codex CLI (GPT-5.1 Codex Max).
It is Tier 2 in Consistency, similar to other agent/model pairs in this category.
It shares Tier 4 in Speed with Gemini CLI (Gemini 3 Pro Preview).

See our methodology for additional details.

Comparisons

Codex CLI (GPT-5.2 Codex) vs Codex CLI (GPT-5.1 Codex Max)

GPT-5.2 Codex is the newest flagship agentic model for Codex CLI users, superseding GPT-5.1 Codex Max. In our benchmarks, both models sit in Tier 1 for Accuracy and Tier 2 for Consistency, with GPT-5.2 Codex holding a marginal accuracy edge (45.9% vs 44.3%) and near-identical consistency (50.3% vs 49.9%).

Within Codex CLI, GPT-5.2 Codex is modestly but clearly faster, cutting median runtime by about 15% relative to GPT-5.1 Codex Max (420s vs 494s). Although GPT-5.2 Codex carries a roughly 40% higher per-token price, it consumed fewer tokens overall in our benchmark runs, so the total run cost was similar.
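The runtime and cost claims above reduce to two small calculations. This sketch assumes a single blended per-token price (real pricing usually distinguishes input, output and cached tokens) and treats the "roughly 40%" premium as exactly 1.4x; the function names are illustrative.

```python
def runtime_reduction(new_s: float, old_s: float) -> float:
    """Fractional reduction in median runtime (0.15 means 15% less time)."""
    return (old_s - new_s) / old_s


def break_even_token_ratio(price_multiplier: float) -> float:
    """Token ratio (new / old) at which total cost is unchanged.

    Assuming cost = tokens * price_per_token, equal cost requires
    new_tokens / old_tokens = 1 / price_multiplier.
    """
    return 1.0 / price_multiplier


print(f"{runtime_reduction(420, 494):.1%}")   # 15.0%
print(f"{break_even_token_ratio(1.4):.2f}")   # 0.71
```

Under these assumptions, a 1.4x per-token price is offset when GPT-5.2 Codex uses about 71% of the tokens, i.e. roughly 29% fewer than GPT-5.1 Codex Max.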

Metric         GPT-5.2 Codex    GPT-5.1 Codex Max
Sigmascore     32.4%            30.2%
Accuracy       45.9%            44.3%
Consistency    50.3%            49.9%
Speed          14.8%            12.5%

Codex CLI (GPT-5.2 Codex) vs Codex CLI (GPT-5.1 Codex)

Comparing our newest Codex CLI result with what we observed for GPT-5.1 Codex (now a legacy model), we see a significant performance gap.

Although the two models were released only weeks apart, the difference is striking: accuracy jumps from Tier 3 to Tier 1 (+5.7 points), and speed shows a similarly large separation, with GPT-5.2 Codex moving from Tier 6 to Tier 4 (+6.9 points).

Gaps of this magnitude from what is nominally a point release underscore how quickly OpenAI is improving its models.

Metric         GPT-5.2 Codex    GPT-5.1 Codex
Sigmascore     32.4%            26.3%
Accuracy       45.9%            40.2%
Consistency    50.3%            40.3%
Speed          14.8%            7.9%
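The tier movements described above are differences in percentage points. A short sketch reproducing them from the table values in this article (`point_deltas` is an illustrative helper) also gives the corresponding figure for Consistency:

```python
# GPT-5.2 Codex vs GPT-5.1 Codex scores from the table, in percent.
new = {"Accuracy": 45.9, "Consistency": 50.3, "Speed": 14.8}
old = {"Accuracy": 40.2, "Consistency": 40.3, "Speed": 7.9}


def point_deltas(new: dict, old: dict) -> dict:
    """Absolute change in percentage points, rounded to one decimal place."""
    return {metric: round(new[metric] - old[metric], 1) for metric in new}


print(point_deltas(new, old))
# {'Accuracy': 5.7, 'Consistency': 10.0, 'Speed': 6.9}
```

Rounding to one decimal place matches the precision of the published scores and hides floating-point noise such as 45.9 - 40.2 evaluating to 5.699999999999996.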

Key Insights

Codex CLI (GPT-5.2 Codex) takes the top spot on Accuracy, sharing Tier 1 with Codex CLI (GPT-5.1 Codex Max), and sits in Tier 2 on Consistency.
Codex CLI (GPT-5.2 Codex) is meaningfully faster than Codex CLI (GPT-5.1 Codex Max), cutting median runtime by ~15% (420s vs 494s).
The generation jump from Codex CLI (GPT-5.1 Codex) to Codex CLI (GPT-5.2 Codex) is dramatic: +5.7 points in Accuracy (Tier 3 → Tier 1) and +6.9 points in Speed (Tier 6 → Tier 4).

