On our first full real-world evaluation run, Cursor CLI (Composer-1) ranks #1 on Sigmabench, leading the field in overall score and showing standout performance across complex software engineering tasks.
Cursor CLI (Composer-1) scores:
Sigmascore (overall): 42.9%
Accuracy: 39.1%
Consistency: 50.9%
Speed: 39.5%
What we measured:
Each score is assigned a tier based on how close it is to the other scores relative to the margin of error. The top-scoring group of agents is in Tier 1, the next-best group is in Tier 2, and so on.
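The tiering idea can be illustrated with a minimal sketch. It is not Sigmabench's actual procedure: the function name `assign_tiers`, the grouping rule (a new tier starts when a score falls more than the margin below the current tier's top score), the margin of 2.0 points, and the agent names and scores are all hypothetical, chosen only to show how a margin of error can turn a ranking into tiers.

```python
def assign_tiers(scores, margin):
    """Group agents into tiers by score.

    Assumed rule (hypothetical, for illustration only): walk the agents from
    the highest score down, and open a new tier whenever a score is more than
    `margin` points below the top score of the current tier.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    tiers = {}
    tier = 0
    tier_top = None
    for agent, score in ranked:
        if tier_top is None or tier_top - score > margin:
            tier += 1          # this score is outside the margin: new tier
            tier_top = score   # it becomes the reference for that tier
        tiers[agent] = tier
    return tiers


# Hypothetical agents and scores, not taken from the benchmark.
example_scores = {"agent_a": 50.0, "agent_b": 49.0, "agent_c": 44.0, "agent_d": 30.0}
example_tiers = assign_tiers(example_scores, margin=2.0)
print(example_tiers)
```

Under this assumed rule, agents whose scores sit within the margin of the tier leader count as statistically tied and share a tier, which matches how "tied" and "a full tier" are used in the comparisons.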
Cursor (Composer-1) is the overall leader on the Sigmabench benchmark:
Codex is the more accurate agent in this comparison (+5.2 points). Consistency is statistically tied. Cursor, however, leads by a large margin of 27 percentage points in speed, representing about a 4x speed advantage.
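Cursor CLI (Composer-1) vs. Codex CLI (GPT-5.1-Codex-Max):
Sigmascore (overall): Cursor CLI 42.9%, Codex CLI 30.2%
Accuracy: Cursor CLI 39.1%, Codex CLI 44.3%
Consistency: Cursor CLI 50.9%, Codex CLI 49.9%
Speed: Cursor CLI 39.5%, Codex CLI 12.5%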
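The point gaps quoted for the Codex comparison can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes the four values in each pair correspond, in order, to overall Sigmascore, Accuracy, Consistency, and Speed (labels inferred from the prose), and the helper name `point_gaps` is hypothetical.

```python
# Scores for the Cursor CLI vs. Codex CLI comparison, in percent.
# The metric labels are an assumption inferred from the surrounding text.
cursor = {"sigmascore": 42.9, "accuracy": 39.1, "consistency": 50.9, "speed": 39.5}
codex = {"sigmascore": 30.2, "accuracy": 44.3, "consistency": 49.9, "speed": 12.5}


def point_gaps(a, b):
    """Return a[metric] - b[metric] in percentage points, rounded to one decimal."""
    return {metric: round(a[metric] - b[metric], 1) for metric in a}


gaps = point_gaps(cursor, codex)
print(gaps)
# {'sigmascore': 12.7, 'accuracy': -5.2, 'consistency': 1.0, 'speed': 27.0}
```

A negative gap means Codex is ahead on that metric. The 5.2-point accuracy deficit and the 27-point speed lead match the figures quoted in the text.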
Claude Code (Opus 4.5) and Cursor CLI are statistically tied on Consistency. Claude Code leads by a full tier in Accuracy. Cursor once again leads by a large margin in speed, about 24 percentage points (39.5% vs. 15.3%).
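Cursor CLI (Composer-1) vs. Claude Code (Opus 4.5):
Sigmascore (overall): Cursor CLI 42.9%, Claude Code 32.0%
Accuracy: Cursor CLI 39.1%, Claude Code 43.1%
Consistency: Cursor CLI 50.9%, Claude Code 49.8%
Speed: Cursor CLI 39.5%, Claude Code 15.3%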
Speed advantage: Cursor CLI (Composer-1) is 4x faster on average than both Claude Code (Opus 4.5) and Codex CLI (GPT-5.1-Codex-Max).
Cursor CLI (Composer-1) trails the accuracy leader, Codex CLI (GPT-5.1-Codex-Max), by about 5 percentage points and is tied on Consistency, but the speed advantage is significant enough to maintain the #1 Sigmascore rank overall.