On our first full real-world evaluation run, Cursor CLI (Composer-1) ranks #1 on Sigmabench, leading the field in overall score and showing standout performance across complex software engineering tasks.
Cursor CLI (Composer-1): Sigmascore 42.9%, Accuracy 39.1%, Consistency 51.2%, Speed 39.5%
What we measured:
Each score is assigned a tier based on how close it is to the other agents' scores relative to the margin of error. The top-scoring group of agents is in Tier 1, the next-best group is in Tier 2, and so on.
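The post does not spell out how tiers are computed. The sketch below shows one simple way such grouping could work, assuming a greedy rule: sort agents by score, and start a new tier whenever a score falls more than the margin of error below the best score in the current tier. Both the rule and the 3.0-point margin used in the example are hypothetical illustrations, not Sigmabench's documented method or its actual margin of error.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Result:
    agent: str
    score: float  # percentage points, higher is better


def assign_tiers(results: List[Result], margin: float) -> Dict[str, int]:
    """Hypothetical greedy tiering (an assumption, not Sigmabench's method).

    Agents are sorted best first. An agent joins the current tier if its
    score is within `margin` points of that tier's top score; otherwise
    it opens the next tier.
    """
    ordered = sorted(results, key=lambda r: r.score, reverse=True)
    tiers: Dict[str, int] = {}
    tier = 0
    tier_top: Optional[float] = None
    for r in ordered:
        if tier_top is None or tier_top - r.score > margin:
            tier += 1            # start a new tier
            tier_top = r.score   # the tier's best score anchors the margin check
        tiers[r.agent] = tier
    return tiers


# Accuracy scores as reported in this post; the 3.0-point margin is made up.
accuracy = [
    Result("Cursor CLI", 39.1),
    Result("Codex CLI", 44.3),
    Result("Claude Code", 40.9),
]
print(assign_tiers(accuracy, margin=3.0))
# {'Codex CLI': 1, 'Claude Code': 2, 'Cursor CLI': 2}
```

Anchoring each tier to its top score keeps every member within one margin of the tier leader; a chained rule (comparing each agent only to its neighbour) would instead let a long run of close scores stretch a single tier arbitrarily far.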
Cursor (Composer-1) is the overall leader on the Sigmabench benchmark:
Codex CLI (GPT-5.1-Codex-Max) is both the most accurate and the most consistent agent in this comparison, ahead of Cursor CLI by 5.2 points on accuracy and 7.1 points on consistency. Cursor CLI, however, leads on speed by a large margin of 27 percentage points, representing about a 4x speed advantage.
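Cursor CLI (Composer-1) vs. Codex CLI (GPT-5.1-Codex-Max):
Sigmascore: 42.9% vs. 31.8%
Accuracy: 39.1% vs. 44.3%
Consistency: 51.2% vs. 58.3%
Speed: 39.5% vs. 12.5%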
Claude Code (Opus 4.5) and Cursor CLI are statistically tied on both Accuracy and Consistency. Cursor CLI again leads on speed by a large margin of about 27 percentage points, representing about a 4x speed advantage.
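Cursor CLI (Composer-1) vs. Claude Code (Opus 4.5):
Sigmascore: 42.9% vs. 30.1%
Accuracy: 39.1% vs. 40.9%
Consistency: 51.2% vs. 52.4%
Speed: 39.5% vs. 12.8%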
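The point gaps quoted in the comparisons can be reproduced directly from the reported scores. The sketch below is a small illustration, assuming only the numbers shown in this post; the `SCORES` table and the `point_gaps` helper are hypothetical names introduced here. It computes rival-minus-Cursor differences in percentage points and does not attempt to derive the 4x speed figure, which is not a ratio of these scores.

```python
from typing import Dict

# Scores as reported in this post, in percent.
SCORES: Dict[str, Dict[str, float]] = {
    "Cursor CLI": {"Sigmascore": 42.9, "Accuracy": 39.1, "Consistency": 51.2, "Speed": 39.5},
    "Codex CLI": {"Sigmascore": 31.8, "Accuracy": 44.3, "Consistency": 58.3, "Speed": 12.5},
    "Claude Code": {"Sigmascore": 30.1, "Accuracy": 40.9, "Consistency": 52.4, "Speed": 12.8},
}


def point_gaps(rival: str, baseline: str = "Cursor CLI") -> Dict[str, float]:
    """Rival minus baseline for each metric, in percentage points.

    Rounded to one decimal to hide floating-point noise (e.g. 44.3 - 39.1).
    """
    base = SCORES[baseline]
    return {metric: round(SCORES[rival][metric] - value, 1) for metric, value in base.items()}


for rival in ("Codex CLI", "Claude Code"):
    print(rival, point_gaps(rival))
# Codex CLI {'Sigmascore': -11.1, 'Accuracy': 5.2, 'Consistency': 7.1, 'Speed': -27.0}
# Claude Code {'Sigmascore': -12.8, 'Accuracy': 1.8, 'Consistency': 1.2, 'Speed': -26.7}
```

Positive values mean the rival scores higher than Cursor CLI; the Codex row matches the +5.2 and +7.1 point leads and the 27-point speed gap, while the Claude Code row shows the small accuracy and consistency gaps described as a statistical tie.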
Speed advantage: Cursor CLI (Composer-1) is 4x faster on average than both Claude Code (Opus 4.5) and Codex CLI (GPT-5.1-Codex-Max).
Cursor CLI (Composer-1) trails the accuracy and consistency leader Codex CLI (GPT-5.1-Codex-Max) by 5–7 percentage points, but the speed advantage is significant enough to maintain the #1 Sigmascore rank overall.