cto.new Bench is a comprehensive benchmarking platform that measures the performance of AI coding models on real-world development tasks. It provides objective metrics to help developers and teams evaluate different AI coding assistants.
Key Features
- Real Task Measurement: Benchmarks are based on actual coding tasks completed by cto.new users, measuring the percentage of completed tasks whose code was merged
- Transparent Methodology: Uses a 72-hour rolling success rate with a 2-day lag so tasks have time to be resolved (a minimal sketch of this calculation follows the list)
- Statistical Significance: Only includes models that meet a minimum usage threshold, so reported rates rest on reliable data
- Monthly Leaderboard: Displays the most recent measurements for models meeting benchmark criteria within the last calendar month
- Comprehensive Toolset: Models are evaluated using a full suite of development tools including file operations, terminal access, and code search capabilities
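The rolling metric described above can be illustrated with a short sketch. This is an illustration only, assuming a hypothetical list of task records with `completed_at` and `merged` fields; the actual cto.new pipeline, data model, and field names are not described in this document.

```python
from datetime import datetime, timedelta

def rolling_success_rate(tasks, now, window_hours=72, lag_days=2):
    """Share of completed tasks with merged code, over a 72-hour window ending lag_days ago.

    `tasks` is a hypothetical list of dicts with `completed_at` (datetime) and
    `merged` (bool); the 72-hour window and 2-day lag follow the description above.
    """
    window_end = now - timedelta(days=lag_days)            # 2-day lag for task resolution
    window_start = window_end - timedelta(hours=window_hours)
    in_window = [t for t in tasks
                 if window_start <= t["completed_at"] < window_end]
    if not in_window:
        return None  # no tasks in window; no rate to report
    merged = sum(1 for t in in_window if t["merged"])
    return merged / len(in_window)

# Example usage with made-up records:
tasks = [
    {"completed_at": datetime(2024, 6, 1, 10), "merged": True},
    {"completed_at": datetime(2024, 6, 1, 15), "merged": False},
    {"completed_at": datetime(2024, 6, 2, 9),  "merged": True},
]
print(rolling_success_rate(tasks, now=datetime(2024, 6, 4, 12)))  # 0.666...
```

In practice, a rate would only be reported for models that also clear the minimum usage threshold noted under Statistical Significance.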
Use Cases
- Developers: Compare different AI coding assistants to choose the most effective tool for their workflow
- Teams: Make data-driven decisions about which AI coding tools to adopt based on objective performance metrics
- Tool Providers: Understand how their models perform against competitors in real-world scenarios
- Researchers: Access reliable benchmarking data for AI coding model performance analysis
Target Users
- Software developers and engineers
- Development teams and engineering managers
- AI tool developers and researchers
- Technical decision makers evaluating coding assistants

