cto.new Bench

Model benchmarks showing task success rates and performance metrics for AI coding assistants on real end-to-end coding tasks.

Introduction

cto.new Bench is a comprehensive benchmarking platform that measures the performance of AI coding models on real-world development tasks. It provides objective metrics to help developers and teams evaluate different AI coding assistants.

Key Features
  • Real Task Measurement: Benchmarks are based on actual coding tasks completed by cto.new users; success is measured as the percentage of completed tasks whose code was merged
  • Transparent Methodology: Uses a 72-hour rolling success rate computed with a 2-day lag, so recent tasks have time to be merged or closed (see the sketch after this list)
  • Statistical Significance: Only includes models that meet a minimum usage threshold, so reported rates are based on enough tasks to be reliable
  • Monthly Leaderboard: Displays the most recent measurements for models meeting benchmark criteria within the last calendar month
  • Comprehensive Toolset: Models are evaluated using a full suite of development tools including file operations, terminal access, and code search capabilities
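
For illustration, here is a minimal sketch of how such a metric could be computed. The Task record, its field names, and the min_tasks threshold of 50 are assumptions made for this example; cto.new does not publish its exact data schema or thresholds.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical task record: field names are illustrative, not cto.new's schema.
@dataclass
class Task:
    model: str              # model that attempted the task
    completed_at: datetime  # when the task was marked complete (UTC)
    merged: bool            # True if the resulting code was merged

def rolling_success_rates(tasks: list[Task],
                          window_hours: int = 72,
                          lag_days: int = 2,
                          min_tasks: int = 50) -> dict[str, float]:
    """Merged-code percentage per model over a window_hours-long window
    ending lag_days before now, so recent tasks have had time to resolve.
    min_tasks is an assumed significance threshold, not a published value."""
    window_end = datetime.now(timezone.utc) - timedelta(days=lag_days)
    window_start = window_end - timedelta(hours=window_hours)

    completed: dict[str, int] = {}
    merged: dict[str, int] = {}
    for t in tasks:
        if window_start <= t.completed_at < window_end:
            completed[t.model] = completed.get(t.model, 0) + 1
            if t.merged:
                merged[t.model] = merged.get(t.model, 0) + 1

    # Report only models that clear the minimum-usage threshold.
    return {m: 100.0 * merged.get(m, 0) / n
            for m, n in completed.items()
            if n >= min_tasks}
```
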
Use Cases
  • Developers: Compare different AI coding assistants to choose the most effective tool for their workflow
  • Teams: Make data-driven decisions about which AI coding tools to adopt based on objective performance metrics
  • Tool Providers: Understand how their models perform against competitors in real-world scenarios
  • Researchers: Access reliable benchmarking data for AI coding model performance analysis

Target Users
  • Software developers and engineers
  • Development teams and engineering managers
  • AI tool developers and researchers
  • Technical decision makers evaluating coding assistants
