Model benchmarks showing task success rates and performance metrics for AI coding assistants on real, end-to-end coding tasks.