Cipherra is a continuous evaluation platform for AI agents, built for ML engineers doing reinforcement learning (RL) post-training. It runs agent eval suites at scale on every model checkpoint and produces prioritized diagnostic reports that go beyond a single aggregate score. Key features include:
- Scalable Evaluation: Run large-scale eval suites on each model checkpoint.
- Prioritized Diagnostics: Receive actionable reports highlighting critical issues.
- Model-Agnostic: Bring any model for evaluation.
- Built for RL Post-Training: Tailored for reinforcement learning workflows.
Use cases include monitoring model improvements, identifying regressions, and ensuring agent reliability before deployment.

