
Comparison

Iris vs Confident AI

Self-hosted inline eval vs managed cloud testing platform. Own your data or outsource the infrastructure.

TL;DR

Iris is a self-hosted MCP server that evaluates every agent output inline in production — your data stays on your machine, and no API keys are needed. Confident AI is the commercial cloud platform built on top of DeepEval, offering LLM-as-Judge evaluation, team dashboards, regression testing, and dataset management. If you want zero-vendor-dependency eval that runs on every output, choose Iris. If you need a managed platform with team collaboration and semantic evaluation, choose Confident AI.

For background on agent evaluation methodology, see our agent eval guide.

Feature Comparison

Side by side.

Feature            | Iris                                                 | Confident AI
-------------------|------------------------------------------------------|-------------------------------------------
Deployment         | Self-hosted (your infrastructure)                    | Managed cloud (SaaS)
Integration method | MCP config (zero code)                               | Python SDK + API key
Eval approach      | Deterministic heuristic rules (<1ms; example below)  | LLM-as-Judge metrics (semantic)
When eval runs     | Inline, on every output in production                | Batch experiments and CI/CD
Data ownership     | 100% local (SQLite, your machine)                    | Cloud-hosted (vendor manages)
Team collaboration | Single-user (team features on roadmap)               | Multi-user dashboards, shared experiments
Regression testing | Not included                                         | Built-in experiment comparison
Dataset management | Not included                                         | Synthetic data generation, golden datasets
Cost tracking      | Per-trace USD cost, aggregate visibility             | Not a primary feature
MCP support        | Protocol-native (Iris is itself an MCP server)       | Not MCP-aware
Pricing            | Free and open source (MIT)                           | Free tier + paid plans
PII detection      | Built-in (SSN, credit card, phone, email)            | Via custom metrics or toxicity checks
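The speed gap in the "Eval approach" row comes down to how the checks run: a deterministic rule is plain pattern matching over the output text, with no model call and no network hop. Here is a minimal sketch of what a deterministic PII rule could look like — the patterns and function names are illustrative, not Iris's actual implementation:

```python
import re

# Illustrative patterns only -- production PII rules need stricter
# validation (e.g. a Luhn check for card numbers) to limit false positives.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "phone": re.compile(r"\(?\b\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

def check_pii(output: str) -> list[str]:
    """Return the PII categories found in an agent output.

    Pure string matching -- no model call, no network, no API key --
    which is why this style of rule can run inline on every output
    at sub-millisecond cost.
    """
    return [name for name, pattern in PII_PATTERNS.items()
            if pattern.search(output)]

print(check_pii("Reach me at jane@example.com or 555-867-5309"))
# -> ['phone', 'email']
```

The trade-off is the one the table shows: rules like this are fast and fully local, but they can't judge semantic qualities like relevancy or faithfulness — that's what LLM-as-Judge metrics buy you.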

Decision Guide

Which one fits your team?

When to choose Iris

  • You want to own your eval data — no cloud dependency, no vendor lock-in
  • You're building with MCP-compatible agents and want zero-code setup
  • You want inline production eval, not just pre-deployment testing
  • You need cost tracking and PII detection out of the box
  • You want fully open-source with no usage limits

When to choose Confident AI

  • You need team collaboration with shared dashboards and experiments
  • You want LLM-as-Judge evaluation for nuanced semantic scoring (sketched after this list)
  • You need regression testing to compare model versions
  • You want synthetic dataset generation for comprehensive test coverage
  • You prefer a managed platform without infrastructure overhead
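For a sense of what the Confident AI side looks like in practice, here is roughly the shape of a DeepEval test, following the pattern in DeepEval's public quickstart. The threshold and strings are placeholders, and running it requires a configured judge model (e.g. an OpenAI API key); check the current DeepEval docs for exact APIs:

```python
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

def test_agent_answer():
    # LLM-as-Judge: a judge model scores semantic relevancy, so each
    # check costs a model call -- the opposite trade-off from
    # deterministic regex-style rules.
    metric = AnswerRelevancyMetric(threshold=0.7)
    test_case = LLMTestCase(
        input="What does your refund policy cover?",
        actual_output="Refunds are available within 30 days of purchase.",
    )
    assert_test(test_case, [metric])
```

Results from tests like this flow into Confident AI's dashboards, which is where the team collaboration and regression comparison features come in.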

Last verified: March 2026. This comparison is based on publicly available documentation and may not reflect recent changes to Confident AI. We aim to keep this page accurate and fair.

See something outdated or incorrect? Report an inaccuracy — we review and update within 48 hours.

Ready to see what your agents are doing?

Add Iris to your MCP config. First trace in 60 seconds. No SDK, no signup, no infrastructure.
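MCP client configs follow a standard JSON shape. A minimal sketch of what the entry could look like, assuming a hypothetical iris-mcp launcher package — the actual command and package name come from the Iris install docs:

```json
{
  "mcpServers": {
    "iris": {
      "command": "npx",
      "args": ["-y", "iris-mcp"]
    }
  }
}
```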