v0.1 | Iris: The agent eval standard for MCP. 12 eval rules, open source

Comparison

Iris vs Patronus AI

Open-source heuristic eval vs enterprise safety platform. Different tools for different stages and requirements.

TL;DR

Iris is a self-hosted MCP server that scores every agent output with deterministic rules: PII detection, cost tracking, completeness, and relevance. It is free and open-source. Patronus AI is an enterprise safety platform with fine-tuned evaluation models for hallucination detection, toxicity scoring, and custom safety criteria. If you want broad production coverage with sub-millisecond rule checks and no licensing cost, choose Iris. If you need deep semantic safety analysis for regulated or high-stakes deployments, choose Patronus AI.
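To make "deterministic rules" concrete, here is a minimal sketch of what a heuristic PII check could look like. The `RuleResult` shape, the `checkPii` function, and the two patterns are illustrative assumptions, not Iris's actual rule API or its real detection patterns.

```typescript
// Hypothetical result shape for illustration; not Iris's actual API.
interface RuleResult {
  rule: string;
  passed: boolean;
  findings: string[];
}

// Two deliberately simple patterns. Real PII detection needs broader coverage.
// No `g` flag, so RegExp.test() stays stateless between calls.
const PII_PATTERNS: { name: string; pattern: RegExp }[] = [
  { name: "email", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/ },
  { name: "usSsn", pattern: /\b\d{3}-\d{2}-\d{4}\b/ },
];

// Deterministic: the same output string always yields the same result.
function checkPii(output: string): RuleResult {
  const findings = PII_PATTERNS
    .filter((p) => p.pattern.test(output))
    .map((p) => p.name);
  return { rule: "pii", passed: findings.length === 0, findings };
}
```

Because a check like this is only a handful of regex tests, it can run on every output without a model call. The trade-off is that it catches patterns, not meaning, which is where model-based evaluation differs.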

For background on agent evaluation methodology, see our agent eval guide.

Feature Comparison

Side by side.

Feature | Iris | Patronus AI
Eval approach | Deterministic heuristic rules (<1 ms) | Fine-tuned eval models (Lynx, Glider)
Integration method | MCP config (zero code) | REST API + SDK
Deployment | Self-hosted (your infrastructure) | Cloud API (vendor-hosted)
Hallucination detection | Heuristic pattern matching | Purpose-built Lynx model
Safety scoring | PII detection, prompt injection, blocklist | Toxicity, bias, safety classifiers
Cost tracking | Per-trace USD cost, aggregate visibility | Not a primary feature
When eval runs | Inline, every output in production | API call per evaluation request
MCP support | Protocol-native (is an MCP server) | Not MCP-aware
Pricing | Free and open-source (MIT) | Enterprise pricing (contact sales)
Custom eval criteria | Zod schema custom rules | Custom fine-tuned models
Compliance | Self-hosted (you control data) | SOC 2, enterprise security
Target audience | Individual developers, small teams | Enterprise AI teams, regulated industries

Decision Guide

Which one fits your requirements?

When to choose Iris

  • You want free, open-source eval with no vendor dependency
  • You're building with MCP-compatible agents and want zero-code setup
  • You need cost tracking and spend visibility across agents
  • You want inline production eval on every output, not API calls per evaluation
  • You want self-hosted data ownership with no cloud dependency
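
The cost tracking item above comes down to simple arithmetic: tokens multiplied by a per-token price, summed per agent. A minimal sketch follows, assuming a hypothetical `Trace` record and placeholder per-million-token prices. The field names and rates are not taken from Iris.

```typescript
// Hypothetical trace record; field names are illustrative, not Iris's schema.
interface Trace {
  agent: string;
  inputTokens: number;
  outputTokens: number;
}

// Assumed prices in USD per million tokens; substitute your provider's rates.
interface Pricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

// Per-trace cost: token counts scaled to millions, times the matching rate.
function traceCostUsd(trace: Trace, pricing: Pricing): number {
  return (
    (trace.inputTokens / 1e6) * pricing.inputPerMillion +
    (trace.outputTokens / 1e6) * pricing.outputPerMillion
  );
}

// Aggregate visibility: total spend keyed by agent name.
function costByAgent(traces: Trace[], pricing: Pricing): Map<string, number> {
  const totals = new Map<string, number>();
  for (const t of traces) {
    totals.set(t.agent, (totals.get(t.agent) ?? 0) + traceCostUsd(t, pricing));
  }
  return totals;
}
```

With one cost figure attached to each trace, per-agent or per-day totals are a plain sum and need no separate billing export.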

When to choose Patronus AI

  • You need deep semantic hallucination detection with purpose-built models
  • You're in a regulated industry requiring enterprise-grade safety scoring
  • You need custom fine-tuned evaluation models for your specific domain
  • You want managed infrastructure with SOC 2 compliance
  • You need advanced toxicity and bias detection beyond heuristic rules

Last verified: March 2026. This comparison is based on publicly available documentation and may not reflect recent changes to Patronus AI. We aim to keep this page accurate and fair.

See something outdated or incorrect? Report an inaccuracy — we review and update within 48 hours.

Ready to see what your agents are doing?

Add Iris to your MCP config. First trace in 60 seconds. No SDK, no signup, no infrastructure.
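
For readers who have not edited an MCP config before, the sketch below shows the general shape of a server entry, written as a typed TypeScript object and printed as JSON. The `mcpServers` key with `command` and `args` follows the convention used by MCP clients such as Claude Desktop. The `npx` command and the `<iris-package-name>` argument are placeholders, not the real install command; check the Iris documentation for the actual values.

```typescript
// Shape of one server entry in an MCP client's `mcpServers` block.
interface McpServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

interface McpConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// Placeholder values: the real command and package name come from the Iris docs.
const config: McpConfig = {
  mcpServers: {
    iris: {
      command: "npx",
      args: ["-y", "<iris-package-name>"],
    },
  },
};

// Emit the JSON you would merge into your MCP client's config file.
console.log(JSON.stringify(config, null, 2));
```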