v0.1 | Iris: The agent eval standard for MCP. 12 eval rules, open source

The Eval Tax

The hidden, compounding cost every agent team pays when they don't evaluate output quality.

Definition

The eval tax is the compounding cost of every unscored agent output — measured in trust erosion, engineering hours spent on manual review, and liability exposure. Every agent execution without evaluation adds to the balance. The longer you wait to start evaluating, the higher the accumulated debt.

How It Compounds

The eval tax isn't a one-time cost; it compounds. Each output that reaches a user without evaluation creates downstream effects: support tickets from hallucinated answers, engineering time spent debugging agent behavior by hand, and compliance risk from undetected PII in outputs. These costs accumulate with every unscored output, so the total grows faster as agent usage scales.

The insidious part: most teams don't realize they're paying it. The costs are distributed across support, engineering, and legal — never attributed back to the missing eval layer. Teams describe symptoms ("we spend too much time reviewing agent outputs") without connecting them to the root cause.
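To make the compounding concrete, here is a minimal sketch of a toy cost model. Everything in it is an assumption for illustration: the function name `accumulated_eval_tax`, the idea that a fixed fraction of unscored outputs causes a problem, and all of the numbers. It is not a measurement and not part of Iris.

```python
def accumulated_eval_tax(periods, outputs_per_period, growth_rate,
                         defect_rate, cost_per_defect):
    """Return the cumulative cost after each period under a toy model.

    Assumptions (hypothetical): usage grows by `growth_rate` each period,
    a fixed `defect_rate` of unscored outputs causes a problem, and each
    problem costs `cost_per_defect` to handle.
    """
    totals = []
    running = 0.0
    volume = float(outputs_per_period)
    for _ in range(periods):
        # Cost incurred this period is added to everything already owed.
        running += volume * defect_rate * cost_per_defect
        totals.append(round(running, 2))
        # Usage scales, so the next period adds more than this one did.
        volume *= 1 + growth_rate
    return totals


# Hypothetical inputs: 1,000 outputs in the first month, 20% monthly growth,
# 2% of outputs cause a problem, each problem costs 50 USD to handle.
if __name__ == "__main__":
    for month, total in enumerate(accumulated_eval_tax(3, 1000, 0.20, 0.02, 50.0), 1):
        print(f"month {month}: cumulative cost {total} USD")
```

In this model the per-period cost rises with usage and the balance never resets, which is the sense in which the tax compounds. Real costs depend on figures each team would have to measure for itself.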

Three Dimensions of the Eval Tax

Direct Costs

Manual review hours, incident response, output correction, and customer support load caused by bad agent answers.

Indirect Costs

Delayed shipping (fear of agent failures), lost developer confidence, slower iteration cycles.

Risk Costs

Undetected PII exposure, hallucinated answers in production, compliance violations, reputational damage.

How Iris Helps

Iris eliminates the eval tax by scoring every agent output automatically — inline, at the protocol layer. No SDK, no pipeline to build, no manual review. Add one line to your MCP config and every output gets evaluated for quality, safety, and cost.

npx @iris-eval/mcp-server

# Every agent output is now scored for:
# - Completeness (response length, structure)
# - Relevance (topic consistency)
# - Safety (PII detection, prompt injection)
# - Cost (token usage, USD per trace)

Read the deep dive: The AI Eval Tax →

Frequently Asked Questions