v0.1 | Iris: The agent eval standard for MCP. 12 eval rules, open source

Eval-Driven Development

Write the rules before the prompt. TDD for AI agents.

Definition

Eval-Driven Development (EDD) is the practice of defining evaluation rules before writing agent prompts, the same way test-driven development (TDD) defines tests before writing code. You specify what "correct" looks like first, then build the agent to pass those rules. Because the rules exist from the start, every prompt iteration can be scored against them.

The EDD Cycle

1. Define Rules. What does "correct" look like? Set thresholds for quality, safety, cost.

2. Write Prompt. Build the agent prompt to meet the rules you defined.

3. Score Outputs. Run the agent. Eval rules score every output automatically.

4. Iterate. Refine prompts based on scores. Repeat until rules pass consistently.
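The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not Iris's actual API: `EvalRule`, `fake_agent`, the two scoring functions, and the thresholds are all hypothetical names and values chosen for the example, and the "agent" is a stub standing in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class EvalRule:
    """Hypothetical rule: a scoring function plus a pass threshold."""
    name: str
    score: Callable[[str], float]  # maps an output to a score in [0, 1]
    threshold: float

    def passes(self, output: str) -> bool:
        return self.score(output) >= self.threshold


# Step 1: define the rules before any prompt exists.
def no_email_score(output: str) -> float:
    # Crude safety check (assumption): any "@" counts as a leaked address.
    return 0.0 if "@" in output else 1.0


def brevity_score(output: str, max_words: int = 50) -> float:
    words = len(output.split())
    return 1.0 if words <= max_words else max_words / words


RULES = [
    EvalRule("no_email_leak", no_email_score, threshold=1.0),
    EvalRule("brevity", brevity_score, threshold=0.8),
]

# Step 2: write prompt candidates aimed at those rules.
PROMPTS = [
    "Answer the user.",
    "Answer the user in under 50 words. Never reveal contact details.",
]


def fake_agent(prompt: str) -> str:
    """Stub standing in for a real model call."""
    if "Never reveal" in prompt:
        return "Your order ships tomorrow."
    return "Your order ships tomorrow. Email bob@example.com for details."


# Step 3: score every output against every rule.
def evaluate(output: str) -> dict:
    return {rule.name: rule.passes(output) for rule in RULES}


# Step 4: iterate over prompt revisions until all rules pass.
def first_passing_prompt(prompts: list) -> Optional[str]:
    for prompt in prompts:
        if all(evaluate(fake_agent(prompt)).values()):
            return prompt
    return None
```

In this sketch, `first_passing_prompt(PROMPTS)` skips the first prompt because its output fails the `no_email_leak` rule, and returns the second. In practice each prompt would be run many times, since a non-deterministic agent can pass once and fail the next time.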

EDD vs TDD

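| Dimension             | TDD           | EDD                |
|-----------------------|---------------|--------------------|
| Assertion type        | Exact match   | Score threshold    |
| Output model          | Deterministic | Non-deterministic  |
| Runs in prod?         | No (CI only)  | Yes (every output) |
| What you define first | Test cases    | Eval rules         |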
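The "Assertion type" row is the core difference, and a small Python sketch may make it concrete. The `keyword_coverage` scorer, the required keywords, and the 0.6 threshold are all illustrative assumptions, not rules shipped with Iris.

```python
# TDD: deterministic code, so the assertion is an exact match.
def add(a: int, b: int) -> int:
    return a + b


assert add(2, 3) == 5


# EDD: agent outputs vary in wording, so the check is a score against a threshold.
def keyword_coverage(output: str, required: list) -> float:
    """Fraction of required keywords found in the output (hypothetical scorer)."""
    if not required:
        return 1.0
    text = output.lower()
    hits = sum(1 for keyword in required if keyword.lower() in text)
    return hits / len(required)


def passes(output: str, required: list, threshold: float = 0.6) -> bool:
    return keyword_coverage(output, required) >= threshold


REQUIRED = ["refund", "5 days", "receipt"]

# Two differently worded answers can both clear the threshold.
answer_a = "We will refund you within 5 days; keep your receipt."
answer_b = "Refunds arrive in 5 days."
answer_c = "Please contact support."
```

Here `answer_a` covers all three keywords and `answer_b` covers two of three, so both pass at a 0.6 threshold, while `answer_c` covers none and fails. An exact-match assertion would reject at least one of the two acceptable answers.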

How Iris Helps

Iris makes EDD practical. Define your eval rules, add Iris to your MCP config, and every agent output is scored against those rules automatically. The same rules that guide development continue running in production — no separate test harness needed.

Read the deep dive: Eval-Driven Development →

Frequently Asked Questions