v0.1: Iris MCP Server, 3 tools, 12 eval rules, open source

Comparison

Iris vs Braintrust

MCP-native, zero-code observability vs SDK-powered eval and experimentation platform. Two different philosophies for AI quality.

TL;DR

Iris is an MCP server your agent discovers and uses automatically — zero code changes, zero SDK imports, one SQLite file for storage. Eval runs locally with sub-millisecond heuristic rules. Braintrust is a comprehensive eval and observability platform with powerful dataset management, experiment tracking, a prompt playground, and deep tracing. If you're building with MCP-compatible agents and want the simplest possible setup with local eval, Iris gets you there in 60 seconds. If you need production-grade experimentation workflows, human review, or CI-integrated regression testing, Braintrust is the deeper eval platform.
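To make "heuristic rules" concrete, here is a minimal sketch of what local, non-LLM eval checks can look like. The rule names, the `RuleResult` type, and the `evaluate` function are hypothetical and chosen for illustration; they are not Iris's built-in rules or its API. Each check is a plain string or regex test, which is why this style of rule is typically cheap enough to run on every trace.

```python
import re
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class RuleResult:
    rule: str
    passed: bool


# Hypothetical rules for illustration only; not Iris's actual rule set.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def non_empty(output: str) -> bool:
    """Fail when the agent returned only whitespace."""
    return bool(output.strip())


def no_email_address(output: str) -> bool:
    """Fail when the output appears to contain an email address."""
    return EMAIL_PATTERN.search(output) is None


def within_length(output: str, max_chars: int = 2000) -> bool:
    """Fail when the output exceeds an assumed character budget."""
    return len(output) <= max_chars


RULES: List[Tuple[str, Callable[[str], bool]]] = [
    ("non_empty", non_empty),
    ("no_email_address", no_email_address),
    ("within_length", within_length),
]


def evaluate(output: str) -> List[RuleResult]:
    """Run every rule against one agent output, with no network or LLM calls."""
    return [RuleResult(name, check(output)) for name, check in RULES]


if __name__ == "__main__":
    start = time.perf_counter()
    results = evaluate("Contact me at jane@example.com")
    elapsed_ms = (time.perf_counter() - start) * 1000
    for result in results:
        print(result.rule, "pass" if result.passed else "fail")
    print(f"evaluated in {elapsed_ms:.3f} ms")
```

The trade-off is the one this page describes: rules like these are fast and deterministic, but they cannot judge qualities such as tone or factual accuracy, which is where LLM or human scoring comes in.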

Feature Comparison

Side by side.

| Feature | Iris | Braintrust |
|---|---|---|
| Integration method | MCP config (zero code) | SDK imports (Python, TS, Go, Ruby, C#) |
| Self-hosting | Single SQLite file | Enterprise plan only (cloud-first) |
| Performance overhead | Zero (no SDK in hot path) | Async logging, minimal overhead |
| Eval approach | 12 built-in + 8 custom heuristic rules (<1 ms) | LLM, code, and human scoring + datasets + experiments |
| Prompt playground | Not included | Full playground with side-by-side comparison |
| Datasets & experiments | Not included | Production traces to datasets, experiment tracking, CI integration |
| Cost tracking | Per-trace USD cost | Per-trace cost, per-user/feature/model breakdowns |
| MCP support | Protocol-native (IS an MCP server) | MCP server for querying Braintrust data |
| License | MIT (fully permissive) | Proprietary (proxy is MIT) |
| Pricing | Free & open-source | Free tier (1M spans) / Pro $249/mo / Enterprise custom |
| Tracing depth | MCP tool calls and agent traces | Full trace trees with token-level detail, visual timeline |
| Enterprise features | Roadmap (v0.5) | SOC 2, SSO, hybrid deployment, dedicated support |

Decision Guide

Which one fits your stack?

When to choose Iris

  • You're building with MCP-compatible agents (Claude Desktop, Cursor, Windsurf)
  • You want zero-code integration — no SDK imports, no wrapper functions
  • You want simple self-hosting — one binary, one SQLite file, no cloud dependency
  • You want fully permissive MIT licensing with no proprietary modules
  • You want sub-millisecond heuristic eval that runs locally without LLM calls
  • You want to avoid per-seat or usage-based pricing
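As a rough sketch of what "zero-code integration" means in practice: MCP clients such as Claude Desktop read a JSON config with an `mcpServers` map, where each entry names a command to launch. The snippet below builds such an entry and merges it into an existing config. The server name `iris`, the command `iris-mcp`, the `--db` flag, and the database path are placeholders assumed for illustration; check the Iris README for the real launch command.

```python
import json
from typing import Any, Dict, List


def server_entry(command: str, args: List[str]) -> Dict[str, Any]:
    """Build one entry for the `mcpServers` map of an MCP client config."""
    return {"command": command, "args": args}


def add_server(config: Dict[str, Any], name: str, entry: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `config` with `entry` registered under `name`.

    Existing servers are preserved and the input dict is not mutated.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


# An existing client config with one unrelated (hypothetical) server.
existing = {"mcpServers": {"other-server": {"command": "other-cmd", "args": []}}}

# Placeholder command, flag, and path: assumptions, not Iris's documented values.
iris_entry = server_entry("iris-mcp", ["--db", "./iris.sqlite"])

updated = add_server(existing, "iris", iris_entry)

if __name__ == "__main__":
    print(json.dumps(updated, indent=2))
```

The agent code itself is untouched: the only change is one more entry in the client's config file, which the client uses to start the server and discover its tools.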

When to choose Braintrust

  • You need deep eval capabilities — datasets, experiments, human review, LLM scoring
  • You need a prompt playground for iterating on prompts with real data
  • You need enterprise compliance today (SOC 2, SSO, hybrid deployment)
  • You need multi-language SDK support (Python, TypeScript, Go, Ruby, C#)
  • You need granular cost analytics sliced by user, feature, or model
  • You need CI/CD-integrated regression testing against production datasets

Last verified: March 2026. This comparison is based on publicly available documentation and may not reflect recent changes to Braintrust. We aim to keep this page accurate and fair.

See something outdated or incorrect? Report an inaccuracy — we review and update within 48 hours.

Ready to see what your agents are doing?

Add Iris to your MCP config. First trace in 60 seconds. No SDK, no signup, no infrastructure.