About iri
AI costs money.
We help you spend it wisely.
Enterprise AI spending is projected to hit $2T in 2026. Most companies have no idea where it goes. We built iri to fix that.
What we do
One product.
Three problems solved.
01
Visibility
See every API call, every user, every dollar—in real time. No more surprise bills.
02
Control
Set budgets that actually stop spending. Per user, per team, per model. Enforced before the API call.
03
Optimization
Route requests to cheaper models when they're good enough. Cache repeated queries. Save 30-50%.
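The "Control" idea above, a budget enforced before the API call rather than reported after, can be sketched in a few lines. This is an illustrative sketch, not iri's actual implementation: the limit, the cost estimate, and the `BudgetExceeded` error type are all assumptions made for the example.

```python
# Hypothetical sketch: a budget checked *before* the provider is called,
# so an over-budget request is blocked and never costs anything.
class BudgetExceeded(Exception):
    pass

class Budget:
    def __init__(self, limit_usd: float):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, estimated_cost_usd: float) -> None:
        # Enforce the limit up front, not after the bill arrives.
        if self.spent_usd + estimated_cost_usd > self.limit_usd:
            raise BudgetExceeded("blocked before the API call")
        self.spent_usd += estimated_cost_usd

team = Budget(limit_usd=1.00)
team.charge(0.75)      # allowed: $0.75 of $1.00 used
try:
    team.charge(0.50)  # would exceed the limit: blocked, nothing spent
except BudgetExceeded:
    pass
```

The point of checking the estimate first is that a rejected request leaves the spend untouched; a report-only system would discover the overage on the next invoice.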
How it works
Swap your API endpoint. Done.
# Before
OPENAI_BASE_URL=https://api.openai.com/v1
# After
OPENAI_BASE_URL=https://api.iri.ai/v1
Works with Cursor, Claude Code, Windsurf, and any OpenAI-compatible client.
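In a Python project, the swap above is one environment variable. The iri endpoint comes from the setup step; the claim that OpenAI-compatible clients read `OPENAI_BASE_URL` holds for the official `openai` SDK, and tools built on it typically inherit the behavior.

```python
import os

# Before: the default OpenAI endpoint.
os.environ["OPENAI_BASE_URL"] = "https://api.openai.com/v1"

# After: route every call through iri. No other code changes.
os.environ["OPENAI_BASE_URL"] = "https://api.iri.ai/v1"

# The openai SDK (and clients built on it) picks this up automatically,
# so existing chat/completions code now flows through the proxy.
```

Because only the host changes, removing the variable restores direct API access, which is the "no lock-in" promise in practice.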
50+
Models supported
<20ms
Latency overhead
15K+
Requests/sec
99.9%
Uptime SLA
Infrastructure
Built for speed.
Engineered for scale.
Our proxy handles over 15,000 requests per second with less than 20 milliseconds of overhead. That's less than 3% of a typical AI response time—essentially invisible.
Works seamlessly with Cursor, Claude Code, Windsurf, and any tool that speaks the OpenAI protocol. Use Claude models in Cursor. Use GPT in your Anthropic SDK. One endpoint, every provider.
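"One endpoint, every provider" works because the request shape never changes: every call uses the OpenAI chat-completions format, and only the model name differs. The sketch below illustrates that idea; the model names are examples, not a guaranteed list, and routing by model name is an assumption about how the proxy dispatches.

```python
# Hedged sketch: one request format for every provider's models.
def chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat payload for any provider's model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# Same shape, different providers; the proxy routes by model name.
openai_req = chat_request("gpt-4o-mini", "Summarize this diff.")
claude_req = chat_request("claude-sonnet-4", "Summarize this diff.")
```

Since the payloads are structurally identical, a client written against the OpenAI protocol can address any supported model without provider-specific code.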
Our principles
How we build
Prevention first
A limit that stops spending is better than a report about what went wrong. We enforce budgets before the API call, not after.
Day-one useful
If it takes a week to set up, it gets abandoned. You should see value in the first five minutes.
No lock-in
Standard OpenAI-compatible API. Remove iri anytime—your code doesn't change.
Honest pricing
See what it costs before you sign up. No "contact sales" games. Free tier included.
The name
iri — named after a cat who watches everything and doesn't tolerate waste. Lowercase, always.