Documentation
Learn iri AI
Everything you need to reduce your AI API costs by up to 40% with intelligent caching, compression, and smart routing.
Quick Start
Get started in 5 minutes
Create your account, configure your organization, and make your first optimized API call.
Topics
Getting Started
Account setup, organization creation, and initial configuration
Teams & Roles
User management, permissions, and access control
API Keys
Configure provider keys for OpenAI, Anthropic, and Google
Proxy & Performance
15K+ requests/sec with sub-20ms latency; setup guides for Cursor, Claude Code, and more
Optimization
Caching, compression, and smart routing configuration
Policies & Limits
Budget controls, rate limits, and model restrictions
Savings Dashboard
Analytics, cost tracking, and usage insights
Enterprise Features
Granular user policies, bypass flags, and team controls
Hosted Models
Self-host DeepSeek, Llama, and Qwen for 60-90% cost savings
Sustainability
Carbon tracking, ESG reporting, and green routing
CLI Tool
Command-line interface for terminal workflows
VS Code Extension
IDE integration for VS Code and Cursor
FAQ
Common questions and troubleshooting
Overview
What is iri AI?
iri AI is an optimization layer for AI APIs. It sits between your application and providers like OpenAI, Anthropic, and Google, automatically reducing costs through intelligent caching, prompt compression, and model routing.
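Because it sits between your application and the provider, adopting the layer is typically a client-configuration change rather than a code rewrite. As a sketch (the base URL and key placeholder below are illustrative assumptions, not iri AI's documented endpoint), an existing OpenAI SDK client could be pointed at the proxy like this:

```python
# Hypothetical drop-in configuration: route an existing OpenAI SDK client
# through the optimization proxy instead of calling the provider directly.
# The base_url and api_key values are placeholders, not real endpoints.
from openai import OpenAI

client = OpenAI(
    base_url="https://proxy.example.invalid/v1",  # assumed proxy endpoint
    api_key="YOUR_IRI_API_KEY",                   # assumed iri-issued key
)

# Every subsequent call on this client now passes through the proxy,
# which can cache, compress, and route before reaching the provider.
```

The rest of the application code stays unchanged; only the client construction differs.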
Smart Caching
Identical or semantically similar requests return cached responses instantly at zero cost.
Prompt Compression
Automatically reduce token count while preserving semantic meaning.
Intelligent Routing
Route simple queries to cost-effective models without sacrificing quality.
Typical Savings
Combined savings typically range from 20% to 40%, depending on usage patterns.
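Two of the techniques above, response caching and model routing, can be sketched in a few lines. This is a toy illustration, not iri AI's implementation: the model names, the word-count routing heuristic, and the exact-match (rather than semantic) cache key are all assumptions made for the example.

```python
# Minimal sketch: exact-match response caching plus heuristic model routing.
# All names and thresholds are illustrative, not iri AI's actual behavior.
import hashlib

CHEAP_MODEL, STRONG_MODEL = "small-model", "large-model"  # placeholder names

class OptimizingLayer:
    def __init__(self, backend):
        self.backend = backend   # callable: (model, prompt) -> response text
        self.cache = {}          # request fingerprint -> cached response
        self.hits = 0

    def route(self, prompt: str) -> str:
        # Toy heuristic: send short prompts to the cheaper model.
        return CHEAP_MODEL if len(prompt.split()) < 20 else STRONG_MODEL

    def complete(self, prompt: str) -> str:
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key in self.cache:    # cache hit: answered at zero upstream cost
            self.hits += 1
            return self.cache[key]
        response = self.backend(self.route(prompt), prompt)
        self.cache[key] = response
        return response

# Usage with a stub backend standing in for a real provider call:
layer = OptimizingLayer(lambda model, prompt: f"[{model}] answer")
layer.complete("What is 2 + 2?")   # routed to the cheap model, then cached
layer.complete("What is 2 + 2?")   # identical request: served from cache
```

A production system would replace the exact-match key with an embedding-similarity lookup (so paraphrased requests also hit the cache) and the word-count heuristic with a learned difficulty classifier, but the control flow is the same.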