Documentation
Learn iri AI
Everything you need to reduce your AI API costs by up to 40% with intelligent caching, compression, and smart routing.
Quick Start
Get started in 5 minutes
Create your account, configure your organization, and make your first optimized API call.
Topics
Getting Started
Account setup, organization creation, and initial configuration
Teams & Roles
User management, permissions, and access control
API Keys
Configure provider keys for OpenAI, Anthropic, and Google
Proxy & Performance
15K+ requests/sec with sub-20ms latency; setup guides for Cursor, Claude Code, and more
Optimization
Caching, compression, and smart routing configuration
Policies & Limits
Budget controls, rate limits, and model restrictions
Savings Dashboard
Analytics, cost tracking, and usage insights
Enterprise Features
Granular user policies, bypass flags, and team controls
Hosted Models
Self-host DeepSeek, Llama, and Qwen for 60-90% cost savings
Sustainability
Carbon tracking, ESG reporting, and green routing
CLI Tool
Command-line interface for terminal workflows
VS Code Extension
IDE integration for VS Code and Cursor
FAQ
Common questions and troubleshooting
Overview
What is iri AI?
iri AI is an optimization layer for AI APIs. It sits between your application and providers like OpenAI, Anthropic, and Google, automatically reducing costs through intelligent caching, prompt compression, and model routing.
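Because it sits between your application and the provider, adopting the layer is typically a client-configuration change rather than a code rewrite. As a sketch (the base URL and key placeholder below are illustrative assumptions, not iri AI's documented endpoint), an existing OpenAI SDK client could be pointed at the proxy like this:

```python
# Hypothetical drop-in configuration: route an existing OpenAI SDK client
# through the optimization proxy instead of calling the provider directly.
# The base_url and api_key values are placeholders, not real endpoints.
from openai import OpenAI

client = OpenAI(
    base_url="https://proxy.example.invalid/v1",  # assumed proxy endpoint
    api_key="YOUR_IRI_API_KEY",                   # assumed iri-issued key
)

# Every subsequent call on this client now passes through the proxy,
# which can cache, compress, and route before reaching the provider.
```

The rest of the application code stays unchanged; only the client construction differs.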
Smart Caching
Identical or semantically similar requests return cached responses instantly at zero cost.
Prompt Compression
Automatically reduce token count while preserving semantic meaning.
Intelligent Routing
Route simple queries to cost-effective models without sacrificing quality.
Typical Savings
Combined savings typically range from 20% to 40%, depending on usage patterns.
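Two of the techniques above, response caching and model routing, can be sketched in a few lines. This is a toy illustration, not iri AI's implementation: the model names, the word-count routing heuristic, and the exact-match (rather than semantic) cache key are all assumptions made for the example.

```python
# Minimal sketch: exact-match response caching plus heuristic model routing.
# All names and thresholds are illustrative, not iri AI's actual behavior.
import hashlib

CHEAP_MODEL, STRONG_MODEL = "small-model", "large-model"  # placeholder names

class OptimizingLayer:
    def __init__(self, backend):
        self.backend = backend   # callable: (model, prompt) -> response text
        self.cache = {}          # request fingerprint -> cached response
        self.hits = 0

    def route(self, prompt: str) -> str:
        # Toy heuristic: send short prompts to the cheaper model.
        return CHEAP_MODEL if len(prompt.split()) < 20 else STRONG_MODEL

    def complete(self, prompt: str) -> str:
        key = hashlib.sha256(prompt.encode()).hexdigest()
        if key in self.cache:    # cache hit: answered at zero upstream cost
            self.hits += 1
            return self.cache[key]
        response = self.backend(self.route(prompt), prompt)
        self.cache[key] = response
        return response

# Usage with a stub backend standing in for a real provider call:
layer = OptimizingLayer(lambda model, prompt: f"[{model}] answer")
layer.complete("What is 2 + 2?")   # routed to the cheap model, then cached
layer.complete("What is 2 + 2?")   # identical request: served from cache
```

A production system would replace the exact-match key with an embedding-similarity lookup (so paraphrased requests also hit the cache) and the word-count heuristic with a learned difficulty classifier, but the control flow is the same.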