
Hosted Model Providers

Host open-source models like DeepSeek, Llama, and Qwen to reduce costs by 60-90%.

Why Host Your Own Models?

  • Cost Reduction (60-90%): Lower per-token costs vs commercial APIs
  • Data Privacy (100%): Data never leaves your infrastructure
  • No Rate Limits (Unlimited): Scale without API throttling
  • Green Options (90%+ carbon reduction): Host in renewable energy regions

Supported Models

The following open-source models are supported for self-hosting:

DeepSeek V3

DeepSeek

Premium
Context: 128K. Strengths: Reasoning, Code, Math

Llama 3.3 70B

Meta Llama

Premium
Context: 128K. Strengths: General, Code, Instruction Following

Qwen 2.5 72B

Alibaba Qwen

Premium
Context: 128K. Strengths: Multilingual, Math, Code

Mistral Large 2

Mistral AI

Premium
Context: 128K. Strengths: Multilingual, Code, General

Llama 3.2 8B

Meta Llama

Basic
Context: 128K. Strengths: Fast, Efficient, Simple Tasks

Routing Strategies

Control how requests are routed between available providers:

Direct

Use exact model match. Falls back to external API if not hosted.

Best for: When you need a specific model

Least Cost

Recommended

Route to the cheapest compatible model.

Best for: Maximum cost savings

Least Carbon

Route to the provider/region with lowest carbon emissions.

Best for: ESG compliance, sustainability goals

Round Robin

Rotate between available providers for load balancing.

Best for: High availability, load distribution

Availability

Route to the healthiest provider first.

Best for: Mission-critical applications
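The five strategies above can be sketched as a single selection function. This is an illustrative sketch only: the type names, fields, and `route` function are assumptions for clarity, not the platform's actual schema or implementation.

```typescript
type Strategy = "direct" | "least_cost" | "least_carbon" | "round_robin" | "availability";

// Hypothetical shape of a hosted model entry; field names are illustrative.
interface HostedModel {
  provider: string;
  model: string;
  costPerMTokens: number;      // USD per million input tokens
  gramsCO2PerMTokens: number;  // carbon intensity of serving this model
  healthy: boolean;
}

let rrIndex = 0; // round-robin cursor, persisted across calls

function route(models: HostedModel[], strategy: Strategy, wanted?: string): HostedModel | undefined {
  const healthy = models.filter(m => m.healthy);
  if (healthy.length === 0) return undefined;
  switch (strategy) {
    case "direct":
      // Exact match only; returning undefined signals fallback to an external API.
      return healthy.find(m => m.model === wanted);
    case "least_cost":
      return [...healthy].sort((a, b) => a.costPerMTokens - b.costPerMTokens)[0];
    case "least_carbon":
      return [...healthy].sort((a, b) => a.gramsCO2PerMTokens - b.gramsCO2PerMTokens)[0];
    case "round_robin":
      return healthy[rrIndex++ % healthy.length];
    case "availability":
      // Unhealthy providers are already filtered out; take the first healthy one.
      return healthy[0];
  }
}
```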

Cost Comparison

Model                       Commercial API    Self-Hosted    Savings
GPT-4 equivalent            $30.00/M          $0.20/M        99%
Claude Sonnet equivalent    $15.00/M          $0.15/M        99%
GPT-3.5 equivalent          $0.50/M           $0.03/M        94%

* Prices are per million tokens (input). Self-hosted costs include estimated infrastructure.
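The Savings column above is simple to reproduce. A minimal sketch of the arithmetic, rounded to the nearest whole percent:

```typescript
// Percentage saved relative to the commercial per-million-token price.
function savingsPercent(commercialPerM: number, selfHostedPerM: number): number {
  return Math.round((1 - selfHostedPerM / commercialPerM) * 100);
}
```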

Provider Types

Self-Hosted

Run on your own infrastructure (AWS, GCP, Azure, on-prem)

Pros

  • + Full control
  • + Data privacy
  • + Custom models

Cons

  • - Requires infrastructure expertise
  • - Maintenance overhead

IRI-Hosted

Managed hosting by IRI in optimized data centers

Pros

  • + No infrastructure management
  • + Optimized for performance
  • + Green regions

Cons

  • - Less control
  • - Usage-based pricing

Commercial

External APIs (OpenAI, Anthropic, Google)

Pros

  • + Latest models
  • + No setup
  • + High reliability

Cons

  • - Higher costs
  • - Rate limits
  • - Data sent externally

Setting Up a Provider

  1. Deploy your LLM server

    Use vLLM, TGI, or llama.cpp to serve your model with an OpenAI-compatible API.

  2. Add provider to database

    Insert a record into model_providers with your server URL and capabilities.

  3. Register models

    Add entries to hosted_models with pricing and specifications.

  4. Configure routing

    Set your organization's routing strategy in Admin → Model Providers.
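Putting steps 2 and 3 together, the inserted records might look like the following. This is a sketch under stated assumptions: the field names are illustrative, not the actual column names; consult the real model_providers and hosted_models schema before inserting.

```typescript
// Hypothetical provider record (step 2): points at a vLLM server's
// OpenAI-compatible endpoint. Host and port are placeholders.
const provider = {
  name: "my-vllm-server",
  type: "self_hosted",
  baseUrl: "http://10.0.0.5:8000/v1", // vLLM serves an OpenAI-compatible API under /v1
  healthy: true,
};

// Hypothetical model record (step 3): pricing and specifications.
const model = {
  providerName: provider.name,
  model: "meta-llama/Llama-3.3-70B-Instruct",
  contextWindow: 131072,          // 128K tokens
  inputCostPerMTokens: 0.20,      // USD, including estimated infrastructure
  capabilities: ["general", "code", "instruction_following"],
};
```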

Quick Start with Seed Script

bun run scripts/seed-providers.ts

This creates sample providers and models for testing.

Model Providers Dashboard

Access provider management at Admin → Model Providers.

Provider Overview

View all providers with health status and model counts

Model Details

Expand providers to see available models, pricing, and capabilities

Routing Strategy

Select and configure your routing strategy

Cost Comparison

See estimated savings vs commercial APIs

API Reference

GET /api/admin/organizations/{orgId}/model-providers

List providers, models, and current routing strategy

POST /api/admin/organizations/{orgId}/model-providers

Update routing strategy or compare routing options
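A minimal client sketch for these endpoints. The URL path comes from the reference above; the POST body shape (`{ routingStrategy }`) is an assumption and may differ from the actual API contract.

```typescript
// Build the admin endpoint path for an organization's model providers.
function providersUrl(orgId: string): string {
  return `/api/admin/organizations/${orgId}/model-providers`;
}

// Update the routing strategy (assumed request body shape).
async function setRoutingStrategy(orgId: string, strategy: string): Promise<Response> {
  return fetch(providersUrl(orgId), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ routingStrategy: strategy }),
  });
}
```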

Related Topics