
Hosted Model Providers

Host open-source models like DeepSeek, Llama, and Qwen to reduce costs by 60-90%.

Why Host Your Own Models?

  • Cost Reduction (60-90%): Lower per-token costs vs commercial APIs
  • Data Privacy (100%): Data never leaves your infrastructure
  • No Rate Limits (Unlimited): Scale without API throttling
  • Green Options (90%+ carbon reduction): Host in renewable energy regions

Supported Models

The following open-source models are supported for self-hosting:

DeepSeek V3

DeepSeek

Premium
Context: 128K. Strengths: Reasoning, Code, Math

Llama 3.3 70B

Meta Llama

Premium
Context: 128K. Strengths: General, Code, Instruction Following

Qwen 2.5 72B

Alibaba Qwen

Premium
Context: 128K. Strengths: Multilingual, Math, Code

Mistral Large 2

Mistral AI

Premium
Context: 128K. Strengths: Multilingual, Code, General

Llama 3.2 8B

Meta Llama

Basic
Context: 128K. Strengths: Fast, Efficient, Simple Tasks

Routing Strategies

Control how requests are routed between available providers:

Direct

Use exact model match. Falls back to external API if not hosted.

Best for: When you need a specific model

Least Cost

Recommended

Route to the cheapest compatible model.

Best for: Maximum cost savings

Least Carbon

Route to the provider/region with lowest carbon emissions.

Best for: ESG compliance, sustainability goals

Round Robin

Rotate between available providers for load balancing.

Best for: High availability, load distribution

Availability

Route to the healthiest provider first.

Best for: Mission-critical applications
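The five strategies above can be sketched as a single selection function. This is an illustrative sketch only: the type names, fields, and `route` function are assumptions for clarity, not the platform's actual schema or implementation.

```typescript
type Strategy = "direct" | "least_cost" | "least_carbon" | "round_robin" | "availability";

// Hypothetical shape of a hosted model entry; field names are illustrative.
interface HostedModel {
  provider: string;
  model: string;
  costPerMTokens: number;      // USD per million input tokens
  gramsCO2PerMTokens: number;  // carbon intensity of serving this model
  healthy: boolean;
}

let rrIndex = 0; // round-robin cursor, persisted across calls

function route(models: HostedModel[], strategy: Strategy, wanted?: string): HostedModel | undefined {
  const healthy = models.filter(m => m.healthy);
  if (healthy.length === 0) return undefined;
  switch (strategy) {
    case "direct":
      // Exact match only; returning undefined signals fallback to an external API.
      return healthy.find(m => m.model === wanted);
    case "least_cost":
      return [...healthy].sort((a, b) => a.costPerMTokens - b.costPerMTokens)[0];
    case "least_carbon":
      return [...healthy].sort((a, b) => a.gramsCO2PerMTokens - b.gramsCO2PerMTokens)[0];
    case "round_robin":
      return healthy[rrIndex++ % healthy.length];
    case "availability":
      // Unhealthy providers are already filtered out; take the first healthy one.
      return healthy[0];
  }
}
```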

Cost Comparison

Model                       Commercial API    Self-Hosted    Savings
GPT-4 equivalent            $30.00/M          $0.20/M        99%
Claude Sonnet equivalent    $15.00/M          $0.15/M        99%
GPT-3.5 equivalent          $0.50/M           $0.03/M        94%

* Prices are per million tokens (input). Self-hosted costs include estimated infrastructure.
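The Savings column above is simple to reproduce. A minimal sketch of the arithmetic, rounded to the nearest whole percent:

```typescript
// Percentage saved relative to the commercial per-million-token price.
function savingsPercent(commercialPerM: number, selfHostedPerM: number): number {
  return Math.round((1 - selfHostedPerM / commercialPerM) * 100);
}
```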

Provider Types

Self-Hosted

Run on your own infrastructure (AWS, GCP, Azure, on-prem)

Pros

  • + Full control
  • + Data privacy
  • + Custom models

Cons

  • - Requires infrastructure expertise
  • - Maintenance overhead

IRI-Hosted

Managed hosting by IRI in optimized data centers

Pros

  • + No infrastructure management
  • + Optimized for performance
  • + Green regions

Cons

  • - Less control
  • - Usage-based pricing

Commercial

External APIs (OpenAI, Anthropic, Google)

Pros

  • + Latest models
  • + No setup
  • + High reliability

Cons

  • - Higher costs
  • - Rate limits
  • - Data sent externally

Setting Up a Provider

  1. Deploy your LLM server

    Use vLLM, TGI, or llama.cpp to serve your model with an OpenAI-compatible API.

  2. Add provider to database

    Insert a record into model_providers with your server URL and capabilities.

  3. Register models

    Add entries to hosted_models with pricing and specifications.

  4. Configure routing

    Set your organization's routing strategy in Admin → Model Providers.
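Putting steps 2 and 3 together, the inserted records might look like the following. This is a sketch under stated assumptions: the field names are illustrative, not the actual column names; consult the real model_providers and hosted_models schema before inserting.

```typescript
// Hypothetical provider record (step 2): points at a vLLM server's
// OpenAI-compatible endpoint. Host and port are placeholders.
const provider = {
  name: "my-vllm-server",
  type: "self_hosted",
  baseUrl: "http://10.0.0.5:8000/v1", // vLLM serves an OpenAI-compatible API under /v1
  healthy: true,
};

// Hypothetical model record (step 3): pricing and specifications.
const model = {
  providerName: provider.name,
  model: "meta-llama/Llama-3.3-70B-Instruct",
  contextWindow: 131072,          // 128K tokens
  inputCostPerMTokens: 0.20,      // USD, including estimated infrastructure
  capabilities: ["general", "code", "instruction_following"],
};
```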

Quick Start with Seed Script

bun run scripts/seed-providers.ts

This creates sample providers and models for testing.

Model Providers Dashboard

Access provider management at Admin → Model Providers.

Provider Overview

View all providers with health status and model counts

Model Details

Expand providers to see available models, pricing, and capabilities

Routing Strategy

Select and configure your routing strategy

Cost Comparison

See estimated savings vs commercial APIs

API Reference

GET /api/admin/organizations/{orgId}/model-providers

List providers, models, and current routing strategy

POST /api/admin/organizations/{orgId}/model-providers

Update routing strategy or compare routing options
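A minimal client sketch for these endpoints. The URL path comes from the reference above; the POST body shape (`{ routingStrategy }`) is an assumption and may differ from the actual API contract.

```typescript
// Build the admin endpoint path for an organization's model providers.
function providersUrl(orgId: string): string {
  return `/api/admin/organizations/${orgId}/model-providers`;
}

// Update the routing strategy (assumed request body shape).
async function setRoutingStrategy(orgId: string, strategy: string): Promise<Response> {
  return fetch(providersUrl(orgId), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ routingStrategy: strategy }),
  });
}
```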

Related Topics