Hosted Model Providers
Host open-source models like DeepSeek, Llama, and Qwen to reduce costs by 60-90%.
Why Host Your Own Models?
- **Cost Reduction (60-90%):** Lower per-token costs vs commercial APIs
- **Data Privacy (100%):** Data never leaves your infrastructure
- **No Rate Limits:** Scale without API throttling
- **Green Options (90%+ reduction):** Host in renewable energy regions
Supported Models
The following open-source models are supported for self-hosting:
| Model | Family |
|---|---|
| DeepSeek V3 | DeepSeek |
| Llama 3.3 70B | Meta Llama |
| Qwen 2.5 72B | Alibaba Qwen |
| Mistral Large 2 | Mistral AI |
| Llama 3.2 8B | Meta Llama |
Routing Strategies
Control how requests are routed between available providers:
- **Direct:** Use an exact model match; falls back to an external API if the model is not hosted. Best for: when you need a specific model.
- **Least Cost (recommended):** Route to the cheapest compatible model. Best for: maximum cost savings.
- **Least Carbon:** Route to the provider/region with the lowest carbon emissions. Best for: ESG compliance and sustainability goals.
- **Round Robin:** Rotate between available providers for load balancing. Best for: high availability and load distribution.
- **Availability:** Route to the healthiest provider first. Best for: mission-critical applications.
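As an illustration, the Least Cost strategy can be sketched as a filter-and-sort over candidate models. This is a minimal sketch, not the actual implementation; the `HostedModel` shape and field names here are assumptions.

```typescript
// Illustrative sketch of least-cost routing. The interface below is an
// assumption for this example, not the real hosted_models schema.
interface HostedModel {
  provider: string;
  model: string;
  costPerMTokens: number; // USD per million input tokens
  healthy: boolean;
}

// Pick the cheapest currently-healthy model; undefined if none qualify.
function routeLeastCost(candidates: HostedModel[]): HostedModel | undefined {
  return candidates
    .filter((m) => m.healthy)
    .sort((a, b) => a.costPerMTokens - b.costPerMTokens)[0];
}
```

Unhealthy providers are excluded before sorting, so Least Cost degrades gracefully when a cheap provider goes down.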
Cost Comparison
| Model | Commercial API | Self-Hosted | Savings |
|---|---|---|---|
| GPT-4 equivalent | $30.00/M | $0.20/M | 99% |
| Claude Sonnet equivalent | $15.00/M | $0.15/M | 99% |
| GPT-3.5 equivalent | $0.50/M | $0.03/M | 94% |
* Prices are per million tokens (input). Self-hosted costs include estimated infrastructure.
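The savings column follows directly from the two prices. A one-line helper (hypothetical, shown only to make the arithmetic explicit) reproduces the table:

```typescript
// Percent savings of self-hosted vs commercial, from per-million-token prices.
function savingsPct(commercialPerM: number, selfHostedPerM: number): number {
  return Math.round((1 - selfHostedPerM / commercialPerM) * 100);
}
```

For example, `savingsPct(30.0, 0.20)` gives the 99% figure in the first row.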
Provider Types
Self-Hosted
Run on your own infrastructure (AWS, GCP, Azure, on-prem).
Pros:
- Full control
- Data privacy
- Custom models
Cons:
- Requires infrastructure expertise
- Maintenance overhead
IRI-Hosted
Managed hosting by IRI in optimized data centers.
Pros:
- No infrastructure management
- Optimized for performance
- Green regions
Cons:
- Less control
- Usage-based pricing
Commercial
External APIs (OpenAI, Anthropic, Google).
Pros:
- Latest models
- No setup
- High reliability
Cons:
- Higher costs
- Rate limits
- Data sent externally
Setting Up a Provider
1. **Deploy your LLM server.** Use vLLM, TGI, or llama.cpp to serve your model with an OpenAI-compatible API.
2. **Add the provider to the database.** Insert a record into `model_providers` with your server URL and capabilities.
3. **Register models.** Add entries to `hosted_models` with pricing and specifications.
4. **Configure routing.** Set your organization's routing strategy in Admin → Model Providers.
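Once step 1 is done, you can exercise the server through its OpenAI-compatible chat endpoint. The sketch below only builds the request; the base URL and model name are placeholders for your deployment, not values defined by this project.

```typescript
// Hypothetical example: build a chat-completions request for a locally
// served model. "http://localhost:8000/v1" is a common default for vLLM,
// but substitute your own server URL.
const baseUrl = "http://localhost:8000/v1";

function buildChatRequest(model: string, prompt: string) {
  return {
    url: `${baseUrl}/chat/completions`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

// To send it: const req = buildChatRequest("llama-3.3-70b", "Hello");
//             const res = await fetch(req.url, req.init);
```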
Quick Start with Seed Script
Run `bun run scripts/seed-providers.ts`. This creates sample providers and models for testing.
Model Providers Dashboard
Access provider management at Admin → Model Providers.
- **Provider Overview:** View all providers with health status and model counts.
- **Model Details:** Expand providers to see available models, pricing, and capabilities.
- **Routing Strategy:** Select and configure your routing strategy.
- **Cost Comparison:** See estimated savings vs commercial APIs.
API Reference
- `/api/admin/organizations/{orgId}/model-providers`: List providers, models, and the current routing strategy.
- `/api/admin/organizations/{orgId}/model-providers`: Update the routing strategy or compare routing options.
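A small helper (hypothetical; the request and response shapes are not specified here, so only the URL construction is shown) keeps the path template in one place:

```typescript
// Build the admin model-providers URL for a given organization.
// Illustrative only; this helper is not part of the documented API surface.
function modelProvidersUrl(orgId: string): string {
  return `/api/admin/organizations/${encodeURIComponent(orgId)}/model-providers`;
}
```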