What Is OpenRouter?
OpenRouter is a unified API gateway that provides developers access to 500+ large language models from 60+ providers through a single integration point. It is not an AI model itself — it's an infrastructure layer that sits between your application and the AI model ecosystem (OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek, xAI, and dozens more).
Instead of maintaining separate API keys, billing accounts, and integration code for each provider, developers connect to one OpenRouter endpoint, authenticate with one API key, and gain access to the entire AI model marketplace. OpenRouter handles authentication, billing, routing, failover, and response normalization behind the scenes.
By the numbers (early 2026):
- 4.2 million+ users globally
- 250,000+ apps on the platform
- 100+ trillion tokens processed per year (10x year-over-year growth)
- $500M valuation after Series A (led by Menlo Ventures)
- Total funding: $40 million (including seed round led by a16z)
How It Works
The workflow is straightforward:
- Sign up at openrouter.ai and optionally add credits (prepaid, denominated in USD)
- Generate an API key — a single key that works across all models
- Send requests to https://openrouter.ai/api/v1/chat/completions, specifying the model you want in the request body
- OpenRouter routes the request to the best available provider based on availability, price, latency, and your preferences
If the primary provider is down or rate-limited, OpenRouter automatically retries with alternative providers hosting the same model. For example, if Anthropic's direct API is overloaded, it can route your Claude request to AWS Bedrock or Google Vertex AI instead. This failover typically completes in under 2 seconds.
The platform is OpenAI SDK compatible — it's a drop-in replacement. Most existing code works by changing only the base URL and API key:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="your-openrouter-key"
)

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    messages=[{"role": "user", "content": "Hello!"}]
)
```
Available Models
OpenRouter provides access to all major model families:
| Provider | Key Models | Highlights |
|---|---|---|
| OpenAI | GPT-4o, GPT-4o-mini, o1, GPT-5 series | GPT-5.4 with 1M context and computer use |
| Anthropic | Claude Sonnet 4, Opus 4, Sonnet 4.5, Sonnet 4.6 | Claude Sonnet 4.6 with 1M context |
| Google | Gemini 2.0 Flash, 2.5 Pro, 3.1 Flash Lite, 3.1 Pro | 90% cached token discount, multimodal |
| Meta | Llama 3.1, 3.3, 4 Scout, 4 Maverick | Several models available for free |
| Mistral | Mistral Large, Mixtral, Devstral | Devstral: 123B open-source agentic coder |
| DeepSeek | DeepSeek V3.2 (309B) | Free with 256K context, ~90% of GPT-5.4 performance |
| xAI | Grok Code Fast 1, Grok 4 Fast | Among top reasoning models |
Plus hundreds more from Cohere, AI21 Labs, Perplexity, Qwen, Nous Research, and specialized/fine-tuned models. The full catalog is browsable at openrouter.ai/models.
Pricing
OpenRouter uses a pay-per-token, no-subscription model. You add credits and pay only for what you use — no monthly fees, no minimum spend.
How it works:
- Each model has separate input (prompt) and output (completion) token prices, set by providers and passed through without markup
- Credits deducted per-token as you make API calls
- You only pay for successful requests — if failover tries multiple providers, you only pay for the one that succeeds
Platform fees:
- Credit card purchases: 5.5% fee (minimum $0.80)
- Crypto purchases: 5% fee, no minimum
- BYOK (Bring Your Own Key): 5% usage fee on underlying provider cost, with 1M free requests/month
Example pricing (March 2026):
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT-5.4 | $2.50 | $15.00 |
| Claude Sonnet 4 | ~$2.00 | ~$2.00 |
| Claude Opus 4 | $15.00 | $75.00 |
| Gemini 3.1 Flash Lite | $0.25 | $1.50 |
| DeepSeek V3.2 | Free | Free |
| Llama 4 Scout | Free | Free |
Volume discounts: 5% at $500/month, 10% at $5,000/month. Enterprise plans with annual commits and invoicing are available.
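To make the per-token billing concrete, here is a minimal cost-estimation sketch using the fee rules and example rates listed above. The function names are illustrative, not part of any OpenRouter SDK, and real invoices may differ.

```python
# Sketch: estimate per-request cost and the credit card purchase fee,
# using the per-1M-token prices and fee rules quoted in this article.
# Function names are hypothetical helpers, not an OpenRouter API.

def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in USD for one request, given separate per-1M-token prices."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

def credit_card_fee(amount: float) -> float:
    """5.5% credit card fee with a $0.80 minimum, per the platform fees above."""
    return max(amount * 0.055, 0.80)

# A 2,000-token prompt with a 500-token completion on GPT-5.4
# ($2.50 input / $15.00 output per 1M tokens):
cost = request_cost(2_000, 500, 2.50, 15.00)
print(f"${cost:.4f}")  # → $0.0125
```

Note that the purchase fee applies when you top up credits, not per request, which is why high-volume teams often compare it against BYOK's 5% usage fee.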
Free models: Dozens of quality models at zero cost, including DeepSeek R1, Llama 3.3 70B, and Gemma 3. Free models have rate limits of 20 requests/minute and 200 requests/day.
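If you rely on free models, it can help to throttle client-side so you never hit those limits. A minimal sliding-window sketch, assuming the 20/minute and 200/day caps quoted above (OpenRouter enforces these server-side; this just avoids 429 responses, and the class is purely illustrative):

```python
# Sketch: client-side throttle for free-tier limits (20 req/min, 200 req/day).
# The clock is injectable so the logic can be tested without real waiting.

from collections import deque
import time

class FreeTierThrottle:
    def __init__(self, per_minute=20, per_day=200, clock=time.monotonic):
        self.per_minute = per_minute
        self.per_day = per_day
        self.clock = clock
        self.calls = deque()  # timestamps of recent requests

    def allow(self) -> bool:
        """Record a request if it fits both windows; return False otherwise."""
        now = self.clock()
        # Drop timestamps older than one day (86,400 seconds).
        while self.calls and now - self.calls[0] >= 86_400:
            self.calls.popleft()
        last_minute = sum(1 for t in self.calls if now - t < 60)
        if last_minute >= self.per_minute or len(self.calls) >= self.per_day:
            return False
        self.calls.append(now)
        return True
```

Call `allow()` before each request and sleep or queue when it returns False.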
Key Features
Intelligent Routing
- `:nitro` variant — append to any model name to prioritize throughput (fastest response)
- `:floor` variant — prioritize price (cheapest provider)
- `:online` variant — runs a web search and attaches results to the prompt
- `openrouter/auto` — meta-router that automatically selects the best model for your query
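The variants above are plain string suffixes on the model ID, so no special client support is needed. A trivial sketch (the base model name is just an example):

```python
# Sketch: routing variants are appended to the model ID with a colon.
# The helper is illustrative; you can equally just write the string inline.

def with_variant(model: str, variant: str) -> str:
    """Append a routing-variant suffix such as 'nitro', 'floor', or 'online'."""
    return f"{model}:{variant}"

print(with_variant("meta-llama/llama-3.1-8b-instruct", "nitro"))
# → meta-llama/llama-3.1-8b-instruct:nitro
```

The resulting string goes straight into the `model` field of a chat completions request.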
Automatic Failover
Pass an array of model IDs in priority order. If the first model's providers are down, OpenRouter automatically tries the next. If a single model is hosted across multiple providers (e.g., Claude on Anthropic, Bedrock, and Vertex), it fails over between providers automatically.
Spending Controls
Set per-key credit limits that reset daily, weekly, or monthly. Restrict individual API keys to specific models. This protects against runaway costs if a key is compromised.
Privacy
- Zero Data Retention (ZDR) — Restrict routing to endpoints with ZDR policies
- Prompt/content storage is off by default; opt-in logging available
- ZDR agreements negotiated with providers (including OpenAI) on behalf of users
Response Healing
Automatically fixes malformed JSON responses from LLMs before they reach your application — a feature added in late 2025 that prevents parsing errors in production.
Pros and Cons
Pros:
- Single integration for 500+ models — one API key, one endpoint, one billing dashboard
- OpenAI SDK compatible — drop-in replacement, change only the base URL
- Automatic failover with sub-2-second recovery
- Free models available for experimentation
- Spending controls with per-key budget caps
- Response Healing fixes malformed LLM output automatically
- No monthly fees — pure pay-per-use
- Battle-tested at 100T+ tokens/year
Cons:
- Added latency of 50-70ms in real-world benchmarks (15ms at edge in ideal conditions)
- 5.5% platform fee on credit purchases — adds up at high volume (roughly $5,500/month on $100K/month of credit purchases)
- Managed-only architecture — no self-hosted option, which can block GDPR/HIPAA-regulated teams
- Limited deep observability for enterprise needs (per-user tracking, latency distributions)
- New provider features may lag compared to direct API access
- Introduces a dependency layer between you and the model provider
OpenRouter vs Direct API Access
| Factor | OpenRouter | Direct API |
|---|---|---|
| Integration effort | One endpoint, one key, one SDK | Separate key/SDK/billing per provider |
| Model access | 500+ models instantly | Only that one provider's models |
| Failover | Automatic cross-provider | Manual implementation required |
| Latency | +50-70ms overhead | Lowest possible |
| Pricing | Provider rates + 5.5% credit fee | Provider rates only |
| Feature parity | May lag on newest features | Immediate access |
| Data privacy | Additional data hop | Direct connection |
| Billing | Unified dashboard | Separate per provider |
Bottom line: OpenRouter is ideal for developers and small-to-medium teams needing multi-model access, rapid prototyping, and resilient production systems. Direct API access is better for enterprises with high-volume single-provider workloads, strict regulatory requirements, or latency-sensitive applications where every millisecond matters.
Who Should Use OpenRouter?
- Individual developers and indie hackers — Experiment with different models without creating multiple accounts. Start with free models and scale up.
- Startups — Rapidly prototype AI-powered products, switch models by changing one parameter, pay via a single dashboard.
- Product teams building multi-model apps — If your app lets users choose their model, or you want to route requests dynamically, OpenRouter eliminates maintaining multiple API clients.
- Enterprise teams — Centralize billing across departments, enforce usage caps, manage API key permissions. Useful for accessing the same model across multiple cloud providers.
- Researchers — Compare model outputs across providers for benchmarks and experiments.
- Open-source projects — Give users flexible model options out of the box.
Who should NOT use OpenRouter: Teams with strict HIPAA/GDPR data residency requirements needing self-hosted infrastructure, organizations with very high-volume single-model workloads where direct provider relationships yield better pricing, and teams needing hard SLAs or custom fine-tuning pipelines.
Getting Started in 5 Minutes
- Sign up at openrouter.ai with Google, GitHub, or email
- Add credits (optional for free models) — navigate to Credits page, minimum ~$5
- Generate an API key — Settings > Keys, name your key, optionally set credit limits and model restrictions
- Make your first request:
```bash
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/llama-3.1-8b-instruct:free",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
With fallbacks (automatic model switching if one is down):
```python
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    extra_body={"models": [
        "anthropic/claude-sonnet-4",
        "openai/gpt-4o",
        "google/gemini-2.0-flash"
    ]},
    messages=[{"role": "user", "content": "Hello"}]
)
```
Security tip: Always store your API key in environment variables, never in source code or version control. Set per-key spending limits to protect against runaway costs.
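A minimal sketch of that pattern — reading the key from the environment and failing fast with a clear message when it is missing (the helper name is illustrative):

```python
# Sketch: load the OpenRouter key from an environment variable instead of
# hard-coding it. Raises early if the variable is unset or empty.

import os

def load_api_key(var: str = "OPENROUTER_API_KEY") -> str:
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"Set the {var} environment variable")
    return key
```

Pass the result as `api_key=` when constructing the client, and keep the variable out of version control (e.g. via a `.env` file listed in `.gitignore`).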