Inference pricing
Per-token pricing across the open-source catalog.
Input and output tokens are priced separately, and volume discounts apply automatically. Methodology for the 20× throughput claim is at /research/inference-acceleration.
Pricing table
By model and tier.
| Model | Input $/M tokens | Output $/M tokens | Max context | Regions |
|---|---|---|---|---|
| Llama 3.1 8B | $0.05 | $0.10 | 1M | All 7 |
| Llama 3.1 70B | $0.32 | $0.55 | 1M | All 7 |
| Llama 3.1 405B | $1.85 | $3.40 | 1M | 4 |
| Qwen 2.5 7B | $0.05 | $0.10 | 1M | All 7 |
| Qwen 2.5 72B | $0.32 | $0.55 | 1M | All 7 |
| Mixtral 8×7B | $0.18 | $0.32 | 32K | All 7 |
| Mixtral 8×22B | $0.55 | $0.85 | 256K | All 7 |
| DeepSeek V3 | $0.45 | $0.85 | 1M | All 7 |
| DeepSeek R1 | $0.55 | $2.10 | 256K | 5 |
| Mistral Large | $0.65 | $1.40 | 256K | All 7 |
| Custom models | Quote | Quote | Per-model | Per-model |
Prices are refreshed weekly. Volume discounts apply automatically once monthly spend crosses each tier's threshold.
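As a worked example of how the per-million-token rates above combine, here is a small cost estimator. The rates are taken from the Llama 3.1 70B row; the traffic numbers are hypothetical, and volume discounts are not modeled.

```python
def monthly_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Estimate monthly cost in dollars. Rates are $ per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Hypothetical workload: 2B input tokens and 500M output tokens per month
# on Llama 3.1 70B ($0.32 in / $0.55 out per million tokens).
cost = monthly_cost(2_000_000_000, 500_000_000, input_rate=0.32, output_rate=0.55)
print(f"${cost:,.2f}")  # $915.00
```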
Volume discounts
Automatic. No negotiation.
Self-hosted vs serverless
Two ways to run.
Most teams start serverless. High-throughput, latency-sensitive, or compliance-bound workloads run dedicated.
Serverless
Per-token, shared capacity.
The pricing on this page. No infra to manage. Routes to nearest healthy region by default.
Dedicated
Your model. Your GPUs.
Reserved capacity for your own model on your own GPUs without sharing. Custom SLAs and region pinning.
API
OpenAI-compatible. One line to switch.
```python
from iframe import Inference

client = Inference(api_key="...")
response = client.chat.completions.create(
    model="llama-3.1-70b",
    messages=[{"role": "user", "content": "Hello, world."}],
)
```

Point an existing OpenAI SDK at api.iframe.ai and pass an iFrame model identifier. The rest of your code is unchanged.
Streaming, tool use, and structured output all work; the API surface mirrors OpenAI's, with iFrame-specific extensions namespaced under iframe_*.
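The compatibility claim can be sketched at the wire level with only the standard library. The `/v1` base path and the shape of the `/chat/completions` endpoint are assumptions modeled on OpenAI's API layout, not confirmed details; check your dashboard for the canonical base URL.

```python
# Build an OpenAI-style chat completion request against the iFrame endpoint.
# BASE_URL is an assumption modeled on OpenAI's API layout.
import json
import urllib.request

BASE_URL = "https://api.iframe.ai/v1"

def build_request(api_key: str, model: str, messages: list) -> urllib.request.Request:
    """Return a ready-to-send POST request in the OpenAI chat completions format."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("sk-...", "llama-3.1-70b",
                    [{"role": "user", "content": "Hello, world."}])
# Sending it requires a valid key and network access:
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the payload mirrors OpenAI's, the same request body works whether it is produced by an OpenAI SDK with a swapped base URL or built by hand as above.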
FAQ
Inference pricing questions.
Drop in your OpenAI SDK. Save on every token.
Sign up free, swap the base URL, ship today.