# Benchmarks
Reproducible. Public. Quarterly.
Every benchmark result on this page links to the harness that produced it, the runtime version that ran it, and the seed that locked it. Run them yourself; we publish the recipe and the JSON.
| Workload | Config | iframe | Baseline | Speedup | Reproduce |
|---|---|---|---|---|---|
| Llama 3.1 70B inference | 1K-in / 1K-out · B200 · FP8 | 12,840 tok/s | 640 tok/s (vLLM 0.6 default) | 20.0× | inference-bench · seed 0x11 · iframe-runtime 2026.04 |
| Llama 3.1 70B long-context | 1M-in / 2K-out · B200 · learned-sparse | 3,840 tok/s prefill | 620 tok/s prefill (FlashAttention-3 dense) | 6.2× | longctx-evals · seed 0x17 · iframe-runtime 2026.04 |
| Llama 3.1 70B training | B200 · 1024 GPUs · FP8 mixed | 51% MFU | 44% MFU (Megatron-LM main) | 1.16× | training-bench · seed 0x05 · iframe-runtime 2026.04 |
| Mixtral 8×22B inference | 1K-in / 1K-out · H200 · FP8 | 9,240 tok/s | 1,180 tok/s (vLLM 0.6 default) | 7.8× | inference-bench · seed 0x21 · iframe-runtime 2026.04 |
| DeepSeek V3 inference | 4K-in / 4K-out · B200 · FP8 | 5,920 tok/s | 510 tok/s (SGLang 0.4 default) | 11.6× | inference-bench · seed 0x33 · iframe-runtime 2026.04 |
| Llama 3.1 8B inference (MI300X) | 1K-in / 1K-out · MI300X · FP8 | 8,720 tok/s | 1,090 tok/s (vLLM ROCm) | 8.0× | inference-bench · seed 0x41 · iframe-runtime 2026.04 |
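The Speedup column is simply iframe throughput divided by baseline throughput, rounded to one decimal. A quick sanity check, with a few rows transcribed from the table above:

```python
# Verify the Speedup column: speedup = iframe tok/s / baseline tok/s.
# Row data is transcribed from the published table; rounding follows its one-decimal style.
rows = [
    ("Mixtral 8x22B inference", 9240, 1180),
    ("DeepSeek V3 inference", 5920, 510),
    ("Llama 3.1 8B inference (MI300X)", 8720, 1090),
]
for name, iframe_tok_s, baseline_tok_s in rows:
    speedup = round(iframe_tok_s / baseline_tok_s, 1)
    print(f"{name}: {speedup}x")
```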
## Methodology
What the numbers mean.
### Quarterly cadence
We rerun every published benchmark on a clean cluster on the first business day of each quarter and publish the JSON output. Older runs remain available.
### Baselines run with recommended configs
We do not benchmark against out-of-the-box defaults. Comparison runtimes use their published recommended configs for the workload.
### Identical hardware
Both iframe and the baseline run on the same physical node. Same firmware, same drivers, same NCCL build. The only variable is the runtime.
### Median of five
Each cell is the median of five runs. We publish the raw output, so min/max bands are in the JSON if you need them.
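The aggregation is straightforward to reproduce from the published JSON. A minimal sketch, assuming a hypothetical file shape with a `"runs"` array of per-run throughputs (the real schema is whatever the harness emits):

```python
import json
import statistics

# Hypothetical shape of one published run file -- only the "median of five"
# logic is the point here, not the actual schema.
raw = '{"runs": [12790, 12910, 12840, 12760, 12880]}'
samples = json.loads(raw)["runs"]

median = statistics.median(samples)  # the value that goes in the table cell
band = (min(samples), max(samples))  # the min/max band kept in the JSON
print(median, band)
```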
### Seed-locked
Every published result includes the seed hard-coded in the harness config. Runs are bit-identical across machines if firmware and driver stack match.
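Seed-locking means the harness draws every randomized input from a generator initialized with the published seed, so two machines replay the same workload. A toy illustration of the idea (the `sample_batch` sampler is hypothetical, not the harness's actual API):

```python
import random

def sample_batch(seed: int, n: int = 4):
    # Hypothetical stand-in for a seeded request sampler: with the seed
    # fixed in the config, every machine draws the same inputs.
    rng = random.Random(seed)
    return [rng.randrange(1_000_000) for _ in range(n)]

a = sample_batch(0x11)  # 0x11 is the seed published for the inference-bench row
b = sample_batch(0x11)
assert a == b  # same seed, identical draws on every machine
```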
### Vendor-balanced
We publish wins and losses across NVIDIA, AMD, and Intel. If the baseline is faster on a workload, we publish that too — and start a project to fix it.
## Run the benchmarks. Verify the numbers.
Free trial credits cover the published configs.