# Benchmarks
Reproducible. Public. Quarterly.
Every benchmark result on this page links to the harness that produced it, the runtime version that ran it, and the seed that locked it. Run them yourself; we publish the recipe and the JSON.
| Workload | Config | iframe | Baseline | Speedup | Reproduce |
|---|---|---|---|---|---|
| Llama 3.1 70B inference | 1K-in / 1K-out · B200 · FP8 | 12,840 tok/s | 640 tok/s (vLLM 0.6 default) | 20.0× | inference-bench · seed 0x11 · iframe-runtime 2026.04 |
| Llama 3.1 70B long-context | 1M-in / 2K-out · B200 · learned-sparse | 3,840 tok/s prefill | 620 tok/s prefill (FlashAttention-3 dense) | 6.2× | longctx-evals · seed 0x17 · iframe-runtime 2026.04 |
| Llama 3.1 70B training | B200 · 1024 GPUs · FP8 mixed | 51% MFU | 44% MFU (Megatron-LM main) | 1.16× | training-bench · seed 0x05 · iframe-runtime 2026.04 |
| Mixtral 8×22B inference | 1K-in / 1K-out · H200 · FP8 | 9,240 tok/s | 1,180 tok/s (vLLM 0.6 default) | 7.8× | inference-bench · seed 0x21 · iframe-runtime 2026.04 |
| DeepSeek V3 inference | 4K-in / 4K-out · B200 · FP8 | 5,920 tok/s | 510 tok/s (SGLang 0.4 default) | 11.6× | inference-bench · seed 0x33 · iframe-runtime 2026.04 |
| Llama 3.1 8B inference (MI300X) | 1K-in / 1K-out · MI300X · FP8 | 8,720 tok/s | 1,090 tok/s (vLLM ROCm) | 8.0× | inference-bench · seed 0x41 · iframe-runtime 2026.04 |
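The Speedup column is simply iframe throughput divided by baseline throughput, rounded to one decimal. A quick sanity check, with a few rows transcribed from the table above:

```python
# Verify the Speedup column: speedup = iframe tok/s / baseline tok/s.
# Row data is transcribed from the published table; rounding follows its one-decimal style.
rows = [
    ("Mixtral 8x22B inference", 9240, 1180),
    ("DeepSeek V3 inference", 5920, 510),
    ("Llama 3.1 8B inference (MI300X)", 8720, 1090),
]
for name, iframe_tok_s, baseline_tok_s in rows:
    speedup = round(iframe_tok_s / baseline_tok_s, 1)
    print(f"{name}: {speedup}x")
```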
## Methodology
What the numbers mean.
### Quarterly cadence
We rerun every published benchmark on a clean cluster on the first business day of each quarter and publish the JSON output. Older runs remain available.
### Baselines run with recommended configs
We do not benchmark against out-of-the-box defaults. Comparison runtimes use their published recommended configs for the workload.
### Identical hardware
Both iframe and the baseline run on the same physical node. Same firmware, same drivers, same NCCL build. The only variable is the runtime.
### Median of five
Each cell is the median of five runs. We publish the raw output, so min/max bands are in the JSON if you need them.
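The aggregation is straightforward to reproduce from the published JSON. A minimal sketch, assuming a hypothetical file shape with a `"runs"` array of per-run throughputs (the real schema is whatever the harness emits):

```python
import json
import statistics

# Hypothetical shape of one published run file -- only the "median of five"
# logic is the point here, not the actual schema.
raw = '{"runs": [12790, 12910, 12840, 12760, 12880]}'
samples = json.loads(raw)["runs"]

median = statistics.median(samples)  # the value that goes in the table cell
band = (min(samples), max(samples))  # the min/max band kept in the JSON
print(median, band)
```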
### Seed-locked
Every published result includes the seed hard-coded in the harness config. Runs are bit-identical across machines if firmware and driver stack match.
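Seed-locking means the harness draws every randomized input from a generator initialized with the published seed, so two machines replay the same workload. A toy illustration of the idea (the `sample_batch` sampler is hypothetical, not the harness's actual API):

```python
import random

def sample_batch(seed: int, n: int = 4):
    # Hypothetical stand-in for a seeded request sampler: with the seed
    # fixed in the config, every machine draws the same inputs.
    rng = random.Random(seed)
    return [rng.randrange(1_000_000) for _ in range(n)]

a = sample_batch(0x11)  # 0x11 is the seed published for the inference-bench row
b = sample_batch(0x11)
assert a == b  # same seed, identical draws on every machine
```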
### Vendor-balanced
We publish wins and losses across NVIDIA, AMD, and Intel. If the baseline is faster on a workload, we publish that too — and start a project to fix it.
## Run the benchmarks. Verify the numbers.
Free trial credits cover the published configs.