
Benchmarks

Reproducible. Public. Quarterly.

Every benchmark result on this page links to the harness that produced it, the runtime version that ran it, and the seed that locked it. Run them yourself; we publish the recipe and the JSON.

| Workload | Config | iframe | Baseline | Speedup | Reproduce |
|---|---|---|---|---|---|
| Llama 3.1 70B inference | 1K-in / 1K-out · B200 · FP8 | 12,840 tok/s | 640 tok/s (vLLM 0.6 default) | 20.0× | inference-bench · seed 0x11 · iframe-runtime 2026.04 |
| Llama 3.1 70B long-context | 1M-in / 2K-out · B200 · learned-sparse | 3,840 tok/s prefill | 620 tok/s prefill (FlashAttention-3 dense) | 6.2× | longctx-evals · seed 0x17 · iframe-runtime 2026.04 |
| Llama 3.1 70B training | B200 · 1024 GPUs · FP8 mixed | 51% MFU | 44% MFU (Megatron-LM main) | 1.16× | training-bench · seed 0x05 · iframe-runtime 2026.04 |
| Mixtral 8×22B inference | 1K-in / 1K-out · H200 · FP8 | 9,240 tok/s | 1,180 tok/s (vLLM 0.6 default) | 7.8× | inference-bench · seed 0x21 · iframe-runtime 2026.04 |
| DeepSeek V3 inference | 4K-in / 4K-out · B200 · FP8 | 5,920 tok/s | 510 tok/s (SGLang 0.4 default) | 11.6× | inference-bench · seed 0x33 · iframe-runtime 2026.04 |
| Llama 3.1 8B inference (MI300X) | 1K-in / 1K-out · MI300X · FP8 | 8,720 tok/s | 1,090 tok/s (vLLM ROCm) | 8.0× | inference-bench · seed 0x41 · iframe-runtime 2026.04 |
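The Speedup column appears to be the iframe figure divided by the baseline figure for the same row. A minimal sketch, assuming that definition, recomputes it from the published cells so you can check the arithmetic before rerunning anything. `speedup` and `check_table` are hypothetical helpers, not part of any published harness.

```python
# Published (iframe, baseline, speedup) triples copied from the results table.
# Units differ per row (tok/s, prefill tok/s, % MFU), but each ratio is unitless.
PUBLISHED = {
    "Llama 3.1 70B inference": (12_840, 640, 20.0),
    "Llama 3.1 70B long-context": (3_840, 620, 6.2),
    "Llama 3.1 70B training": (51, 44, 1.16),
    "Mixtral 8×22B inference": (9_240, 1_180, 7.8),
    "DeepSeek V3 inference": (5_920, 510, 11.6),
    "Llama 3.1 8B inference (MI300X)": (8_720, 1_090, 8.0),
}


def speedup(iframe: float, baseline: float) -> float:
    """Assumed definition: iframe result divided by the baseline result."""
    return iframe / baseline


def check_table(rows: dict, tol: float = 0.1) -> list[str]:
    """Return workloads whose recomputed speedup differs from the published
    figure by more than `tol` (the table rounds to 0.1x or 0.01x)."""
    mismatches = []
    for workload, (ours, base, published) in rows.items():
        if abs(speedup(ours, base) - published) > tol:
            mismatches.append(workload)
    return mismatches


if __name__ == "__main__":
    for workload, (ours, base, published) in PUBLISHED.items():
        print(f"{workload}: {speedup(ours, base):.3f}x (published {published}x)")
    print("mismatches:", check_table(PUBLISHED))
```

Every row agrees within 0.1×; the first row recomputes to 20.06×, which the table shows as 20.0×.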

Methodology

What the numbers mean.

Quarterly cadence

We rerun every published benchmark on a clean cluster on the first business day of each quarter and publish the JSON output. Older runs remain available.
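To make "first business day of each quarter" concrete, here is a minimal sketch that assumes a business day is any Monday through Friday and ignores public holidays. The function names are hypothetical and are not taken from our scheduling tooling.

```python
from datetime import date, timedelta


def first_business_day(year: int, month: int) -> date:
    """First Monday-Friday date of the month (public holidays are ignored)."""
    d = date(year, month, 1)
    while d.weekday() >= 5:  # weekday(): 5 = Saturday, 6 = Sunday
        d += timedelta(days=1)
    return d


def quarterly_rerun_dates(year: int) -> list[date]:
    """Rerun dates for Q1..Q4: first business day of Jan, Apr, Jul, and Oct."""
    return [first_business_day(year, month) for month in (1, 4, 7, 10)]


if __name__ == "__main__":
    for d in quarterly_rerun_dates(2026):
        print(d.isoformat(), d.strftime("%A"))
```

Because holidays are ignored, January 1 counts as a business day whenever it falls on a weekday. A real schedule would also consult a holiday calendar.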

Baselines run with recommended configs

We do not benchmark against out-of-the-box defaults. Comparison runtimes use their published recommended configs for the workload.

Identical hardware

Both iframe and the baseline run on the same physical nodes. Same firmware, same drivers, same NCCL build. The only variable is the runtime.
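One way to enforce "the only variable is the runtime" is to diff the environment metadata recorded for the two runs before comparing them. In this minimal sketch, the record fields (`hosts`, `firmware`, `driver`, `nccl`, `runtime`) and their placeholder values are hypothetical. They are not the harness's real JSON schema or our actual software stack.

```python
# Hypothetical run-metadata records. Field names and version strings are
# placeholders, not the harness's real JSON schema or the actual stack.
IFRAME_RUN = {
    "hosts": "cluster-A",
    "firmware": "fw-placeholder",
    "driver": "driver-placeholder",
    "nccl": "nccl-placeholder",
    "runtime": "iframe-runtime 2026.04",
}
# Same environment, different runtime.
BASELINE_RUN = {**IFRAME_RUN, "runtime": "vLLM 0.6"}


def differing_fields(a: dict, b: dict) -> set[str]:
    """Keys whose values differ, including keys present in only one record."""
    return {k for k in a.keys() | b.keys() if a.get(k) != b.get(k)}


def assert_only_runtime_differs(a: dict, b: dict) -> None:
    """Raise if anything other than the runtime differs between two runs."""
    diff = differing_fields(a, b)
    if diff != {"runtime"}:
        raise ValueError(f"runs are not comparable; differing fields: {sorted(diff)}")


if __name__ == "__main__":
    assert_only_runtime_differs(IFRAME_RUN, BASELINE_RUN)
    print("comparable: only the runtime differs")
```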

Median of five

Each cell is the median of five runs. Min/max bands are visible in the JSON if you need them; we publish the raw output.
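A minimal sketch of how a published cell and its min/max band can be derived from five raw runs. The JSON shape and the numbers in `RAW` are made up for illustration and do not come from any published run.

```python
import json
import statistics

# Hypothetical raw-output record with made-up numbers; the real harness
# JSON schema and values will differ.
RAW = '{"workload": "example", "runs_tok_s": [101.0, 98.5, 100.2, 99.7, 103.4]}'


def summarize(runs: list[float]) -> dict[str, float]:
    """Median of exactly five runs, plus the min/max band around it."""
    if len(runs) != 5:
        raise ValueError(f"expected 5 runs, got {len(runs)}")
    return {"median": statistics.median(runs), "min": min(runs), "max": max(runs)}


if __name__ == "__main__":
    record = json.loads(RAW)
    print(record["workload"], summarize(record["runs_tok_s"]))
```

With an odd number of runs, the median is always one of the measured values. A single outlier can shift it at most to a neighboring run's value, whereas a mean would absorb the outlier in full.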

Seed-locked

Every published result includes the seed hard-coded in the harness config. Runs are bit-identical across machines if firmware and driver stack match.
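A minimal sketch of what seed-locking means on the harness side. Only the seed value 0x11 comes from the results table; the `CONFIG` keys and `request_order` are hypothetical. The sketch covers only host-side randomness, here the order in which requests are issued. Bit-identical GPU output also depends on deterministic kernels and a matching firmware and driver stack, as described above.

```python
import random

# Hypothetical harness config; only the seed value 0x11 comes from the table.
CONFIG = {"harness": "inference-bench", "seed": 0x11, "num_requests": 8}


def request_order(config: dict) -> list[int]:
    """Deterministic shuffle of request indices, driven solely by the pinned seed."""
    rng = random.Random(config["seed"])  # private RNG; global random state untouched
    order = list(range(config["num_requests"]))
    rng.shuffle(order)
    return order


if __name__ == "__main__":
    print(request_order(CONFIG))
```

A dedicated `random.Random` instance keeps the sequence independent of any other code that touches the global RNG. With the same Python version, the same seed yields the same order on every machine.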

Vendor-balanced

We publish wins and losses across NVIDIA, AMD, and Intel. If the baseline is faster on a workload, we publish that too — and start a project to fix it.

Run the benchmarks. Verify the numbers.

Free trial credits cover the published configs.