For AI Labs & Scale-ups
Reserved + hourly burst
Capacity that holds, prices that hold up.
For AI-native companies and frontier model labs running serious training and inference. Reserved B200 / B300 capacity at multi-year commit pricing — with hourly burst on the same cluster when the curve spikes. Dedicated cluster engineers. Direct access to the research lab.
Who this is for
The shape of the workload.
AI labs
Frontier model teams pre-training or fine-tuning at 256+ GPUs steady-state. We allocate dedicated NVL72 / DGX SuperPOD-class capacity with InfiniBand at 800 Gbps per node and an SLA written for training, not video calls.
AI-native scale-ups
Series B+ companies whose product is the model — inference at 10⁹+ tokens/day, training every quarter. Reserved seats, transparent overage at list, and headroom for the next training run without a renegotiation.
Neoclouds & platforms
Operators reselling capacity or building managed AI products on top of ours. Reserved-tier wholesale pricing, per-tenant network isolation, and a partner program that does not compete with you on the front end.
What you get
The capacity envelope.
Reserved is the default. Hourly is the safety valve. Both run on the same hardware, in the same regions, peered into the same VPC.
Reserved capacity, 1-3 years
Up to 60% off list at 36 months on B200 / B300. Capacity guaranteed in writing — exact rack, exact region, exact interconnect. No oversubscription, no "best effort" footnote.
Hourly burst on top
Reserved seats hit 100% utilization? You burst into pay-as-you-go on the same cluster at list price — already 3× more cost-efficient. No separate account, no separate region, no cold start.
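The reserved-plus-burst billing model above can be sketched as a simple function. The rates here are illustrative placeholders, not published pricing: seats up to the reservation bill at the committed rate, and anything beyond bursts at list on the same cluster.

```python
def blended_hourly_cost(reserved_gpus: int, demand_gpus: int,
                        reserved_rate: float, list_rate: float) -> float:
    """Cost for one hour of a reserved + burst allocation.

    Seats within the reservation bill at the committed rate;
    demand above the reservation bursts at list price.
    Rates are hypothetical, for illustration only.
    """
    burst = max(0, demand_gpus - reserved_gpus)
    return min(demand_gpus, reserved_gpus) * reserved_rate + burst * list_rate

# Example: 256 reserved seats, a one-hour spike to 320 GPUs.
# Hypothetical rates: $2.00/GPU-hr reserved vs. $5.00/GPU-hr list.
cost = blended_hourly_cost(256, 320, 2.00, 5.00)
# 256 * 2.00 + 64 * 5.00 = 832.00
```

The point of the model: burst hours never reprice the reserved base, so a spike only adds the marginal GPUs at list.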
Multi-region scheduling
Workloads with data-residency flexibility get scheduled to the lowest-cost qualifying region by default. Same control plane, seven regions, one bill.
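The default placement rule — lowest-cost region among those a workload's residency constraints allow — reduces to a filter and a min. Region names and rates below are hypothetical:

```python
def schedule_region(regions: list[dict], residency_allowed: set[str]) -> str:
    """Pick the cheapest region that satisfies the workload's
    data-residency constraints. Names and rates are hypothetical."""
    qualifying = [r for r in regions if r["name"] in residency_allowed]
    return min(qualifying, key=lambda r: r["rate"])["name"]

regions = [
    {"name": "us-east", "rate": 4.80},
    {"name": "eu-west", "rate": 5.10},
    {"name": "us-west", "rate": 4.60},
]

# A workload restricted to US regions lands in the cheaper of the two.
schedule_region(regions, {"us-east", "us-west"})  # → "us-west"
```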
VPC interconnect (1y included)
AWS Direct Connect, Azure ExpressRoute, GCP Cloud Interconnect. Setup and steady-state for one peer included for the first twelve months. Most labs add a second by month nine.
Dedicated cluster engineer
A named systems engineer paired to your account at the 256-GPU tier and above. Reachable in your shared Slack within fifteen minutes. They know your topology, your kernels, your NCCL flags.
Direct lab access
The same researchers who write our papers will sit in your design reviews. Long-context, inference acceleration, mixed-precision — bring the workload, leave with a measurable improvement.
Commercial terms
The shape of the contract.
- 01
Scoping call
One thirty-minute call covering workload shape, region requirements, peak vs. steady-state, and committed quarterly draw-down. We come back with one proposal, sized to the workload.
- 02
Pilot allocation
Up to 64 GPUs for two weeks against the pilot SOW. Same hardware, same region, same SLA — only the term differs. You evaluate on production traffic, not a synthetic benchmark.
- 03
Reserved + burst contract
Standard MSA, DPA, BAA. 1, 2, or 3-year reserved tiers. Quarterly true-ups against committed draw-down — no penalty for over-consumption, full credit for under-consumption.
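The quarterly true-up described above is mechanically simple — this sketch models the contract logic, not real billing code. Overage bills at the same reserved rate (no penalty), and any shortfall against the commit carries forward as credit:

```python
def quarterly_true_up(committed: float, consumed: float) -> dict:
    """Settle one quarter against the committed draw-down.

    Over-consumption: the excess bills at the reserved rate, no penalty.
    Under-consumption: the shortfall becomes a full credit.
    Illustrative sketch of the mechanics, not a billing implementation.
    """
    if consumed >= committed:
        return {"billed_overage": consumed - committed, "credit": 0.0}
    return {"billed_overage": 0.0, "credit": committed - consumed}

# Committed 1,000 GPU-hours this quarter, consumed 1,200:
quarterly_true_up(1000.0, 1200.0)  # → {"billed_overage": 200.0, "credit": 0.0}
```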
- 04
Ongoing partnership
QBRs with engineering, not just account management. Roadmap visibility two quarters out. First access to new SKUs at launch — B300 customers had hardware on day one of GA.
Engineering relationship
Treated like infrastructure, not a SaaS account.
A real human, named.
At 256 GPUs and above, you get a dedicated cluster engineer — not a CSM. They're a systems engineer who has shipped distributed-training code. They have your runbooks, your topology, and your NCCL configuration in their head.
Office hours with the lab.
Bi-weekly thirty-minute window with a researcher on long-context, inference acceleration, or mixed-precision — whichever is most relevant to your workload. Bring kernels, leave with throughput.
Roadmap visibility, two quarters out.
You'll know what's landing in the inference engine before the changelog does. New SKUs, new regions, new compiler passes — flagged early so your capacity planning doesn't go stale.
Same engineers run our cluster.
There is no separate "customer support" team gated from engineering. Page-outs land with the people who shipped the runtime. Your incident is their incident.
FAQ
Common questions from labs and scale-ups.
Reserved capacity
Capacity that holds. Prices that hold up.
One scoping call, a pilot allocation, and a contract written in plain English. Most reserved deals close in three weeks.