The lab that built the cloud.

Twenty-three full-time researchers, three teams, one building shared with the engineers who run production. The lab is not adjacent to the company — it is the source.

Composition

Three teams. Three director-level researchers.

Long-context

Soroush Bahmani, PhD

Attention sparsity, KV-cache compression, prefix caching, and the eval harness that makes 1M-token benchmarks reproducible. Eight researchers.

12 papers · 1 production system

Acceleration

Lila Saadat, PhD

Quantization, speculative decoding, kernel fusion, and the optimizer pipeline that compiles models down to FP8 and INT4 with quality regression bounds. Nine researchers.

8 papers · 20x speedup

Systems

Wei Chen, PhD

The runtime — vendor-neutral collectives, scheduler, fault tolerance, and the multi-vendor benchmarking harness. Six researchers, three SREs.

5 papers · 3 vendors supported

Principles

How we operate.

We ship what we publish

Every paper has a target product feature before it goes to peer review. Reviewers are good at finding what is novel; customers are good at finding what is true.

Compute is honest

Researchers in the lab use the same cluster that customers do. There is no internal-only tier. If a kernel breaks at 256 nodes for them, it breaks at 256 nodes for you.

Open by default

Code, weights, and benchmarks released with every paper. Internal-only work happens when there is a customer-data reason; otherwise it goes to GitHub on the day of publication.

Customers are co-authors

Three of the twelve papers from the long-context team include enterprise customers as co-authors. The questions worth answering tend to come from the workloads worth running.

By the numbers

What the lab has shipped to date.

Papers since 2022
34
Open-source releases
7
Citations
2,148
In production
9
Lab outputs powering features in our cloud or inference products today.

Join us

The lab is hiring.

Research engineers, scientists, and PhD interns. We hire from where the questions are: long-context, kernels, distributed systems.