Research · Lab
The lab that built the cloud.
Twenty-three full-time researchers, three teams, one building shared with the engineers who run production. The lab is not adjacent to the company — it is the source.
Composition
Three teams. Three director-level researchers.
Long-context
Soroush Bahmani, PhD
Attention sparsity, KV-cache compression, prefix caching, and the eval harness that makes 1M-token benchmarks reproducible. Eight researchers.
12 papers · 1 production system
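A toy sketch of one of those ideas, prefix caching: requests that share a prompt prefix reuse the KV tensors computed for it instead of re-running prefill. The class below and the prefill callable are illustrative, not the lab's implementation.

import hashlib

class PrefixCache:
    """Toy prefix cache: prefix digest -> precomputed KV tensors."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(token_ids):
        return hashlib.sha256(repr(tuple(token_ids)).encode()).hexdigest()

    def get_or_prefill(self, token_ids, prefill):
        # Compute the expensive attention pass once per distinct prefix;
        # later requests with the same prefix skip straight to decode.
        k = self._key(token_ids)
        if k not in self._store:
            self._store[k] = prefill(token_ids)
        return self._store[k]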
Acceleration
Lila Saadat, PhD
Quantization, speculative decoding, kernel fusion, and the optimizer pipeline that compiles models down to FP8 and INT4 with quality regression bounds. Nine researchers.
8 papers · 20x speedup
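One ingredient behind such bounds, sketched minimally: symmetric per-tensor INT4 rounding carries a provable worst-case reconstruction error of half a quantization step. Illustrative only, not the optimizer pipeline's API.

import torch

def quantize_int4(w: torch.Tensor):
    # Symmetric per-tensor INT4: map weights onto the integer grid [-8, 7].
    scale = w.abs().max() / 7
    q = torch.clamp(torch.round(w / scale), -8, 7)
    return q, scale

w = torch.randn(4096, 4096)
q, scale = quantize_int4(w)
# Round-to-nearest bounds the reconstruction error by half a step -- the kind
# of per-tensor guarantee a quality-regression check can be built on top of.
assert (w - q * scale).abs().max() <= scale / 2 + 1e-6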
Systems
Wei Chen, PhD
The runtime — vendor-neutral collectives, scheduler, fault tolerance, and the multi-vendor benchmarking harness. Six researchers, three SREs.
5 papers · 3 vendors supported
Principles
How we operate.
We ship what we publish
Every paper has a target product feature before it goes to peer review. Reviewers are good at finding what is novel; customers are good at finding what is true.
Compute is honest
Researchers in the lab use the same cluster that customers do. There is no internal-only tier. If a kernel breaks at 256 nodes for them, it breaks at 256 nodes for you.
Open by default
Code, weights, and benchmarks released with every paper. Internal-only work happens when there is a customer-data reason; otherwise it goes to GitHub on the day of publication.
Customers are co-authors
Three of the twelve papers from the long-context team include enterprise customers as co-authors. The questions worth answering tend to come from the workloads worth running.
By the numbers
What the lab has shipped to date.
25 papers · 23 researchers · 20x speedup · 3 vendors supported · 3 open-source projects
Open work
Code we maintain.
FlashAttn-IF
Multi-vendor flash-attention kernels with the long-context sparsity patterns from our 2024 NeurIPS paper. Drop-in replacement for the upstream FlashAttention.
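A sketch of what drop-in means at the call site, assuming the package mirrors upstream flash-attn's entry point; the flashattn_if module name is a placeholder, while the call signature shown is upstream's.

import torch
# from flash_attn import flash_attn_func    # upstream
from flashattn_if import flash_attn_func    # placeholder module name, same entry point

# Batch 1, 8K tokens, 16 heads, head dim 64 -- flash-attn expects (B, S, H, D).
q = torch.randn(1, 8192, 16, 64, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

out = flash_attn_func(q, k, v, causal=True)  # call site unchanged from upstream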
IF-NCCL
A vendor-neutral collective communication library with NCCL, RCCL, and oneCCL backends and a single source-compatible API.
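IF-NCCL's own source-compatible API is C-level and not reproduced here. As a rough analogy for the pattern it targets, the same one-call-many-backends shape in torch.distributed (run under torchrun):

import torch
import torch.distributed as dist

def allreduce_sum(t: torch.Tensor, backend: str = "nccl") -> torch.Tensor:
    # The call site is identical whichever backend ("nccl", "gloo", ...) was
    # chosen at init -- the shape a vendor-neutral library aims to preserve.
    dist.init_process_group(backend=backend)  # expects torchrun-style env vars
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
    dist.destroy_process_group()
    return t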
Long-context eval harness
Reproducible benchmarks for million-token context windows: needle-in-a-haystack, multi-doc QA, code completion across whole repositories.
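The simplest of those, needle-in-a-haystack, fits in a few lines. A minimal sketch, assuming a generate(prompt) -> str model callable; none of the names below are the harness's API.

def make_haystack(filler: str, needle: str, n_chunks: int, depth: float) -> str:
    # Bury one needle sentence at a relative depth inside repeated filler.
    chunks = [filler] * n_chunks
    chunks.insert(int(depth * n_chunks), needle)
    return "\n".join(chunks)

def needle_recall(generate, needle: str, question: str, answer: str,
                  depths=(0.0, 0.25, 0.5, 0.75, 1.0)) -> float:
    # Fraction of burial depths at which the model retrieves the planted fact.
    hits = 0
    for depth in depths:
        prompt = make_haystack("The sky was grey that morning.", needle, 2000, depth)
        hits += answer.lower() in generate(prompt + "\n\n" + question).lower()
    return hits / len(depths)

# recall = needle_recall(model.generate,                  # hypothetical callable
#                        needle="The vault code is 4129.",
#                        question="What is the vault code?",
#                        answer="4129")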
Join us
The lab is hiring.
Research engineers, scientists, and PhD interns. We hire from where the questions are: long-context, kernels, distributed systems.