Research · Lab
The lab that built the cloud.
Twenty-three full-time researchers, three teams, one building shared with the engineers who run production. The lab is not adjacent to the company — it is the source.
Composition
Three teams. Three director-level researchers.
Long-context
Soroush Bahmani, PhD
Attention sparsity, KV-cache compression, prefix caching, and the eval harness that makes 1M-token benchmarks reproducible. Eight researchers.
12 papers · 1 production system
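A toy sketch of one of those ideas, prefix caching: requests that share a prompt prefix reuse the KV tensors computed for it instead of re-running prefill. The class below and the prefill callable are illustrative, not the lab's implementation.

import hashlib

class PrefixCache:
    """Toy prefix cache: prefix digest -> precomputed KV tensors."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(token_ids):
        return hashlib.sha256(repr(tuple(token_ids)).encode()).hexdigest()

    def get_or_prefill(self, token_ids, prefill):
        # Compute the expensive attention pass once per distinct prefix;
        # later requests with the same prefix skip straight to decode.
        k = self._key(token_ids)
        if k not in self._store:
            self._store[k] = prefill(token_ids)
        return self._store[k]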
Acceleration
Lila Saadat, PhD
Quantization, speculative decoding, kernel fusion, and the optimizer pipeline that compiles models down to FP8 and INT4 with quality regression bounds. Nine researchers.
8 papers · 20x speedup
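One ingredient behind such bounds, sketched minimally: symmetric per-tensor INT4 rounding carries a provable worst-case reconstruction error of half a quantization step. Illustrative only, not the optimizer pipeline's API.

import torch

def quantize_int4(w: torch.Tensor):
    # Symmetric per-tensor INT4: map weights onto the integer grid [-8, 7].
    scale = w.abs().max() / 7
    q = torch.clamp(torch.round(w / scale), -8, 7)
    return q, scale

w = torch.randn(4096, 4096)
q, scale = quantize_int4(w)
# Round-to-nearest bounds the reconstruction error by half a step -- the kind
# of per-tensor guarantee a quality-regression check can be built on top of.
assert (w - q * scale).abs().max() <= scale / 2 + 1e-6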
Systems
Wei Chen, PhD
The runtime — vendor-neutral collectives, scheduler, fault tolerance, and the multi-vendor benchmarking harness. Six researchers, three SREs.
5 papers · 3 vendors supported
Principles
How we operate.
We ship what we publish
Every paper has a target product feature before it goes to peer review. Reviewers are good at finding what is novel; customers are good at finding what is true.
Compute is honest
Researchers in the lab use the same cluster that customers do. There is no internal-only tier. If a kernel breaks at 256 nodes for them, it breaks at 256 nodes for you.
Open by default
Code, weights, and benchmarks released with every paper. Internal-only work happens when there is a customer-data reason; otherwise it goes to GitHub on the day of publication.
Customers are co-authors
Three of the twelve papers from the long-context team include enterprise customers as co-authors. The questions worth answering tend to come from the workloads worth running.
By the numbers
What the lab has shipped to date.
25 papers · 23 researchers · 20x speedup · 3 vendors supported · 3 open-source projects
Open work
Code we maintain.
FlashAttn-IF
Multi-vendor flash-attention kernels with the long-context sparsity patterns from our 2024 NeurIPS paper. Drop-in replacement for the upstream FlashAttention.
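A sketch of what drop-in means at the call site, assuming the package mirrors upstream flash-attn's entry point; the flashattn_if module name is a placeholder, while the call signature shown is upstream's.

import torch
# from flash_attn import flash_attn_func    # upstream
from flashattn_if import flash_attn_func    # placeholder module name, same entry point

# Batch 1, 8K tokens, 16 heads, head dim 64 -- flash-attn expects (B, S, H, D).
q = torch.randn(1, 8192, 16, 64, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

out = flash_attn_func(q, k, v, causal=True)  # call site unchanged from upstream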
IF-NCCL
A vendor-neutral collective communication library with NCCL, RCCL, and oneCCL backends and a single source-compatible API.
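IF-NCCL's own source-compatible API is C-level and not reproduced here. As a rough analogy for the pattern it targets, the same one-call-many-backends shape in torch.distributed (run under torchrun):

import torch
import torch.distributed as dist

def allreduce_sum(t: torch.Tensor, backend: str = "nccl") -> torch.Tensor:
    # The call site is identical whichever backend ("nccl", "gloo", ...) was
    # chosen at init -- the shape a vendor-neutral library aims to preserve.
    dist.init_process_group(backend=backend)  # expects torchrun-style env vars
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
    dist.destroy_process_group()
    return t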
Long-context eval harness
Reproducible benchmarks for million-token context windows: needle-in-a-haystack, multi-doc QA, code completion across whole repositories.
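The simplest of those, needle-in-a-haystack, fits in a few lines. A minimal sketch, assuming a generate(prompt) -> str model callable; none of the names below are the harness's API.

def make_haystack(filler: str, needle: str, n_chunks: int, depth: float) -> str:
    # Bury one needle sentence at a relative depth inside repeated filler.
    chunks = [filler] * n_chunks
    chunks.insert(int(depth * n_chunks), needle)
    return "\n".join(chunks)

def needle_recall(generate, needle: str, question: str, answer: str,
                  depths=(0.0, 0.25, 0.5, 0.75, 1.0)) -> float:
    # Fraction of burial depths at which the model retrieves the planted fact.
    hits = 0
    for depth in depths:
        prompt = make_haystack("The sky was grey that morning.", needle, 2000, depth)
        hits += answer.lower() in generate(prompt + "\n\n" + question).lower()
    return hits / len(depths)

# recall = needle_recall(model.generate,                  # hypothetical callable
#                        needle="The vault code is 4129.",
#                        question="What is the vault code?",
#                        answer="4129")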
Join us
The lab is hiring.
Research engineers, scientists, and PhD interns. We hire from where the questions are: long-context, kernels, distributed systems.