Research
The cloud is downstream of the lab.
iframe.ai is a research lab that operates a cloud. Every paper we publish ends up in the runtime our customers use, and every customer workload we run feeds the questions we publish on. The two are not separable.
Research areas
Three lines of work.
Every line of research has a clear path to a product feature. We don't write papers we can't ship.
Long-context inference
Sparse attention, KV-cache compression, and prefix caching schemes that make million-token contexts cost-competitive with 8K contexts on equivalent hardware.
Inference acceleration
Speculative decoding, FP8 and INT4 quantization, kernel fusion. Our managed inference endpoints run 10-20x faster than vLLM defaults on the same hardware.
Vendor-neutral runtime
Collective communication, scheduler, and tensor-parallel patterns that work across NVIDIA, AMD, and Intel. Single source of truth for distributed training.
Selected papers
Recent peer-reviewed work.
NeurIPS 2024
Sparse attention for million-token context windows
Bahmani et al.
A learned sparsity scheme that maintains 99% of dense-attention accuracy while reducing compute by 6x at 1M-token contexts. Now in production for our long-context inference endpoints.
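The core idea, each query attending to only a small, high-scoring subset of keys, can be sketched as a toy top-k attention mask. This is an illustrative simplification, not the paper's learned scheme; the `keep` ratio and function name are assumptions.

```python
import numpy as np

def topk_sparse_attention(q, k, v, keep=0.25):
    """Toy sparse attention: each query attends only to its highest-scoring
    keys. A simplified stand-in for a learned sparsity scheme (illustrative
    only; the actual method learns which keys to keep)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])            # (nq, nk) logits
    n_keep = max(1, int(keep * k.shape[0]))            # keys kept per query
    # Mask everything outside each query's top-n_keep scores.
    thresh = np.sort(scores, axis=-1)[:, -n_keep][:, None]
    masked = np.where(scores >= thresh, scores, -np.inf)
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

With `keep=1.0` this reduces to dense softmax attention; at small `keep` values most of the score matrix is never materialized in a real kernel, which is where the compute savings come from.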
MLSys 2025
Quantization-aware speculative decoding
Saadat et al.
Joint design of FP8 quantization and tree-structured speculative decoding. 4.2x throughput improvement on Llama 3.1 70B with negligible quality loss measured against MT-Bench.
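The speculative-decoding half of the design can be sketched greedily: a cheap draft model proposes a run of tokens, the target model verifies them in one pass, and decoding falls back to the target at the first mismatch. This toy models neither FP8 quantization nor tree-structured drafts; the function names are assumptions.

```python
def speculative_decode(target_next, draft_next, prompt, n_tokens, k=4):
    """Greedy speculative decoding sketch. `target_next` and `draft_next`
    map a token context to the next token (hypothetical stand-ins for the
    real models). Accepted draft tokens cost one target pass instead of k."""
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        # Draft proposes k tokens autoregressively.
        draft, ctx = [], list(out)
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # Target verifies the k positions (one batched pass in practice).
        accepted, ctx = 0, list(out)
        for t in draft:
            if target_next(ctx) != t:
                break
            out.append(t)
            ctx.append(t)
            accepted += 1
        # On a mismatch (or zero acceptance), take one target token.
        if accepted < k:
            out.append(target_next(out))
    return out[: len(prompt) + n_tokens]
```

The throughput win comes from the verify step: when the draft agrees with the target, k tokens are confirmed per target forward pass rather than one.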
OSDI 2025
Vendor-neutral collective communication primitives
Chen et al.
A clean abstraction over NCCL, RCCL, and oneCCL that lets a single training script run unmodified across NVIDIA, AMD, and Intel hardware at within 3% of native performance.
The lab
People, partners, and access.
Inside the lab
Twenty-three full-time researchers split across three teams: long-context, acceleration, and systems. They work in the same building as the engineers who run production.
University partners
Joint papers, sabbaticals, summer programs, and unrestricted compute grants for principal investigators at our partner institutions.
Open benchmarks
Reproducible scripts and harnesses behind every performance claim we publish. Run them yourself; we publish the seeds.
Compute for science
Compute grants for principal investigators.
Free or discounted GPU credits for accredited research, with grant-friendly procurement and unrestricted publication rights.