Blog
Engineering, methodology, and the work behind the numbers.
Long-form posts from the team that builds the platform. Each piece links to the source data, the repository, or the paper. We don't write thought-leadership.
Latest
All posts.
- Research · 22 min
Million-token context, no recall regression: methodology
The training recipe and ablations behind our 1M-token endpoint. Why we ditched ALiBi for YaRN, the attention sink trick, and the eval suite that catches needle-in-haystack failures.
Theo Lin · Apr 04, 2026
- Pricing · 9 min
Why we publish hyperscaler list prices on our own pricing page
An argument for transparent comparison. The CSV behind every comparison ships in our pricing-data repo, and we refresh it on the first business day of each month.
Maya Chen · Mar 21, 2026
- Engineering · 14 min
VPC interconnect into AWS, end to end
What actually happens when an iframe.ai cluster joins your AWS VPC. Direct Connect, route propagation, security groups, IAM, observability. With diagrams.
Dimitri Volkov · Mar 09, 2026
- Pricing · 11 min
Spot capacity is a tax on your reliability budget
We don't offer spot. Here is the math we ran when we considered it, including the operational cost of mid-checkpoint failures on B200-class jobs.
Maya Chen · Feb 22, 2026
- Customer · 12 min
How AnthroHealth moved 600 H200s out of AWS in three weeks
A customer story. BAA in five days, VPC interconnect on day eight, full traffic cutover on day twenty-one. Net cost reduction of 64% at the same SLA.
Will Park · Feb 08, 2026
- Research · 16 min
FP8 inference without measurable MT-Bench regression
The calibration suite, the layers we kept in BF16, and the per-tensor scaling we use. The eval harness is open source, so customers can verify quality on their own data.
Theo Lin · Jan 25, 2026
- Pricing · 20 min
Build vs rent in 2026: a framework for the GPU question
When colocating your own DGX cluster makes sense, and when it doesn't. Includes a working spreadsheet with our prices, hyperscaler prices, and a colocation TCO model.
Maya Chen · Jan 12, 2026
- Engineering · 13 min
MoE routing instability and how we caught it in production
Mixtral 8×22B endpoints showed a 2× tail-latency spike under load. Root cause: expert routing collapsed onto two experts. The fix involved load-balanced routing and a watchdog.
Ana Roy · Dec 14, 2025
Most posts ship with a repo.
Reproduce a benchmark, fork a notebook, or open an issue against the methodology.