
Featherweight: five divisions, one belt.

One dataset, one step budget, one eval. The smallest honest loss takes the belt.

30 May 2026 · Mizan · ~5 min read

Most architecture comparisons are unfair by accident — different data, different budgets, different evals — so "X beats Y" means almost nothing. Featherweight exists to remove the excuses.

The bench

Featherweight is the first bench inside Mizan. Same dataset (TinyStories byte-LM), same step budget (500 by default), same eval (held-out val loss). No bigger-is-better, no swapping the corpus to find a friendlier one. If a model wins here, it wins for a reason you can point at.
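Here is a minimal Python sketch of what "same data, same budget, same eval" means for a byte-level LM. It is not Mizan's harness. `run_bench`, `BigramFighter`, `val_loss` and the `seq_len=96` window are hypothetical, and the add-one-smoothed bigram counter is a toy stand-in for a real model so the example runs on the standard library alone. Every fighter gets the same 500 training steps over the same bytes. Each is then scored by mean next-byte cross-entropy on held-out text.

```python
import math
from collections import Counter

VOCAB = 256  # byte-level LM: each byte value 0..255 is one token


def val_loss(prob, val: bytes) -> float:
    """Mean next-byte cross-entropy, in nats, over a held-out byte string."""
    nll = 0.0
    for prev, nxt in zip(val, val[1:]):
        nll -= math.log(prob(prev, nxt))
    return nll / (len(val) - 1)


class BigramFighter:
    """Toy stand-in for a model: add-one-smoothed byte bigram counts (hypothetical)."""

    def __init__(self) -> None:
        self.counts = [Counter() for _ in range(VOCAB)]
        self.totals = [0] * VOCAB

    def train_step(self, batch: bytes) -> None:
        for prev, nxt in zip(batch, batch[1:]):
            self.counts[prev][nxt] += 1
            self.totals[prev] += 1

    def prob(self, prev: int, nxt: int) -> float:
        # Add-one smoothing keeps every probability nonzero and each row summing to 1.
        return (self.counts[prev][nxt] + 1) / (self.totals[prev] + VOCAB)


def run_bench(fighter, train: bytes, val: bytes, steps: int = 500, seq_len: int = 96) -> float:
    """Same data, same step budget, same eval, whatever the fighter is."""
    for step in range(steps):
        # Walk the training bytes in fixed windows, wrapping around at the end.
        start = (step * seq_len) % max(1, len(train) - seq_len)
        fighter.train_step(train[start:start + seq_len + 1])
    return val_loss(fighter.prob, val)


train = b"Once upon a time there was a little cat. " * 200
val = b"Once upon a time there was a little dog. "
loss = run_bench(BigramFighter(), train, val)
```

An untrained fighter here scores ln 256 ≈ 5.545 nats per byte, because it guesses uniformly over 256 byte values. That makes it a quick sanity check for any byte-level entry: a score above it is worse than guessing.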

Five divisions

Every fighter is registered in exactly one division: Precision (FP32 / FP16 / Int8 / Ternary), Optimizer (AdamW / Lion / Sophia / Muon), Attention (softmax / sliding-window / GQA / sparse), State-Space (RWKV / Hyena / SSM / Mamba), and Community (anything else that fits the bench). The lowest val loss in a division wins that division; the lowest across all of them is champion of champions.
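The registration and scoring rules fit in a few lines. This is a minimal sketch built around a hypothetical `Board` class, not Mizan's code. It rejects a second registration of the same fighter and ranks by val loss, lower being better. The two entries come from the current board. Placing DoubleConv in Community and RWKV-mini in State-Space is our own reading of "community-template" and of the division list.

```python
from dataclasses import dataclass

# Division names as listed in the post; the Board class itself is hypothetical.
DIVISIONS = ("Precision", "Optimizer", "Attention", "State-Space", "Community")


@dataclass(frozen=True)
class Entry:
    fighter: str
    division: str
    val_loss: float


class Board:
    def __init__(self) -> None:
        self._entries: dict[str, Entry] = {}

    def register(self, fighter: str, division: str, val_loss: float) -> None:
        # Each fighter lives in exactly one division.
        if division not in DIVISIONS:
            raise ValueError(f"unknown division: {division}")
        if fighter in self._entries:
            raise ValueError(f"{fighter} already registered in {self._entries[fighter].division}")
        self._entries[fighter] = Entry(fighter, division, val_loss)

    def division_winners(self) -> dict[str, Entry]:
        # Lowest val loss in each division wins that division.
        winners: dict[str, Entry] = {}
        for e in self._entries.values():
            best = winners.get(e.division)
            if best is None or e.val_loss < best.val_loss:
                winners[e.division] = e
        return winners

    def champion(self) -> Entry:
        # Champion of champions: the best of the division winners.
        return min(self.division_winners().values(), key=lambda e: e.val_loss)


board = Board()
board.register("DoubleConv", "Community", 1.3745)
board.register("RWKV-mini", "State-Space", 1.4352)
print(board.champion().fighter)  # DoubleConv
```

Taking the minimum over division winners gives the same answer as taking it over all entries. Computing it from the winners simply mirrors how the belt is described.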

What the bench has shown

A few patterns held up across the early divisions: state-space architectures beat softmax attention by 18–32% at this scale; Lion beats AdamW by about 6% on the same budget; sliding-window attention beats full softmax at sequence length 96; and Int8 with a straight-through estimator ties FP32 at 41% of the size. Sophia came in under-tuned — an honest failure we post with its loss curve, not a result we hide.
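The Int8 result relies on a straight-through estimator (STE). The optimizer keeps full-precision "latent" weights, and the forward pass runs on their int8-quantized version. In the backward pass, the non-differentiable rounding is treated as the identity, so gradients flow straight through to the latent weights. Below is a minimal sketch that assumes symmetric per-tensor int8 quantization and a toy linear model trained with plain SGD on squared error. The post does not say which quantization scheme or optimizer the Int8 fighter used, and every name here is hypothetical.

```python
# Hypothetical sketch: symmetric per-tensor int8 fake quantization + STE.


def quantize_int8(w: list[float]) -> tuple[list[int], float]:
    """Map floats to int8 codes in [-127, 127] with one shared scale."""
    scale = max(abs(x) for x in w) / 127 or 1.0  # avoid a zero scale for all-zero weights
    codes = [max(-127, min(127, round(x / scale))) for x in w]
    return codes, scale


def fake_quant(w: list[float]) -> list[float]:
    """Quantize then dequantize, so the forward pass only sees int8-representable values."""
    codes, scale = quantize_int8(w)
    return [c * scale for c in codes]


def train_ste(data, n_features: int, lr: float = 0.05, epochs: int = 200):
    w = [0.0] * n_features  # latent full-precision weights, updated by SGD
    for _ in range(epochs):
        for x, y in data:
            wq = fake_quant(w)  # forward pass uses quantized weights
            err = sum(a * b for a, b in zip(wq, x)) - y
            # d(err**2)/d(wq_i) = 2 * err * x_i. The STE treats d(wq)/d(w) as 1,
            # so that gradient is applied to the latent weight unchanged.
            for i in range(n_features):
                w[i] -= lr * 2 * err * x[i]
    return w, fake_quant(w)


# Toy regression whose exact solution is the weight vector [0.5, -1.25, 2.0].
data = [([1, 0, 0], 0.5), ([0, 1, 0], -1.25), ([0, 0, 1], 2.0), ([1, 1, 1], 1.25)]
latent, quantized = train_ste(data, n_features=3)
```

After training, `quantized` lands close to the target weights, limited only by the int8 grid, even though the optimizer only ever updated `latent`. This is the textbook form of the technique, not a description of how the bench's Int8 fighter was built.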

The champion on the board

The current leader is a community-template DoubleConv at val loss 1.3745, ahead of our own RWKV-mini at 1.4352. That's the bar the first public release will ship with, and, as the arena page now notes, the bar we've quietly started clearing with something new.
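For a sense of scale, here is the gap between the two entries expressed as a relative reduction in val loss. This assumes relative reduction is also how the bench's percentage margins are meant, which the post does not spell out.

```latex
\frac{1.4352 - 1.3745}{1.4352} = \frac{0.0607}{1.4352} \approx 0.042 \quad (\text{about } 4.2\%)
```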

The arena isn't open to public submissions yet. If you build small models for a living and want in early, ask for updates.