Mizan arena · Featherweight preview

Same dataset. Same budget.
Best loss wins.

Mizan (Arabic for balance, scale) is the arena for small-model benches. Featherweight is the first bench inside it: one dataset, one step budget, one eval. No bigger-is-better, no moving the goalposts. The board names a community DoubleConv at val 1.3745, ahead of our own RWKV-mini at 1.4352.

Step into the ring

Ask for release updates →

5 divisions

500-step budget

CPU only

1.3745 champion val loss

Preview only

★ Where we are now

That board is last season. Since then we kept iterating — and a new in-house architecture now edges past RWKV head-to-head in seed-matched runs. A narrow, honest win, still firming up across seeds before it earns a place on the board. The benchmark stays the point.

Read the full update →

How it works

Four moves, one verdict.

Mizan behaves like a real arena, not a vague challenge page: same dataset, same budget, automated grading, and a leaderboard that changes only when the release is actually open.

1 · The bench

Same fight, every time.

Same dataset (TinyStories byte-LM), same step budget (500 default, configurable), same eval (held-out val loss). No moving the goalposts.
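For a sense of what the eval number means: held-out val loss for a byte-level LM is usually the mean negative log-likelihood per byte. The sketch below assumes the loss is reported in nats per byte, which this page does not state. `NextByteModel` and `uniform_model` are hypothetical stand-ins for illustration, not part of the bench.

```python
import math
from typing import Callable, Sequence

# Hypothetical model interface (an assumption for this sketch): given the
# bytes seen so far, return a probability for each of the 256 next bytes.
NextByteModel = Callable[[bytes], Sequence[float]]


def val_loss(model: NextByteModel, heldout: bytes) -> float:
    """Mean negative log-likelihood per byte, in nats (assumed unit)."""
    if len(heldout) == 0:
        raise ValueError("held-out split is empty")
    total = 0.0
    for i in range(len(heldout)):
        probs = model(heldout[:i])            # condition on the prefix
        total -= math.log(probs[heldout[i]])  # penalty for the true byte
    return total / len(heldout)


def uniform_model(context: bytes) -> Sequence[float]:
    # Baseline that knows nothing: every byte is equally likely.
    return [1.0 / 256] * 256


baseline = val_loss(uniform_model, b"Once upon a time")
```

A model that guesses uniformly scores ln 256, about 5.545 nats per byte. If the board's numbers are in the same unit, that is the ceiling a fighter starts from.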

2 · Five divisions

Pick your weight class.

Precision, Optimizer, Attention, State-Space, Community. Each fighter is registered in exactly one division.

3 · The fight

Bot trains, bot grades.

The bot trains your config on the bench, evaluates on held-out val, and posts the result. No human in the loop.

4 · The verdict

Best loss takes the belt.

Best val loss per division wins the belt. Best across all divisions is champion of champions — until someone knocks them out.
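The verdict rule can be sketched in a few lines. This is an illustration, not the bot's code: the `Result` record and both function names are hypothetical, and the tie-break (first posted result keeps the belt) is an assumption. The two entries are the figures quoted on this page.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Result:
    fighter: str
    division: str   # each fighter is registered in exactly one division
    val_loss: float


def belts(results: List[Result]) -> Dict[str, Result]:
    """Lowest val loss per division takes that division's belt."""
    holders: Dict[str, Result] = {}
    for r in results:
        best = holders.get(r.division)
        # Strict '<' means an earlier result keeps the belt on a tie
        # (an assumption; the page does not define a tie-break).
        if best is None or r.val_loss < best.val_loss:
            holders[r.division] = r
    return holders


def champion_of_champions(results: List[Result]) -> Result:
    """Lowest val loss among the belt-holders."""
    return min(belts(results).values(), key=lambda r: r.val_loss)


board = [
    Result("RWKV-mini", "State-Space", 1.4352),
    Result("DoubleConv", "Community", 1.3745),
]
champ = champion_of_champions(board)
```

With these two entries, `champ.fighter` is `"DoubleConv"`, and each entry holds its own division's belt.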

The current champion

Featherweight already has a benchmark to beat.

The current internal Featherweight leader is a community-template DoubleConv configuration. The wider arena is not open publicly yet.

★ Champion of champions

A community-template DoubleConv at val 1.3745 after 800 steps. It beat our own RWKV-mini (val 1.4352 at 500 steps). The public submission flow is not open yet, but this is the bar the first Featherweight release will ship with.

The Featherweight divisions

Five divisions inside the first bench.

Every fighter is registered in one division. The current belt-holders are placeholder leaders until the arena opens.

Division	What we’re measuring	Current title-holder
Precision	FP32 / FP16 / Int8 / Ternary at the same FLOPs budget	int8 STE: ties FP32 at 41% of the size
Optimizer	AdamW / Lion / Sophia / Muon at the same step budget	Lion: beats AdamW by ~6%
Attention	Softmax / Sliding window / GQA / Sparse	Sliding window: beats softmax at sequence length 96
State-Space	RWKV / Hyena / SSM / Mamba	RWKV-mini: beats softmax by 18–32%
Community	Anything else that fits the Featherweight bench	DoubleConv (val 1.3745)
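For readers who have not met the Optimizer division's title-holder: Lion, as commonly described, steps by the sign of an interpolated momentum and applies decoupled weight decay. The sketch below follows that description in plain Python. It is not the bench's training code, and the function name, list-based parameters, and hyperparameter defaults are illustrative assumptions.

```python
from typing import List, Tuple


def sign(x: float) -> float:
    """Return -1.0, 0.0, or 1.0 depending on the sign of x."""
    return float((x > 0) - (x < 0))


def lion_step(
    params: List[float],
    grads: List[float],
    momentum: List[float],
    lr: float = 1e-4,            # illustrative default, not a bench setting
    beta1: float = 0.9,
    beta2: float = 0.99,
    weight_decay: float = 0.0,
) -> Tuple[List[float], List[float]]:
    """One Lion update over flat lists of scalars."""
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # Direction: only the sign of the blended momentum and gradient.
        direction = sign(beta1 * m + (1 - beta1) * g)
        # Every weight moves by exactly lr, plus decoupled weight decay.
        new_params.append(p - lr * (direction + weight_decay * p))
        # The momentum is tracked with a second, slower coefficient.
        new_momentum.append(beta2 * m + (1 - beta2) * g)
    return new_params, new_momentum


p1, m1 = lion_step([1.0, -2.0], [0.5, -3.0], [0.0, 0.0], lr=0.1)
```

Because the step uses only the sign, both weights here move by the same 0.1 even though their gradients differ sixfold. That uniform step size is the main behavioral difference from AdamW.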

The findings so far

What the bench has told us — so far.

Patterns we’ve seen across the first divisions to date. They’re meant to be challenged by better submissions once the arena opens.

State-space beats attention at this scale. RWKV, Hyena and SSM all beat softmax by 18–32% — even with 300 fewer training steps.

Lion beats AdamW by ~6% on the same bench, same step budget.

Sliding-window attention beats full softmax at sequence length 96 — and uses fewer parameters doing it.

GQA matches softmax with 5% fewer params. Lossless at this scale.

Int8 with a straight-through estimator ties FP32 at 41% of the size. No accuracy delta. Just smaller.

Sophia is under-tuned in this bench. An honest failure — we post the loss curve and cost ledger. Better tunings welcome.
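The int8 finding above depends on the straight-through estimator, so here is a minimal scalar sketch of the idea. The forward pass uses the rounded int8 value; the backward pass pretends rounding is the identity. The function names and the choice to zero the gradient outside the clamp range are assumptions for illustration, not the bench's implementation.

```python
def quantize_int8(w: float, scale: float) -> int:
    """Round to the nearest int8 level and clamp to [-128, 127]."""
    return max(-128, min(127, round(w / scale)))


def fake_quant_forward(w: float, scale: float) -> float:
    """Forward pass: the model sees the dequantized int8 value."""
    return quantize_int8(w, scale) * scale


def fake_quant_backward(w: float, scale: float, upstream_grad: float) -> float:
    """Backward pass with a straight-through estimator.

    Rounding has zero gradient almost everywhere, so the STE passes the
    upstream gradient through unchanged. Weights that were clamped get
    zero gradient here (a common variant, assumed for this sketch).
    """
    in_range = -128 <= w / scale <= 127
    return upstream_grad if in_range else 0.0


y = fake_quant_forward(0.337, scale=0.01)        # snaps to the 0.34 level
g_inside = fake_quant_backward(0.337, 0.01, 2.5)
g_clamped = fake_quant_backward(5.0, 0.01, 2.5)
```

The full-precision weight keeps receiving gradient while the forward pass only ever sees int8 levels, which is how a model can train toward weights that survive quantization.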

★ Open invitation

Build small models for a living? Think yours can beat 1.3745? The public submission flow is not open yet. If you want updates or early access, email us. Featherweight is first; the wider arena follows.

Ask for early access

Release status

The first Mizan materials are still being prepared.

The arena framing, the Featherweight ruleset, and the benchmark materials are still being organized. Until they ship, treat this page as a research preview of how the first bench will work rather than a public release.

# Mizan release status
Arena kit: in preparation
Submission flow: not open yet
Release updates: hello@tilelli.tech

House rules

Five rules. No referee.

Open-weights only. No API-gated models. If we can’t load it, it can’t fight.

Same dataset, same step budget. No swapping the corpus to find a friendlier one.

Reproducible on CPU. If it needs a GPU, it isn’t featherweight.

Negative results count. A clean failure with the cost ledger is a valid submission. We post it.

Release timing is still private. The rules are ready before the public materials are.