State-space beat attention. Then we went further

At a trillion parameters, attention is the undisputed champion. At the featherweight end of the scale, the picture is different, and that difference is exactly what Mizan was built to surface.

The finding that started it

On the Featherweight bench (same dataset, same budget, same eval), state-space architectures beat softmax attention by 18–32%, several of them with hundreds fewer training steps. That's a large, repeatable gap at this scale. It's the kind of result that should change what you reach for first when your model has to be tiny.

The obvious next question

If RWKV-mini is the one to beat in the state-space division, then the honest research question is simple: can we build something that beats it, at the same budget, scored the same way? We've spent the last stretch chasing exactly that.

Where we are now

A new in-house architecture now edges past RWKV-mini head-to-head in seed-matched runs. We want to be precise about how big a claim that is: it's a narrow win, it's still firming up across seeds, and one of the seed-matched pairs is close to a dead heat. It is not yet a result we'd carve into a leaderboard. It is a real, reproducible direction that we are pressure-testing before we make it official.
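
For readers who want a concrete picture of a seed-matched comparison, here is a minimal Python sketch. It is an illustration, not the Mizan scoring code. The function name `paired_summary`, the `higher_is_better` flag, and every score below are assumptions made up for the example. The idea is to pair the runs by seed, take the per-seed margin, and look at the win count, the mean margin, and the closest pair together instead of trusting one headline number.

```python
from statistics import mean


def paired_summary(baseline, candidate, higher_is_better=True):
    """Summarize a head-to-head comparison over shared seeds.

    baseline, candidate: dicts mapping seed -> eval score.
    Hypothetical helper for illustration only.
    """
    # Only seeds present in both runs can be compared fairly.
    seeds = sorted(baseline.keys() & candidate.keys())
    if not seeds:
        raise ValueError("no shared seeds to compare")

    # Flip the sign for metrics where lower is better (e.g. loss),
    # so a positive margin always means the candidate did better.
    sign = 1.0 if higher_is_better else -1.0
    margins = {s: sign * (candidate[s] - baseline[s]) for s in seeds}

    return {
        "seeds": seeds,
        "margins": margins,
        "wins": sum(m > 0 for m in margins.values()),
        "mean_margin": mean(margins.values()),
        # The pair with the smallest absolute margin: the "dead heat" candidate.
        "closest_seed": min(seeds, key=lambda s: abs(margins[s])),
    }


# Placeholder scores invented for this example; these are NOT Mizan results.
baseline = {0: 0.50, 1: 0.52, 2: 0.51}
candidate = {0: 0.53, 1: 0.521, 2: 0.54}

summary = paired_summary(baseline, candidate)
print(summary["wins"], "of", len(summary["seeds"]), "seeds won")
print("closest pair is seed", summary["closest_seed"])
```

With only a handful of seeds, a summary like this can show a direction but cannot settle it, which is why a near-tied pair matters as much as the win count.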

What we're not saying

Not the mechanism — not yet. We've learned the hard way (see our postmortem) that a clean negative result is worth shipping, and so is the next thing that works, but only once it survives a strict test. The moat stays closed until the win is locked. When it is, the number goes on the board with the seeds and the cost ledger attached.

The current board and the status note live on the Mizan page. Want to know the moment it's official? Ask for updates.

Published 4 Jun 2026 · Corrections: hello@tilelli.tech