Why ternary, and where it still loses

Ternary quantization gets sold as a free lunch — same model, fraction of the size. It isn't free. It's a trade, and whether the trade is worth it depends entirely on where the model has to run.

The case for three values

Weights in {−1, 0, +1} turn the expensive part of inference — matrix multiplication — into adds and sign flips. No floating-point multiply on the hot path. On a $2 microcontroller with no FPU, that's not an optimization, it's the difference between running and not running. At that end of the scale, ternary is the point.
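To make "adds and sign flips" concrete, here is a minimal sketch of a matrix-vector product over ternary weights. It is an illustration under our own assumptions, not the kit's kernel: the function name `ternary_matvec` and the list-of-lists layout are hypothetical, and a real microcontroller build would pack the weights and work in fixed-point integers.

```python
def ternary_matvec(weights, x):
    """Compute y = W @ x where every entry of W is -1, 0, or +1.

    Hypothetical helper for illustration: the inner loop never multiplies,
    it only adds, subtracts, or skips.
    """
    out = []
    for row in weights:
        acc = 0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi      # +1: add the activation
            elif w == -1:
                acc -= xi      # -1: sign flip, then add
            # w == 0: contributes nothing, so skip it
        out.append(acc)
    return out


W = [[1, 0, -1],
     [-1, 1, 1]]
x = [5, 7, 2]
print(ternary_matvec(W, x))  # [3, 4]
```

The zero weights cost nothing at all here, which is part of why sparsity and ternary weights pair well on small hardware.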

The honest cost

At 10 million parameters on a TinyStories byte-LM, the ternary path currently loses to FP32 by about 12%. That's the cost of representational coarseness at small parameter counts: with fewer weights, each one carries more of the model's information, so rounding it to three values discards more. The deployed Tilelli chat model ships in FP32 for exactly this reason: we put the better model in front of users and say so plainly.

Closing the gap

There's a middle road we're exploring: a power-of-three quantizer with seven levels ({0, ±1, ±3, ±9} × a scale) instead of plain ternary's three. It costs about 2.81 bits per weight (log2 of 7) versus ternary's 1.58 (log2 of 3). It needs no new kernels, because multiplying by 3 or by 9 is one shift and one add (3x = 2x + x, 9x = 8x + x). On our most recent run it closes roughly 49% of the gap between plain ternary and FP32. Still behind FP32, but meaningfully less so.
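A rough sketch of the seven-level idea, under stated assumptions: the names `LEVELS`, `quantize_pow3`, and `mul_by_level` are ours, the scale is passed in by hand (how the real quantizer picks its scale is not covered here), and nearest-level rounding is the simplest possible choice rather than a description of the training-time method.

```python
import math

# The seven power-of-three levels; the real weight is level * scale.
LEVELS = (-9, -3, -1, 0, 1, 3, 9)


def quantize_pow3(weights, scale):
    """Map each weight to the nearest level (assumed rounding rule)."""
    return [min(LEVELS, key=lambda q: abs(w - q * scale)) for w in weights]


def mul_by_level(x, level):
    """Multiply integer x by a level using only shift, add, and negate."""
    mag = abs(level)
    if mag == 0:
        r = 0
    elif mag == 1:
        r = x
    elif mag == 3:
        r = (x << 1) + x   # 3x = 2x + x
    else:
        r = (x << 3) + x   # 9x = 8x + x
    return -r if level < 0 else r


codes = quantize_pow3([0.0, 0.11, -0.28, 0.95, -0.5], scale=0.1)
print(codes)                          # [0, 1, -3, 9, -3]
print(mul_by_level(7, 9))             # 63
print(round(math.log2(len(LEVELS)), 2), round(math.log2(3), 2))  # 2.81 1.58
```

The last line is where the bits-per-weight figures come from: they are the information-theoretic minimum for seven and three symbols, so a practical packing format may spend slightly more.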

The pattern underneath

Scale flips the verdict. At the very small end — tens of thousands of parameters, as in Atome — aggressive quantization can even come out ahead of a vanilla FP32 baseline. Push the same idea up to roughly a million parameters and the FP32 baseline pulls back in front. There's no universal "ternary wins" or "ternary loses." There's a curve, and the only honest thing is to report which point you measured.

The model that beat vanilla on TinyStories did it in FP32 — the full story is in "What actually beat vanilla". Ternary is shipped and trainable in the kit; toggle one flag and measure it yourself.

Published 26 May 2026 · Corrections: hello@tilelli.tech