Notes from the lab

Working notes from a small lab: the architecture, the benchmarks, and the experiments that didn’t work. Some wins, some failures, every one with the receipts. We publish both.

14 Jun 2026 · Yaz

A tiny model whose facts you can edit, and that knows when it doesn’t know.

Each fact lives in its own atom: you can create, read, update, or delete facts one at a time with provable locality, and the model abstains when it is unsure which fact you mean. What works, what doesn’t, and why it’s a step, not a breakthrough.

Read the write-up →

4 Jun 2026 · Mizan

State-space beat attention. Then we went further.

On the Featherweight bench, RWKV, Hyena, and SSM models beat softmax attention by 18–32%. So we asked what beats RWKV, and a narrow, honest win is now firming up across seeds.

Read the update →

3 Jun 2026 · Tilelli Med

Biomedicine on a diet: three values per weight.

Ternary knowledge-graph embeddings — 5.3× smaller than the FP32 baseline, 24 MB packed, and still ahead on OGBL-biokg and PrimeKG. A research tool, not a clinic.

Read the explainer →

2 Jun 2026 · Method

Every number on this page is bound to a script.

We don’t ask you to trust our benchmarks. The kit ships scripts that recompute each claim and exit non-zero when one doesn’t hold, and those scripts have caught us.

Read the method →

1 Jun 2026 · Petitri

Ranking drug candidates with a confidence dial.

Petitri ranks candidate drugs per disease on the Tilelli Med stack — and keeps its confidence honestly low when the graph evidence is thin. For review, not treatment.

Read the preview →

31 May 2026 · NEO

A five-vendor jury, and nobody grades their own homework.

NEO grades seven leading chat models on honest uncertainty — every answer judged by a council of five vendors, each family barred from scoring its own model.

Read how it grades →

30 May 2026 · Mizan

Featherweight: five divisions, one belt.

Inside Mizan’s first bench — precision, optimizer, attention, state-space, community — and the rule that the smallest honest loss wins. Plus what the bench has shown.

Read the breakdown →

29 May 2026 · Atome

A language model that boots as firmware.

Atome runs a ternary LM on a $2 microcontroller — no OS, no internet, no app — with bit-exact Python↔C99 parity in a zero-heap engine. The honest, narrowed claim.

Read the build →

28 May 2026 · NEO

The hardest test for a chat model.

Most benchmarks reward confident answers. NEO rewards the right refusal — and penalizes the wrong one. Two ways to fail, only one of which the industry talks about.

Read the case →

27 May 2026 · The kit

Read the whole model in an afternoon.

A guided tour of the public kit — two 39 MB checkpoints, the 3-pathway architecture in a handful of files, and four scripts that check our claims. Apache-2.0, CPU-only.

Take the tour →

26 May 2026 · Ternary

Why ternary — and where it still loses.

Three-value weights are the whole point at $2-chip scale. On a 10M-parameter byte-LM, they still trail FP32 by ~12%. Both facts are true, so we ship both, plus a 7-level middle road.

Read the trade-off →

24 May 2026 · Postmortem

Five attempts, $0.78 of GPU, hypothesis disproven.

The full v5 → v8b sweep — what we tried, where each attempt broke, and the clean mechanism we finally identified. The router is fragile, and we now know why.

Read the postmortem →

24 May 2026 · Negative result

When the small model learns to say “I don’t know.”

We ran five experiments to test whether router entropy correlates with semantic uncertainty. It doesn’t. Here’s what does work — and why we’re publishing the failure.

Read the write-up →

23 May 2026 · Architecture

Three pathways, one small brain.

Why Tilelli routes every token through a local conv, a sparse attention head, and a ternary dense FFN — and why this beats a vanilla transformer at the same parameter count.

Read the explainer →

20 May 2026 · Why small

Why a 10M-parameter model still matters.

In an era of trillion-parameter frontier models, what’s the point of a 39-megabyte one? Local inference, audit, reproducibility, and a model your laptop can carry.

Read the case →

18 May 2026 · Retraction

What actually beat vanilla.

The “6.7σ” headline is retracted. What we can defend: every Lite seed landed below the single vanilla seed, FP32 vs FP32, on the TinyStories byte-LM. The receipts, and what’s still missing.

Read the retraction →

15 May 2026 · Recipe

Built from zero. Under twenty dollars.

The full reproducible recipe to train Tilelli v0.1 on one rented GPU for less than the price of a movie ticket. No pretrained weights, no shortcuts.

Read the recipe →