
Read the whole model in an afternoon.

A guided tour of the public kit — two checkpoints, the architecture in a handful of files, and the scripts that check our work.

27 May 2026 · The kit · ~4 min read

"Open source" often means a weights file and a wave goodbye. We wanted the opposite: a repository small enough that a curious person could read everything that determines the model's behavior in a single sitting.

What's in the box

The public kit at github.com/TilelliLab/Tilelli-llm ships two checkpoints — an FP32 chat model and a ternary pretrain base, each about 39 MB — alongside chat.py, infer.py, the architecture in src/, a working trainer in scripts/, and a demo data slice. Apache-2.0 throughout.

The architecture, legibly

The three-pathway routed block — local convolution, sparse top-k attention, ternary dense feed-forward — lives in a handful of source files, not a sprawling framework. If you've read the architecture explainer, the code will line up with it section for section. That's deliberate: the docs describe the code, and the code is short enough to confirm them.
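To give a feel for what "ternary" means in the feed-forward pathway, here is a minimal sketch of one common scheme, absmean ternarization, in plain Python. It is an assumption for illustration only: the function names are hypothetical, and the quantizer the kit actually uses is whatever src/ says it is, which may differ.

```python
def ternarize(weights):
    """Map floats to {-1, 0, +1} plus one shared scale.

    Assumed scheme (absmean): scale = mean(|w|), then round w / scale
    and clip the result to the range [-1, 1].
    """
    if not weights:
        return [], 0.0
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0.0:
        # All-zero input: nothing to scale, every weight stays 0.
        return [0] * len(weights), 0.0
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale


def dequantize(ternary, scale):
    """Recover approximate float weights from the ternary codes."""
    return [t * scale for t in ternary]


codes, scale = ternarize([0.9, -0.05, -1.2, 0.4])
print(codes)  # [1, 0, -1, 1]
```

Small weights collapse to 0 and large ones saturate at ±1, so each weight carries one of three values and a whole tensor shares a single float scale.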

The scripts that keep us honest

Under reproduce/ are the checks behind our claims, with their results in results/. They recompute the documented numbers and exit non-zero if a claim drifts past ±5%. We wrote more about that discipline in "Every number is bound to a script."
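The idea behind such a check fits in a few lines. The sketch below is not the code in reproduce/; it is a hypothetical illustration of a ±5% drift test, and the function names and sample values are placeholders rather than real results.

```python
def within_tolerance(documented, recomputed, rel_tol=0.05):
    """True if the recomputed value is within rel_tol of the documented one."""
    if documented == 0:
        # A relative tolerance is undefined at zero; require an exact match.
        return recomputed == 0
    return abs(recomputed - documented) <= rel_tol * abs(documented)


def drifted_claims(claims, rel_tol=0.05):
    """claims maps a name to (documented, recomputed); return names that drifted."""
    return [
        name
        for name, (documented, recomputed) in claims.items()
        if not within_tolerance(documented, recomputed, rel_tol)
    ]


def exit_code(claims, rel_tol=0.05):
    """0 when every claim holds, 1 when any claim drifts past the tolerance."""
    return 1 if drifted_claims(claims, rel_tol) else 0


# Placeholder numbers for illustration only, not measurements from the kit.
example = {
    "metric_a": (100.0, 103.0),  # 3% off: passes
    "metric_b": (100.0, 110.0),  # 10% off: fails
}
print(drifted_claims(example))  # ['metric_b']
```

A real script would end with `sys.exit(exit_code(claims))`, so that a drifted claim fails the run rather than passing silently.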

Three commands to talk to it

Install the CPU-only PyTorch wheel, run pip install -e . inside the cloned repo, and start python chat.py. No GPU, no cloud, no API key: the whole thing is about 120 MB on disk. The full walkthrough is in INSTALL.md.

Fork it, break it, retrain it for under twenty dollars. A model you can read is a model you can trust — or disprove.