"Open source" often means a weights file and a wave goodbye. We wanted the opposite: a repository small enough that a curious person could read everything that determines the model's behavior in a single sitting.
What's in the box
The public kit at github.com/TilelliLab/Tilelli-llm ships two checkpoints (an FP32 chat model and a ternary pretrain base, each about 39 MB) alongside chat.py, infer.py, the architecture in src/, a working trainer in scripts/, and a demo data slice. Apache-2.0 throughout.
The architecture, legibly
The three-pathway routed block — local convolution, sparse top-k attention, ternary dense feed-forward — lives in a handful of source files, not a sprawling framework. If you've read the architecture explainer, the code will line up with it section for section. That's deliberate: the docs describe the code, and the code is short enough to confirm them.
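To make the word "ternary" concrete, here is a minimal sketch of one common way to constrain weights to {-1, 0, +1}: divide by the mean absolute value, round, and clip. This is an assumption for illustration (the absmean scheme popularized by BitNet b1.58), not necessarily what the code in src/ does, and the names ternarize and ternary_dot are hypothetical.

```python
def ternarize(weights):
    """Map floats to codes in {-1, 0, +1} plus one shared scale.

    Assumed scheme: absmean scaling, then round and clip.
    The actual repository may quantize differently.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0.0:
        return [0] * len(weights), 0.0
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def ternary_dot(codes, scale, x):
    """Dot product that only adds or subtracts inputs, then scales once."""
    acc = 0.0
    for c, xi in zip(codes, x):
        if c == 1:
            acc += xi
        elif c == -1:
            acc -= xi
        # c == 0 contributes nothing, so the input is skipped
    return acc * scale


codes, scale = ternarize([0.9, -0.05, -1.1, 0.4])
y = ternary_dot(codes, scale, [1.0, 2.0, 3.0, 4.0])
```

The point of the sketch is that a ternary layer needs no per-weight multiplications: each weight either adds its input, subtracts it, or skips it, and a single scale is applied at the end.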
The scripts that keep us honest
Under reproduce/ are the checks behind our claims, with their results in results/. They recompute the documented numbers and exit non-zero if any recomputed value drifts more than 5% from what we documented. We wrote more about that discipline in "Every number is bound to a script."
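The drift check itself is simple enough to sketch. What follows is a hypothetical illustration, not the contents of reproduce/: the function names, the claim names, and the recomputed values are all made up, and it assumes the ±5% is a relative tolerance.

```python
import sys

TOLERANCE = 0.05  # the ±5% from the post, assumed here to be relative


def drifted(documented, recomputed, tolerance=TOLERANCE):
    """True if the recomputed value is more than `tolerance` away, relatively."""
    if documented == 0:
        return recomputed != 0
    return abs(recomputed - documented) / abs(documented) > tolerance


def check_claims(claims):
    """claims maps a name to (documented, recomputed); returns the failing names."""
    return [name for name, (doc, got) in claims.items() if drifted(doc, got)]


def main(claims):
    """Report drifted claims on stderr and return a process exit code."""
    failures = check_claims(claims)
    for name in failures:
        print(f"DRIFT: {name}", file=sys.stderr)
    return 1 if failures else 0


# Hypothetical inputs: documented figures from the post, invented recomputed ones.
example = {
    "checkpoint_mb": (39.0, 39.8),   # about 2% off, within tolerance
    "disk_total_mb": (120.0, 131.0), # about 9% off, should fail
}
exit_code = main(example)
```

A real script would finish with sys.exit(main(claims)), so that a CI job or a shell pipeline fails as soon as a documented number stops matching what the code produces.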
Three commands to talk to it
Install the CPU-only PyTorch wheel, run pip install -e . from the repo root, and start python chat.py. No GPU, no cloud, no API key: the whole thing is about 120 MB on disk. The full walkthrough is in INSTALL.md.
Fork it, break it, retrain it for under twenty dollars. A model you can read is a model you can trust — or disprove.