The point of an open small model is that you can re-derive it yourself. This is the actual recipe. No pretrained weights, no shortcuts, no hidden steps.
1. Get the data
FineWeb-Edu, document-quality filtered. Roughly 2 GB of compressed text is enough for v0.1; we used 6 GB for
the final ckpt. The dataset is openly licensed; download it with the datasets library from
Hugging Face.
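
As a rough sketch of the download step, the snippet below streams documents and stops once a byte budget is reached. The repo id HuggingFaceFW/fineweb-edu, the sample-10BT config, the "text" field, and the helper names are assumptions that the text above does not state. The budget also counts raw UTF-8 bytes, not compressed size.

```python
from typing import Iterable, Iterator


def take_until_budget(docs: Iterable[dict], budget_bytes: int) -> Iterator[str]:
    """Yield document texts until their total UTF-8 size reaches budget_bytes."""
    used = 0
    for doc in docs:
        text = doc["text"]  # assumed field name
        used += len(text.encode("utf-8"))
        yield text
        if used >= budget_bytes:
            break


def stream_fineweb_edu():
    """Stream the dataset instead of downloading it in full (needs network access)."""
    from datasets import load_dataset  # pip install datasets

    return load_dataset(
        "HuggingFaceFW/fineweb-edu",  # assumed repo id
        name="sample-10BT",           # assumed config; pick whichever subset you need
        split="train",
        streaming=True,
    )


# Offline demo with stand-in documents, so the helper can be checked without a download.
fake_docs = [{"text": "abc"}, {"text": "defg"}, {"text": "hi"}]
kept = list(take_until_budget(fake_docs, budget_bytes=5))
print(kept)  # ['abc', 'defg']
```

For a real run, something like `take_until_budget(stream_fineweb_edu(), 2 * 10**9)` would yield roughly 2 GB of raw text.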
2. Rent a GPU
A community-priced A6000 on RunPod runs about $0.40/hour. v0.1 takes ~5 hours of wall time to reach 50K steps. Budget:
~$2 of GPU. The rest of the ~$20 is a generous buffer for restarts and tokenizer experiments.
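
The budget arithmetic, plus the throughput those numbers imply, can be checked in a few lines. The batch size comes from the training step further down. Reading "≈ 32K" as 32,768 is an assumption, and throughput on your hardware may differ.

```python
hourly_rate_usd = 0.40   # community A6000 price quoted above
wall_hours = 5           # approximate wall time to 50K steps
steps = 50_000
batch_tokens = 32_768    # assumption: "≈ 32K" read as 2**15; one token is one byte

gpu_cost_usd = hourly_rate_usd * wall_hours
steps_per_second = steps / (wall_hours * 3600)
bytes_per_second = steps_per_second * batch_tokens

print(f"GPU cost: ${gpu_cost_usd:.2f}")
print(f"{steps_per_second:.2f} steps/s, about {bytes_per_second:,.0f} bytes/s")
```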
3. Tokenize (or don't)
Tilelli is byte-level. Tokenization is the identity function: b"hello" → bytes 104, 101, 108, 108, 111. Vocabulary size 256. No tokenizer training run, no merge rules, no BPE artifacts.
This is one of the reasons the model stays small.
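
To make the "identity function" concrete, here is a minimal stand-in tokenizer. The class name SketchByteTokenizer is hypothetical, and the project's own ByteTokenizer may differ in details such as how it handles invalid UTF-8 on decode.

```python
class SketchByteTokenizer:
    """Byte-level tokenizer: token ids are just the UTF-8 bytes of the text."""

    vocab_size = 256

    def encode(self, text: str) -> list[int]:
        return list(text.encode("utf-8"))

    def decode(self, ids: list[int]) -> str:
        # errors="replace" is a choice made for this sketch: a model can emit
        # byte sequences that are not valid UTF-8.
        return bytes(ids).decode("utf-8", errors="replace")


tok = SketchByteTokenizer()
ids = tok.encode("hello")
print(ids)              # [104, 101, 108, 108, 111]
print(tok.decode(ids))  # hello
```

Non-ASCII characters simply take more than one token: "é" encodes to two bytes.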
4. Train
Run: python scripts/train.py --config configs/lite_10m.yaml
Architecture: 6 routed blocks, each with (local-conv, sparse-attn-8h-k16, ternary-FFN expand=4) and a 3-pathway router.
Optimizer: AdamW, lr 3e-4 with a cosine schedule, weight decay 0.01, 50K steps, seq=256, batch tokens ≈ 32K.
Rounding weights to ternary values has zero gradient almost everywhere, so a straight-through estimator passes
gradients through the quantization step.
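
The straight-through idea and the optimizer settings can be sketched on a toy layer. This is not the project's train.py: the class name TernaryLinear is hypothetical, the absmean scaling rule is an assumption (the text does not say how weights are ternarized), and the sizes and step count are shrunk so the snippet runs in a moment on CPU.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TernaryLinear(nn.Module):
    """Hypothetical layer: full-precision latent weights, ternary weights in the forward pass."""

    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        scale = w.abs().mean().clamp(min=1e-8)  # assumed absmean scaling
        q = torch.clamp(torch.round(w / scale), -1.0, 1.0) * scale
        # Straight-through estimator: the forward value equals q, but the
        # detached term contributes no gradient, so d(w_ste)/d(w) is the identity.
        w_ste = w + (q - w).detach()
        return F.linear(x, w_ste)


torch.manual_seed(0)
layer = TernaryLinear(16, 16)
opt = torch.optim.AdamW(layer.parameters(), lr=3e-4, weight_decay=0.01)
steps = 10  # the recipe uses 50_000
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=steps)

x, target = torch.randn(32, 16), torch.randn(32, 16)
before = layer.weight.detach().clone()
for _ in range(steps):
    loss = F.mse_loss(layer(x), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    sched.step()

grad_norm = layer.weight.grad.abs().sum().item()
weights_changed = not torch.equal(before, layer.weight.detach())
final_lr = sched.get_last_lr()[0]
print(grad_norm > 0, weights_changed, final_lr)
```

Without the detach trick, the gradient of round() would be zero almost everywhere and the latent weights would only move through weight decay.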
5. SFT for chat (optional)
The "tilelli" chat persona is the result of a small abstain-aware SFT on top of the
pretrained ckpt. About 50K USER:/TILELLI: pairs, including deliberately unanswerable questions where the
target is "I don't know." Another hour or two of GPU.
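
One way such pairs might be serialized for a byte-level model is sketched below. The exact layout (a space after each role tag, newline separators) and the helper name format_pair are assumptions. Only the USER:/TILELLI: tags and the "I don't know." target come from the description above.

```python
from typing import Optional

ABSTAIN = "I don't know."


def format_pair(question: str, answer: Optional[str]) -> bytes:
    """Render one SFT example; answer=None marks a deliberately unanswerable question."""
    target = ABSTAIN if answer is None else answer
    return f"USER: {question}\nTILELLI: {target}\n".encode("utf-8")


examples = [
    ("What is 2 + 2?", "4"),
    ("What number am I thinking of?", None),  # unanswerable: the target is the abstention
]
records = [format_pair(q, a) for q, a in examples]
for record in records:
    print(record)
```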
6. Run it anywhere
The inference loop is in tilelli/inference.py. Pure PyTorch, runs on CPU, single-threaded
or multi-threaded. Loading the ckpt takes about a second.
```python
# five lines, on CPU, no GPU required
from tilelli import TilelliLM, ByteTokenizer
tok = ByteTokenizer()
lm = TilelliLM.from_pretrained("tilelli/lite-10m")
out = lm.generate(tok.encode("Once upon a time"), max_new=128)
print(tok.decode(out))
```
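
For the single-threaded versus multi-threaded choice, PyTorch exposes its intra-op thread count directly. A minimal sketch, assuming the inference code does not set the thread count itself (the text above does not say either way):

```python
import torch

# Pin PyTorch to one intra-op CPU thread; pass a larger number to use more cores.
torch.set_num_threads(1)
n_threads = torch.get_num_threads()
print(n_threads)  # 1
```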
What we spent (in total)
| Item | Cost |
|---|---|
| FineWeb-Edu data download (bandwidth) | $0 |
| Pretrain (50K steps, A6000) | ~$2 |
| Abstain-aware SFT | ~$0.50 |
| Metacognition experiments (v5–v8b) | $1.05 |
| Buffer (failed runs, debugging) | ~$3 |
| Total to v0.1 working ckpt | ~$7 |
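
As a quick check on the table, the line items sum as follows. The "~" figures are treated as exact here, which is an approximation.

```python
costs_usd = {
    "FineWeb-Edu data download (bandwidth)": 0.00,
    "Pretrain (50K steps, A6000)": 2.00,
    "Abstain-aware SFT": 0.50,
    "Metacognition experiments (v5–v8b)": 1.05,
    "Buffer (failed runs, debugging)": 3.00,
}

total = sum(costs_usd.values())
print(f"${total:.2f}")  # $6.55, which the table rounds to ~$7
```
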
Code, weights, and the abstain config are all in the public kit. If you want to talk to the maintainers about reproducing it, email hello@tilelli.tech.