Small AI lab, big ideas · Marrakech

Most AI bluffs. This one stops.

Tilelli is a 10-million-parameter language model that runs on your CPU, says “I don’t know” instead of bluffing, and reproduces end-to-end on one rented GPU for under $20 — checkpoint, trainer and dataset in a single repo you can read in an afternoon. Every number on this page is bound to a script that exits non-zero if it can’t reproduce it.

2 live products
2 research previews
CPU inference
Apache 2.0 open licence
16 lab notes
In public

Products, previews, and the arena.

Tilelli and Atome are live. NEO and Tilelli Med are research previews. Mizan is the arena where small architectures fight fair.

01 · Live product · 10M · 3 pathways · CPU

Tilelli

A 10-million-parameter byte-level transformer that routes every token through three lightweight pathways — local convolution, sparse top-k attention, and a ternary dense feed-forward. The chat model catches gibberish at AUROC 0.93, fires the abstain template on 9 of 10 held-out IDK probes, and refuses cleanly out of distribution.

0.5686 Lite val · directional
9/10 Held-out IDK gate
0.93 AUROC gibberish
02 · Live product · $2 chip · zero heap · bit-exact

Atome

A ternary language model that boots as firmware on a $2 microcontroller. No OS, no internet, no app. The honest narrowed claim after a prior-art audit: first ternary LM with bit-exact Python ↔ C99 parity in a zero-heap C99 engine on $2 MCUs.

~$2 per chip, retail
20 KB compiled binary
bit-exact Python ↔ C99
03 · Research preview · 7 models · 5-vendor council
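One way to read the bit-exact Python ↔ C99 claim: if an engine keeps to integer arithmetic, a ternary matrix-vector product reduces to additions and subtractions, which give identical results in both languages as long as the accumulator stays inside int32. The sketch below illustrates that idea only. It is not Atome's engine, and the function names and the SHA-256 comparison step are hypothetical.

```python
import hashlib
import struct

def ternary_matvec(rows, x):
    """Integer-only product of a ternary matrix with an integer vector.

    rows: list of rows, each weight in {-1, 0, +1}
    x:    list of ints (for example int8 activations)
    """
    out = []
    for row in rows:
        acc = 0
        for w, v in zip(row, x):
            if w == 1:
                acc += v
            elif w == -1:
                acc -= v
            # w == 0 contributes nothing, so no multiply is needed
        out.append(acc)
    return out

def digest_int32(values):
    """Hash outputs as little-endian int32 so another build can be compared byte for byte."""
    packed = struct.pack(f"<{len(values)}i", *values)
    return hashlib.sha256(packed).hexdigest()

y = ternary_matvec([[1, -1, 0], [0, 0, 1]], [3, 5, 7])
print(y)  # [-2, 7]
print(digest_int32(y))
```

A C99 build that writes the same int32 values in little-endian order would produce the same digest, which turns "bit-exact" into a one-line check.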

NEO

A benchmark for whether seven leading chat models — Claude, GPT, Gemini, DeepSeek, Qwen, Grok, Llama — say “I don’t know” when they don’t, or guess. Every answer graded by a five-vendor council, each family barred from grading its own model.

7 models on the board
1,015 questions
5 vendor judges
04 · Research preview · Ternary · 5.3× smaller · 2 boards
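The self-grading bar described above can be sketched as a filter over the judge roster followed by a vote. This is a minimal illustration: the roster uses placeholder names, and the majority-vote rule is an assumption, since the page does not say how the council aggregates its grades.

```python
from collections import Counter

def eligible_judges(model_family, judges):
    """Drop any judge that shares a family with the model being graded."""
    return [name for name, family in judges.items() if family != model_family]

def council_verdict(votes):
    """Majority label among the votes; None when there are no votes or a tie.

    Majority voting is an assumed aggregation rule, not one stated by NEO.
    """
    if not votes:
        return None
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]

# Hypothetical roster: five judges from five vendor families (placeholder names).
JUDGES = {
    "judge_a": "family_a",
    "judge_b": "family_b",
    "judge_c": "family_c",
    "judge_d": "family_d",
    "judge_e": "family_e",
}

panel = eligible_judges("family_b", JUDGES)
print(panel)  # ['judge_a', 'judge_c', 'judge_d', 'judge_e']
print(council_verdict(["abstained", "guessed", "abstained", "abstained"]))  # abstained
```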

Tilelli Med

Knowledge-graph embeddings for biomedicine — drugs, diseases, proteins, side effects, pathways — at three values per weight. 5.3× smaller than the standard-precision baseline, and still ahead on two public leaderboards. A research tool. Not a clinical product.

0.847 MRR · OGBL-biokg
5.3× smaller vs FP32
24 MB packed
05 · Research prototype · 807K params · CPU · editable facts
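For readers new to the metric, MRR (mean reciprocal rank) averages 1/rank of the correct entity over all test queries, so 1.0 means the right answer is always ranked first. A minimal sketch follows; the ranks are toy values, not Tilelli Med outputs.

```python
def mean_reciprocal_rank(ranks):
    """ranks: 1-based rank of the correct entity for each test query."""
    if not ranks:
        raise ValueError("need at least one rank")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks are 1-based")
    return sum(1.0 / r for r in ranks) / len(ranks)

# Toy ranks for illustration only.
print(round(mean_reciprocal_rank([1, 2, 4]), 4))  # 0.5833
```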

Yaz

A sub-1M-parameter model whose facts live in addressable atoms: create, read, update and delete them one at a time — with provable per-edit locality — and it abstains when it isn’t sure which fact you mean instead of guessing. A research prototype, honestly scoped. Not a product.

0 side-effects / edit
C·R·U·D editable facts
0.004 abstention AURC
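AURC is the area under the risk-coverage curve: answer the most confident questions first, track the error rate among the answers given so far, and average that error rate over every coverage level. Lower is better. The sketch below uses the standard discrete form with toy inputs. Whether Yaz's evaluation uses exactly this discretisation is an assumption.

```python
def aurc(confidences, correct):
    """Average selective risk over all coverage levels.

    confidences: one score per example, higher means more confident
    correct:     one bool per example
    """
    if not confidences or len(confidences) != len(correct):
        raise ValueError("inputs must be non-empty and the same length")
    # Visit examples from most to least confident.
    order = sorted(range(len(confidences)), key=lambda i: confidences[i], reverse=True)
    errors = 0
    risks = []
    for covered, i in enumerate(order, start=1):
        errors += 0 if correct[i] else 1
        risks.append(errors / covered)  # error rate among the answers kept so far
    return sum(risks) / len(risks)

# Toy data: the one wrong answer sits third in confidence order.
print(round(aurc([0.9, 0.8, 0.7, 0.6], [True, True, False, True]), 4))  # 0.1458
```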
The headline result

Lite beat vanilla on the seed we kept the logs for.

A directional win, not a tournament. TinyStories byte-LM, 10M parameters, 50K steps, sequence length 256. Don’t take our word for it: the kit ships a script that reproduces the number within ±5% or exits non-zero.

| Model | Setup | Val (bpc) |
| --- | --- | --- |
| Vanilla baseline | pre-norm transformer · 10.09M | 0.5707 |
| Tilelli Lite · seed 1234 | 3-pathway · 10.18M | 0.5686 |
| Delta | Lite vs vanilla | −0.37% |
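The delta row and the ±5% reproduction rule are both simple arithmetic. Here is a minimal sketch using the two values from the table. The function names are hypothetical and this is not the kit's own script.

```python
def relative_delta(candidate, baseline):
    """Signed relative difference; negative means the candidate's bpc is lower (better)."""
    return (candidate - baseline) / baseline

def exit_code(measured, expected=0.5686, rel_tol=0.05):
    """0 if the measured value is within rel_tol of the published one, else 1.

    A wrapper script could pass this value to sys.exit().
    """
    return 0 if abs(measured - expected) <= rel_tol * abs(expected) else 1

print(f"{relative_delta(0.5686, 0.5707):+.2%}")  # -0.37%
print(exit_code(0.5686))  # 0
print(exit_code(0.7000))  # 1
```

Note that a ±5% band around 0.5686 is far wider than the 0.37% gap between the two models, so passing this check confirms the number reproduces. It does not confirm the ranking.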
The honest caveat

The result depends on the eval method. Within-training periodic eval: Lite loses by 0.6%. Post-training single-batch eval: Lite wins by 0.4%. Two of the three Lite seed logs were not preserved, and the earlier “6.7σ” headline is retracted. A 3-seed Welch test with matched eval_every is queued, at roughly $2.60 of A40 time. We’d rather you knew this than not.
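For reference, a Welch test compares two group means without assuming equal variances. The sketch below computes the t statistic and the Welch-Satterthwaite degrees of freedom using only the standard library. The inputs are arbitrary toy numbers, not bpc results, and the function name is hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two independent samples.

    Each sample needs at least two values and the two must not both have zero variance.
    """
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Toy three-value samples, standing in for three seeds per model.
t, df = welch_t([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(round(t, 4), round(df, 4))  # -1.5492 2.9412
```

With three seeds per arm the degrees of freedom stay small, so only a large, consistent gap reaches significance. If SciPy is available, scipy.stats.ttest_ind(a, b, equal_var=False) returns the same statistic along with a p-value.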

Deployed reality

It knows when it doesn’t know.

Most language models hallucinate confidently. Tilelli watches its own output-confidence and abstains when the signal goes flat. No theatrics, no invented facts — just the model saying so.

The mechanism is the deployed model’s own max_softmax_mean over generated tokens. We tested it across seven OOD regimes: it is reliable on the ones that matter most (gibberish, syntactic OOD) and unreliable on others (semantic OOD, factual-misleading), and we say so.
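Reading the name literally, max_softmax_mean is the largest softmax probability at each generated token, averaged over the generation. A minimal sketch under that reading follows. The 0.5 threshold and the function names other than max_softmax_mean are hypothetical, and the deployed model may compute or calibrate the signal differently.

```python
import math

def max_softmax(logits):
    """Largest softmax probability for one token's logits (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    return max(exps) / sum(exps)

def max_softmax_mean(step_logits):
    """Mean of the per-token max softmax probability over a generation."""
    if not step_logits:
        raise ValueError("need at least one generated token")
    return sum(max_softmax(logits) for logits in step_logits) / len(step_logits)

def should_abstain(step_logits, threshold=0.5):
    """Abstain when average confidence falls below a threshold (assumed value)."""
    return max_softmax_mean(step_logits) < threshold

flat = [[0.0, 0.0, 0.0, 0.0]]    # uniform over 4 options, confidence 0.25
peaked = [[10.0, 0.0, 0.0, 0.0]]  # one option dominates
print(should_abstain(flat))    # True
print(should_abstain(peaked))  # False
```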

0.93 AUROC · gibberish vs in-domain
9/10 Held-out IDK gate · script ≥ 9
7/20 NEO false-inability refusal rate
10M Parameters · 39 MB packed
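AUROC here can be read as the probability that a randomly chosen in-domain prompt gets a higher confidence score than a randomly chosen gibberish prompt, so 0.5 is chance and 1.0 is perfect separation. The sketch below computes it by direct pairwise comparison on toy scores. It illustrates the metric and is not the project's evaluation script.

```python
def auroc(in_domain_scores, ood_scores):
    """Fraction of (in-domain, OOD) pairs ranked correctly; ties count as half."""
    if not in_domain_scores or not ood_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in in_domain_scores:
        for n in ood_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(in_domain_scores) * len(ood_scores))

# Toy confidence scores for illustration only.
print(auroc([0.9, 0.8], [0.7, 0.85]))  # 0.75
```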
The architecture

Three pathways. Per-token routing. No quadratic-attention monolith.

| Pathway | FLOPs | What it learns | Notes |
| --- | --- | --- | --- |
| Local conv | ~25% | Bigrams, common phrases, anything within the kernel | k=5 causal 1D convolution |
| Sparse attention | ~50% | Long-range; pays for what it uses | Top-k, 8 heads, k≤16, position embedding |
| Dense FFN | ~25% | The model’s storage; where knowledge lives | Ternary {−1, 0, +1}, expand=4, per-tensor scale |
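The three pathways in the table rest on three small primitives: a causal convolution, a top-k filter over attention scores, and ternary weights with one scale per tensor. The sketch below shows each on plain Python lists. It is illustrative only. The absolute-mean scale and the threshold_ratio used for ternarisation are assumptions, since the table says only "per-tensor scale", and none of this is taken from the Tilelli source.

```python
import math

def causal_conv1d(x, kernel):
    """Causal 1D convolution: output t sees only x[t-k+1 .. t] (left zero-padded)."""
    k = len(kernel)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j in range(k):
            idx = t - (k - 1) + j
            if idx >= 0:
                acc += kernel[j] * x[idx]
        out.append(acc)
    return out

def topk_attention_weights(scores, k):
    """Softmax over the k largest scores; every other position gets weight 0."""
    k = min(k, len(scores))
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    m = max(scores[i] for i in keep)
    exps = {i: math.exp(scores[i] - m) for i in keep}
    z = sum(exps.values())
    return [exps[i] / z if i in exps else 0.0 for i in range(len(scores))]

def ternarize(weights, threshold_ratio=0.5):
    """Map weights to {-1, 0, +1} plus one per-tensor scale.

    Assumed scheme: scale = mean |w|; weights within threshold_ratio * scale become 0.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    cut = threshold_ratio * scale
    q = [0 if abs(w) <= cut else (1 if w > 0 else -1) for w in weights]
    return q, scale

print(causal_conv1d([1.0, 2.0, 3.0], [1.0, 1.0]))    # [1.0, 3.0, 5.0]
print(topk_attention_weights([0.0, 0.0, -5.0], k=2))  # [0.5, 0.5, 0.0]
print(ternarize([0.9, -0.8, 0.05, 0.0]))              # ([1, -1, 0, 0], 0.4375)
```

With the table's settings, the convolution would use a kernel of length 5 and the attention filter would keep at most 16 positions per query.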
Honest limits

What it can’t do.

Tilelli is 10 million parameters — about 39 megabytes packed. Small enough to fit on a phone, far too small to memorize the web. When it says “I don’t know,” that isn’t a bug we’re polishing; it’s the architecture admitting its limits.

Not a search engine

Won’t know yesterday’s news, niche trivia, or your favorite band’s drummer.

Not a coder

Can sketch a function, not ship a service. For real work, use a real model.

Not multilingual yet

v0.1 trains on FineWeb-Edu — English-leaning, document-quality filtered. Other languages come later.

What it IS

Small, fast, auditable, knows the shape of its own ignorance — and a clean substrate to study how that signal forms.

Install

Three commands. CPU only. ~120 MB.

If you have Python, you can chat with Tilelli in under three minutes. The kit ships the checkpoint, a TinyStories demo dataset, and a working trainer. No GPU, no cloud, no API key.

# 1. CPU-only torch (skip the 2GB CUDA wheel)
pip install --index-url https://download.pytorch.org/whl/cpu torch

# 2. Install Tilelli
git clone https://github.com/TilelliLab/Tilelli-llm
cd Tilelli-llm && pip install -e .

# 3. Talk to it
python chat.py "Hello, who are you?"
# → "i am small but try to be honest"
★ Mizan · the arena

Think your small architecture beats ours? Mizan is the arena for model benches; Featherweight is the first bench — same dataset, same budget, same eval. The board names a community DoubleConv at val 1.3745, ahead of our RWKV-mini at 1.4352. That board is last season: a new in-house architecture now edges past RWKV head-to-head in seed-matched runs.

Tilelli — ⵣ

A word older than the calendar we date this page by.

Tilelli is the Tamazight word for freedom. The Imazighen — “the free people” — are a transnational indigenous people of North Africa. By the Amazigh calendar, the year you’re reading this in is 2976. Naming a small, low-power model after that word isn’t accidental: Tilelli runs on a CPU, and the recipe is reproducible end-to-end for under twenty dollars. Freedom to study, to fork, to deploy without a vendor.

A tribute · I

To His Majesty King Mohammed VI.

For the stable, open, modernizing Morocco that makes a small independent AI lab possible — for the technological push, the global posture, the cultural openness that lets us publish on the open web with confidence. This work exists in the conditions you set. Thank you, Sire.

A tribute · II

To Marrakech.

The Tamazight name Mur N’Akush, “Land of God”, gave the city its name. Founded around 1070, Marrakech has for nearly a thousand years been a meeting point of Amazigh, African, Arab, and Mediterranean influences. That mix still shapes how we think and build.

Open, honest, yours

A model you can read, run, and disprove.

Clone the kit and retrain it from zero for the price of a movie ticket.