Small AI lab, big ideas · Marrakech

Most AI bluffs. This one stops.

Tilelli is a 10-million-parameter language model that runs on your CPU, says “I don’t know” instead of bluffing, and reproduces end-to-end on one rented GPU for under $20 — checkpoint, trainer and dataset in a single repo you can read in an afternoon. Every number on this page is bound to a script that exits non-zero if it can’t reproduce it.

Reproduce it yourself →

2 live products

2 research previews

CPU inference

Apache 2.0 open licence

16 lab notes

In public

Products, previews, and the arena.

Tilelli and Atome are live. NEO and Tilelli Med are research previews. Mizan is the arena where small architectures fight fair.

        
Live product
10M · 3 pathways · CPU

Tilelli
A 10-million-parameter byte-level transformer that routes every token through three lightweight pathways: local convolution, sparse top-k attention, and a ternary dense feed-forward network. The chat model catches gibberish at AUROC 0.93, fires the abstain template on 9 of 10 held-out IDK probes, and refuses cleanly out of distribution.

github ↗

0.5686 Lite val · directional
9/10 Held-out IDK gate
0.93 AUROC gibberish


        
Live product
$2 chip · zero heap · bit-exact

Atome
A ternary language model that boots as firmware on a $2 microcontroller. No OS, no internet, no app. The honest narrowed claim after a prior-art audit: first ternary LM with bit-exact Python ↔ C99 parity in a zero-heap C99 engine on $2 MCUs.

atomelm.com ↗
github ↗

~$2 per chip, retail
20 KB compiled binary
bit-exact Python ↔ C99


        
Research preview
7 models · 5-vendor council

NEO
A benchmark for whether seven leading chat models (Claude, GPT, Gemini, DeepSeek, Qwen, Grok, Llama) say “I don’t know” when they don’t, or guess. Every answer is graded by a five-vendor council, with each family barred from grading its own model.
Read NEO →
7 models on the board
1,015 questions
5 vendor judges
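The no-self-grading rule can be sketched in a few lines. This is an illustration, not NEO's grader: the vendor labels are placeholders (the page does not name the five council vendors), and the majority-vote aggregation is an assumption, since the page only says each family is barred from grading its own model.

```python
from collections import Counter

# Hypothetical vendor labels; the page does not name the five council vendors.
COUNCIL = ["vendor_a", "vendor_b", "vendor_c", "vendor_d", "vendor_e"]


def eligible_judges(model_vendor, council=COUNCIL):
    """Every council member except the one that built the model under test."""
    return [judge for judge in council if judge != model_vendor]


def majority_verdict(votes):
    """Most common label, or 'no_consensus' on a tie for first place.

    Majority voting is an assumed aggregation rule, not one stated on the page.
    """
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "no_consensus"
    return ranked[0][0]


# A model built by vendor_c is graded by the other four council members.
judges = eligible_judges("vendor_c")
print(judges)
print(majority_verdict(["abstained", "guessed", "abstained", "abstained"]))
```

A model whose vendor is not on the council keeps all five judges under this sketch.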


        
Research preview
Ternary · 5.3× smaller · 2 boards

Tilelli Med
Knowledge-graph embeddings for biomedicine (drugs, diseases, proteins, side effects, pathways) at three values per weight. 5.3× smaller than the standard-precision baseline, and still ahead on two public leaderboards. A research tool. Not a clinical product.
Read Tilelli Med →
0.847 MRR · OGBL-biokg
5.3× smaller vs FP32
24 MB packed


        
Research prototype
807K params · CPU · editable facts

Yaz
A sub-1M-parameter model whose facts live in addressable atoms: create, read, update and delete them one at a time, with provable per-edit locality. When it isn’t sure which fact you mean, it abstains instead of guessing. A research prototype, honestly scoped. Not a product.
Read Yaz →
0 side-effects / edit
C·R·U·D editable facts
0.004 abstention AURC

      

The headline result

Lite beat vanilla on the seed we kept the logs for.

A directional win, not a tournament. TinyStories byte-LM, 10M parameters, 50K steps, sequence length 256. Don’t take our word for it: the kit ships a script that reproduces the number to within ±5% or exits non-zero.
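A reproduce-or-fail gate of this kind might look like the sketch below. It is not the kit's script: the function name is hypothetical, and the "measured" inputs are made-up values chosen only to show both outcomes. The claimed value and the 5% tolerance come from this page.

```python
import math


def reproduction_exit_code(measured: float, claimed: float, rel_tol: float = 0.05) -> int:
    """Return 0 if `measured` is within `rel_tol` of `claimed`, else 1."""
    if not math.isfinite(measured):
        return 1  # NaN or inf means the run did not produce a usable number
    relative_error = abs(measured - claimed) / abs(claimed)
    return 0 if relative_error <= rel_tol else 1


# Claimed Lite validation value from this page.
CLAIMED_LITE_VAL = 0.5686

# Illustrative inputs only, not real runs.
print(reproduction_exit_code(0.5700, CLAIMED_LITE_VAL))  # inside the 5% band
print(reproduction_exit_code(0.6500, CLAIMED_LITE_VAL))  # outside the 5% band
```

A real script would pass the returned code to `sys.exit`, so that a CI job or shell pipeline fails when the number does not reproduce.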

Model	Setup	Val (bpc)
Vanilla baseline	pre-norm transformer · 10.09M	0.5707
Tilelli Lite · seed 1234	3-pathway · 10.18M	0.5686
Delta	Lite vs vanilla	−0.37%
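The delta row is plain arithmetic on the two validation numbers above it. A minimal check, with a helper name chosen here for illustration:

```python
def relative_delta_percent(candidate: float, baseline: float) -> float:
    """Signed change of `candidate` relative to `baseline`, in percent."""
    return (candidate - baseline) / baseline * 100.0


vanilla_bpc = 0.5707  # vanilla baseline, from the table
lite_bpc = 0.5686     # Tilelli Lite, seed 1234, from the table

delta = relative_delta_percent(lite_bpc, vanilla_bpc)
print(f"{delta:.2f}%")  # -0.37%
```

Lower bits-per-character is better, so a negative delta means Lite is ahead on this measurement.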

The honest caveat

The result depends on which eval method you use. Within-training periodic eval: Lite loses by 0.6%. Post-training single-batch eval: Lite wins by 0.4%. Two of the three Lite seed logs were not preserved, and the earlier “6.7σ” headline is retracted. A 3-seed Welch test with matched eval_every is queued at ~$2.60 of A40 time. We’d rather you knew this than not.
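For readers who have not met it, Welch's test compares two group means without assuming equal variances. The sketch below computes the t statistic and the Welch-Satterthwaite degrees of freedom using only the standard library. The input lists are toy numbers, not validation losses; the queued three-seed results do not exist yet.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    # Squared standard error of each group mean (sample variance / n).
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Toy inputs with three "seeds" per group; NOT measurements from this project.
toy_a = [1.0, 2.0, 3.0]
toy_b = [2.0, 4.0, 6.0]

t, df = welch_t(toy_a, toy_b)
print(f"t = {t:.3f}, df = {df:.2f}")
```

With three seeds per arm the degrees of freedom stay small, so only a large gap relative to seed-to-seed spread would count as significant.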

Reproduce it yourself
Read the audit trail →

Deployed reality

It knows when it doesn’t know.

Most language models hallucinate confidently. Tilelli watches its own output confidence and abstains when the signal goes flat. No theatrics, no invented facts, just the model saying so.

The mechanism is the deployed model’s own max_softmax_mean over generated tokens. We tested it across seven out-of-distribution (OOD) regimes. It is reliable on the ones that matter most (gibberish, syntactic OOD) and unreliable on semantic OOD and factual-misleading inputs, which we report rather than hide.
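One plausible reading of max_softmax_mean is: take the largest softmax probability at each generated token, then average over the generation. The sketch below follows that reading. The function names, the 0.5 threshold, and the four-symbol toy vocabulary are all assumptions for illustration. The deployed model is byte-level, and its actual threshold is not stated here.

```python
import math


def max_softmax(logits):
    """Largest softmax probability for one token's logits (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    return max(exps) / sum(exps)


def max_softmax_mean(per_token_logits):
    """Average of the per-token maximum softmax probability."""
    return sum(max_softmax(row) for row in per_token_logits) / len(per_token_logits)


def should_abstain(per_token_logits, threshold=0.5):
    """Abstain when mean confidence falls below a (hypothetical) threshold."""
    return max_softmax_mean(per_token_logits) < threshold


# Toy logits over a 4-symbol vocabulary, two generated tokens each.
confident = [[8.0, 0.0, 0.0, 0.0], [0.0, 9.0, 0.0, 0.0]]
flat = [[0.1, 0.0, 0.1, 0.0], [0.0, 0.0, 0.0, 0.0]]

print(should_abstain(confident))
print(should_abstain(flat))
```

A flat distribution over the vocabulary pushes the per-token maximum toward 1/vocab_size, which is what "the signal goes flat" looks like numerically.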

0.93

AUROC · gibberish vs in-domain

9/10

Held-out IDK gate · script ≥ 9

7/20

NEO false-inability refusal rate

10M

Parameters · 39 MB packed
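AUROC here can be read as the probability that a randomly chosen in-domain prompt gets a higher confidence score than a randomly chosen gibberish prompt. A pairwise sketch of that definition follows. The scores are toy values, not the evaluation data behind the 0.93 figure.

```python
def auroc(positive_scores, negative_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Toy confidence scores for illustration only.
in_domain_conf = [0.9, 0.8, 0.4]
gibberish_conf = [0.5, 0.2]

print(auroc(in_domain_conf, gibberish_conf))  # 5 of 6 pairs ranked correctly
```

A score of 0.5 means the signal is no better than a coin flip, and 1.0 means every in-domain prompt outranks every gibberish one.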

The architecture

Three pathways. Per-token routing. No quadratic-attention monolith.

Pathway	FLOPs	What it learns	Notes
Local conv	~25%	Bigrams, common phrases, anything within the kernel	k=5 causal 1D convolution
Sparse attention	~50%	Long-range; pays for what it uses	Top-k, 8 heads, k≤16, position embedding
Dense FFN	~25%	The model’s storage; where knowledge lives	Ternary {−1, 0, +1}, expand=4, per-tensor scale
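Each row of the table maps to a small, well-known operation. The pure-Python sketch below shows one toy version of each, for a single channel and a single head. It is not the Tilelli implementation: the absmean ternary rule is an assumed quantizer (the table only specifies ternary values with a per-tensor scale), and the routing between pathways is not shown.

```python
import math


def causal_conv1d(x, kernel):
    """y[t] = sum_j kernel[j] * x[t - j]; positions before the start count as 0."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, k in enumerate(kernel):
            if t - j >= 0:
                acc += k * x[t - j]
        out.append(acc)
    return out


def top_k_causal_attention(scores_row, t, k):
    """Softmax weights over the k best-scoring positions <= t, zero elsewhere."""
    visible = range(t + 1)  # causal: position t cannot see the future
    keep = sorted(visible, key=lambda i: scores_row[i], reverse=True)[:k]
    peak = max(scores_row[i] for i in keep)
    exps = {i: math.exp(scores_row[i] - peak) for i in keep}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(scores_row))]


def ternarize(weights):
    """Map weights to {-1, 0, +1} plus one per-tensor scale (absmean, assumed)."""
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0.0:
        return [0] * len(weights), 0.0
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


print(causal_conv1d([1.0, 2.0, 3.0, 4.0], [1.0, 1.0]))
print(top_k_causal_attention([2.0, 1.0, 3.0, 5.0], t=2, k=2))
print(ternarize([0.9, -0.05, -1.2, 0.4]))
```

The table's settings would correspond to a five-tap kernel for the convolution and k up to 16 across 8 heads for attention; the toy calls use smaller values so the outputs are easy to check by hand.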

Honest limits

What it can’t do.

Tilelli is 10 million parameters — about 39 megabytes packed. Small enough to fit on a phone, far too small to memorize the web. When it says “I don’t know,” that isn’t a bug we’re polishing; it’s the architecture admitting its limits.

Not a search engine

Won’t know yesterday’s news, niche trivia, or your favorite band’s drummer.

Not a coder

Can sketch a function, not ship a service. For real work, use a real model.

Not multilingual yet

v0.1 trains on FineWeb-Edu — English-leaning, document-quality filtered. Other languages come later.

What it IS

Small, fast, auditable, knows the shape of its own ignorance — and a clean substrate to study how that signal forms.

Install

Three steps. CPU only. ~120 MB.

If you have Python, you can chat with Tilelli in under three minutes. The kit ships the checkpoint, a TinyStories demo dataset, and a working trainer. No GPU, no cloud, no API key.

Get the kit
Full install guide →

# 1. CPU-only torch (skip the 2GB CUDA wheel)
pip install --index-url https://download.pytorch.org/whl/cpu torch

# 2. Install Tilelli
git clone https://github.com/TilelliLab/Tilelli-llm
cd Tilelli-llm && pip install -e .

# 3. Talk to it
python chat.py "Hello, who are you?"
# → "i am small but try to be honest"

★ Mizan · the arena

Think your small architecture beats ours? Mizan is the arena for model benches; Featherweight is the first bench — same dataset, same budget, same eval. The board names a community DoubleConv at val 1.3745, ahead of our RWKV-mini at 1.4352. That board is last season: a new in-house architecture now edges past RWKV head-to-head in seed-matched runs.

Read the Mizan preview → Ask for release updates →

Tilelli — ⵣ

A word older than the calendar we date this page by.

Tilelli is the Tamazight word for freedom. The Imazighen — “the free people” — are a transnational indigenous people of North Africa. By the Amazigh calendar, the year you’re reading this in is 2976. Naming a small, low-power model after that word isn’t accidental: Tilelli runs on a CPU, and the recipe is reproducible end-to-end for under twenty dollars. Freedom to study, to fork, to deploy without a vendor.

A tribute · I

To His Majesty King Mohammed VI.

For the stable, open, modernizing Morocco that makes a small independent AI lab possible — for the technological push, the global posture, the cultural openness that lets us publish on the open web with confidence. This work exists in the conditions you set. Thank you, Sire.

A tribute · II

To Marrakech.

The Tamazight name Mur N’Akush — Land of God — gave the city its name. Founded around 1070, Marrakech has for nearly a thousand years been a meeting point of Amazigh, African, Arab, and Mediterranean influences. That mix still shapes how we think and build.

Open, honest, yours

A model you can read, run, and disprove.

Clone the kit and retrain it from zero for the price of a movie ticket.

Get the kit