Research preview · 2026

Biomedical knowledge graphs, at three values per weight.

Tilelli Med is a ternary knowledge-graph embedding model: every weight quantized to {−1, 0, +1} with a per-tensor scale. 5.3× smaller than the standard-precision baseline. Beats its full-precision ComplEx baseline on two public leaderboards (OGBL-biokg, PrimeKG). A research tool. Not a clinical product.

0.847 MRR · OGBL-biokg
5.3× smaller than FP32
24 MB packed
< $5 training compute
★ The finding

The Rosiglitazone moment.

Asked which drugs might treat type 2 diabetes, the compressed model surfaced five approved diabetes drugs in its top 20, even though those answers had been filtered out of training.

Rosiglitazone. Sitagliptin. Gliclazide. Tolbutamide. Miglitol. Each drug ↔ type 2 diabetes pair was dropped from the train, validation and test splits. The model recovered all five from the surrounding graph: shared protein targets, side-effect profiles, mechanism families.
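The hold-out protocol fits in a few lines. This is a minimal illustration, not the project's evaluation code: the triple format, the helper names `drop_pairs` and `recovered_in_top_k`, and the choice to drop every edge between a held-out drug and disease (in either direction, whatever the relation) are all assumptions.

```python
from typing import Callable, Iterable

Triple = tuple[str, str, str]  # (head, relation, tail)


def drop_pairs(triples: Iterable[Triple],
               held_out: set[tuple[str, str]]) -> list[Triple]:
    """Remove every edge between a held-out (drug, disease) pair, in either direction.

    Assumption (hypothetical): all relations between the pair are dropped,
    not only 'treats'.
    """
    return [(h, r, t) for h, r, t in triples
            if (h, t) not in held_out and (t, h) not in held_out]


def recovered_in_top_k(score: Callable[[str, str, str], float],
                       drugs: list[str], relation: str, disease: str,
                       held_out_drugs: set[str], k: int = 20) -> set[str]:
    """Held-out drugs that land among the k best-scoring drugs for the disease."""
    ranked = sorted(drugs, key=lambda d: score(d, relation, disease), reverse=True)
    return set(ranked[:k]) & held_out_drugs
```

Apply `drop_pairs` to each of the train, validation and test splits before training; after training, `recovered_in_top_k` reports which held-out drugs come back in the top k for the disease.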

Not new medicine — a faithful rediscovery on a tiny model.

The results

Three numbers. Two leaderboards beaten. One file you’ll be able to load.

A small ternary model that holds its own against full-precision baselines on the standard biomedical link-prediction benchmarks.

0.847
MRR on OGBL-biokg · standard biomedical link-prediction benchmark
5.3×
Smaller than the standard-precision ComplEx baseline
24 MB
Packed model size — small enough to live on a microcontroller
The two leaderboards

Head-to-head, on the public benchmarks.

ComplEx in full precision is the reference KG-embedding baseline on these graphs. Tilelli Med matches the scoring function, squeezes every weight to a sign or a zero, and still wins.

Benchmark                    | Standard-precision baseline | Tilelli Med (ternary) | Δ (relative)
OGBL-biokg (MRR)             | 0.832 · ComplEx FP32        | 0.847                 | +1.8%
PrimeKG (drug-disease MRR)   | 0.611 · ComplEx FP32        | 0.624                 | +2.1%
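MRR (mean reciprocal rank) averages 1/rank of the correct answer across test queries, so 1.0 means the right answer always ranks first. The Δ column matches a relative change, (model − baseline) / baseline; for OGBL-biokg, (0.847 − 0.832) / 0.832 ≈ +1.8%. A minimal sketch of both follows; the function names are hypothetical, and the official OGB evaluator fixes details this toy version ignores, such as which negative candidates each query is ranked against.

```python
def mean_reciprocal_rank(ranks: list[int]) -> float:
    """Average of 1/rank of the true answer over all queries (rank 1 = best)."""
    if not ranks:
        raise ValueError("need at least one rank")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks are 1-based")
    return sum(1.0 / r for r in ranks) / len(ranks)


def relative_delta(baseline: float, model: float) -> float:
    """Relative change in percent, the convention the Δ column appears to use."""
    return 100.0 * (model - baseline) / baseline
```

For example, `round(relative_delta(0.611, 0.624), 1)` gives 2.1, matching the PrimeKG row.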
How it works

Methods, in four short paragraphs.

The graphs are the standard public biomedical ones, OGBL-biokg and PrimeKG: drugs, diseases, proteins, side effects, pathways, mechanisms. Each node a biomedical entity, each edge a relation: treats, targets, causes, interacts with. The model predicts which edges should exist between which nodes, the standard link-prediction task.
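A link-prediction query fixes a head and a relation and asks which tail fits. The sketch below scores a single query; `rank_of_true_tail` and the toy scorer are illustrative names, and ties are counted in the model's favour (only strictly higher-scoring candidates push the true tail down), which is one of several conventions.

```python
from typing import Callable

# A scorer maps (head, relation, tail) to a plausibility score; higher is better.
Scorer = Callable[[str, str, str], float]


def rank_of_true_tail(score: Scorer, head: str, relation: str,
                      true_tail: str, candidates: list[str]) -> int:
    """1-based rank of the true tail for the query (head, relation, ?)."""
    true_score = score(head, relation, true_tail)
    # Optimistic tie handling: only strictly better candidates count against it.
    higher = sum(1 for c in candidates
                 if c != true_tail and score(head, relation, c) > true_score)
    return 1 + higher
```

Collecting these ranks over every test query is what a metric like MRR is computed from.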

The scoring function is ComplEx-style: entity and relation embeddings live in complex space, and the score of a triple is the real part of a trilinear product of the head, the relation and the complex conjugate of the tail. The baseline trains those in FP32. We train ours under a straight-through estimator (STE), so rounding to {−1, 0, +1} is enforced in the forward pass but bypassed in the backward pass.
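For one triple, ComplEx computes Re(Σₖ hₖ · rₖ · conj(tₖ)). The conjugate on the tail is what lets (a, r, b) and (b, r, a) score differently, which matters for directed relations such as treats. Below is a pure-Python sketch using built-in complex numbers; the function name is hypothetical, and how the ternary quantizer handles real and imaginary parts is not specified on this page.

```python
def complex_score(head: list[complex], relation: list[complex],
                  tail: list[complex]) -> float:
    """ComplEx score of one triple: Re(sum_k head_k * relation_k * conj(tail_k))."""
    if not (len(head) == len(relation) == len(tail)):
        raise ValueError("head, relation and tail must have the same dimension")
    total = sum(h * r * t.conjugate() for h, r, t in zip(head, relation, tail))
    return float(total.real)
```

With a = [1j], b = [1+0j] and r = [1j], `complex_score(a, r, b)` is −1.0 while `complex_score(b, r, a)` is 1.0, an asymmetry that a purely real trilinear product (as in DistMult) cannot express.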

Trained on a single GPU for < $5 of compute. No special hardware, no proprietary kernel — the same shape as the ComplEx baseline, with weight tensors rounded before every forward pass and a single FP32 per-tensor scale learned alongside.
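The forward/backward split can be sketched without any framework. This assumes the common rounding rule q = clip(round(w / scale), −1, 1) and holds the per-tensor scale fixed: the page says the scale is learned, but its gradient rule is not described, so it is omitted. The `clip` option in `ste_grad` is one common STE variant, not necessarily the one used, and all function names are hypothetical.

```python
def ternarize(latent: list[float], scale: float) -> list[int]:
    """Forward-pass rounding of latent FP32 weights to codes in {-1, 0, +1}."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    # Python's round() is round-half-to-even; clipping keeps the codes ternary.
    return [max(-1, min(1, round(w / scale))) for w in latent]


def effective_weights(latent: list[float], scale: float) -> list[float]:
    """What the forward pass actually multiplies with: scale * code."""
    return [scale * q for q in ternarize(latent, scale)]


def ste_grad(latent: list[float], scale: float, grad_eff: list[float],
             clip: bool = True) -> list[float]:
    """Backward pass: treat rounding as the identity, so the gradient w.r.t. the
    effective weights is copied onto the latent weights. With clip=True it is
    zeroed where a weight is saturated (|w| > scale)."""
    return [0.0 if clip and abs(w) > scale else g for w, g in zip(latent, grad_eff)]


def sgd_step(latent: list[float], scale: float, grad_eff: list[float],
             lr: float = 0.1) -> list[float]:
    """Update the latent FP32 weights; ternary codes are re-derived next forward pass."""
    return [w - lr * g for w, g in zip(latent, ste_grad(latent, scale, grad_eff))]
```

In autodiff frameworks the same trick is commonly written as `w + stop_gradient(scale * q - w)`: the forward value is the quantized weight, while the gradient flows to `w` unchanged.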

Each ternary weight takes roughly 1.6 bits instead of 32, with one FP32 per-tensor scale to recover dynamic range; the packed model comes out 5.3× smaller than the FP32 baseline overall. The same trick that runs Atome on a $2 microcontroller, pointed at a different problem.
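One way to reach "roughly 1.6 bits" per weight is base-3 packing: 3⁵ = 243 ≤ 256, so five ternary codes fit in one byte, or 8 / 5 = 1.6 bits each (just above the log₂3 ≈ 1.585-bit floor). This is an assumption for illustration; the page does not describe the actual packed format, and the helper names are hypothetical.

```python
TRITS_PER_BYTE = 5  # 3**5 = 243 <= 256, so five ternary codes fit in one byte


def pack_trits(codes: list[int]) -> bytes:
    """Pack codes in {-1, 0, +1} as base-3 digits, five per byte (first code = lowest digit)."""
    out = bytearray()
    for i in range(0, len(codes), TRITS_PER_BYTE):
        group = codes[i:i + TRITS_PER_BYTE]
        value = 0
        for c in reversed(group):
            if c not in (-1, 0, 1):
                raise ValueError(f"not a ternary code: {c}")
            value = value * 3 + (c + 1)  # map -1, 0, +1 to digits 0, 1, 2
        out.append(value)  # at most 242, always fits in a byte
    return bytes(out)


def unpack_trits(data: bytes, n: int) -> list[int]:
    """Inverse of pack_trits; n drops the padding digits of a short final group."""
    codes = []
    for byte in data:
        value = byte
        for _ in range(TRITS_PER_BYTE):
            codes.append(value % 3 - 1)
            value //= 3
    return codes[:n]
```

Ten weights pack into two bytes, which is the 1.6 bits per weight quoted above; the per-tensor FP32 scale is stored separately.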

Read this before going further

A research tool. Not a clinical product.

Important · please read

Tilelli Med is a research tool, not a clinical product. Any prediction is a candidate for a clinician’s review — never a treatment decision.

Do not use this model to choose drugs for yourself, anyone in your care, or anyone else. The model is published to be inspected, replicated and improved. It is not licensed for clinical use.

What’s next

Three directions, all still research.

The current model gives a score for every drug-disease pair and a ranked list of candidates per query. The next versions add the part that matters for an honest research tool, per-prediction confidence, and stretch the method onto graphs we haven’t seen yet.

Confidence head

Every drug-disease score arrives with a “should you trust this one?” signal — calibrated against held-out pairs, not invented post-hoc.

More graphs

DrugBank and OpenTargets next, to test whether the ternary win generalizes beyond the two graphs we trained on or is a quirk of their topology.

Still not clinical

Not a product, not an app, not a tool a clinician runs in front of a patient. A research substrate, published to be argued with.

Release status

The full Med materials are still being prepared.

The training data is the public OGBL-biokg / PrimeKG split — no proprietary corpus, nothing under NDA. The checkpoint bundle, exact split package and reproducibility materials are being organized for publication. Until then, treat this page as a research preview. For release updates, email hello@tilelli.tech.