Methodology · Public
We publish what we benchmark, how we benchmark it, and the rules we grade ourselves against. The live numbers are at /accuracy.
Four metrics. Bracket detection is live and reported on /accuracy today. The other three are filed for Q2 2026 — listed here for full transparency, not yet on the headline page.
| Metric | Status | Ground truth | Scoring |
|---|---|---|---|
| Bracket detection | Live | WotC reference precons (every settled, post-cutoff precon = Bracket 2 / Core) | Exact-match. Predicted bracket must equal expected bracket. |
| Combo coverage | Live | Commander Spellbook combo database (community-maintained ground truth) | Round-trip every combo in our copy of the table through the detection engine. 100% means no combo data is lost in detection. Reported on /accuracy. |
| Mulligan keep accuracy | Live | Synthetic Fisher-Yates 7-card draws across every settled precon (50 hands each) | Aggregate keep rate across all sampled hands. Under the London mulligan and Reid Duke's mulligan fundamentals, we expect a 60-80% keep rate for decks running roughly 35-37 lands. Reported on /accuracy. |
| Manabase quality | Live | Frank Karsten 2022 + Marcus Schulze EDH adaptation (×1.65 scaling) | Score every settled precon and count those scoring ≥ 70/100. The result documents a known gap: precons as shipped fall short of tournament-grade Karsten targets. That is a community baseline, not an engine bug. |
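To make the mulligan row concrete, here is a minimal sketch of how a Fisher-Yates sample of 7-card opening hands can be turned into an aggregate keep rate. Everything in it is an assumption for illustration: the `Card` shape, the seeded `mulberry32` generator, and especially the keep rule (keep any hand with 2 to 5 lands), which stands in for whatever heuristics the real engine applies.

```typescript
// Hypothetical card model: only the land / non-land split matters for this sketch.
type Card = { name: string; isLand: boolean };

// Small seeded PRNG (mulberry32) so a sampling run can be reproduced exactly.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

// Fisher-Yates shuffle on a copy: walk from the end, swapping each slot
// with a uniformly chosen slot at or before it.
function shuffle<T>(items: readonly T[], rand: () => number): T[] {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// ASSUMED keep rule, for illustration only: keep a 7-card hand with 2 to 5 lands.
function isKeep(hand: readonly Card[]): boolean {
  const lands = hand.filter((c) => c.isLand).length;
  return lands >= 2 && lands <= 5;
}

// Draw `hands` independent opening hands and return the fraction kept.
function keepRate(deck: readonly Card[], hands: number, rand: () => number): number {
  let kept = 0;
  for (let h = 0; h < hands; h++) {
    const hand = shuffle(deck, rand).slice(0, 7);
    if (isKeep(hand)) kept++;
  }
  return kept / hands;
}

// Usage: a 99-card deck with 36 lands, sampled 50 times as in the table.
const sampleDeck: Card[] = Array.from({ length: 99 }, (_, i) => ({
  name: `card-${i}`,
  isLand: i < 36,
}));
const rate = keepRate(sampleDeck, 50, mulberry32(2026));
console.log(`keep rate over 50 hands: ${(rate * 100).toFixed(1)}%`);
```

With only 50 hands per deck, a single precon's keep rate is noisy, which is one reason to report the aggregate across all precons rather than per-deck figures.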
The bracket benchmark is a single script: scripts/benchmark-bracket-accuracy.ts. It loads every settled precon from the database, scores each with the same estimateBracketAsync function used by the live deck pages, and writes a JSON report to data/audits/bracket-accuracy-YYYY-MM-DD.json.
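The exact-match scoring step can be sketched as follows. This is not the contents of the script: the `Prediction`, `Miss`, and `BracketReport` shapes and the `scoreExactMatch` and `reportPath` helpers are hypothetical names chosen for illustration, and the predictions are assumed to have been collected already (in the real script they would come from `estimateBracketAsync`, whose signature is not shown on this page).

```typescript
// Hypothetical shapes: the real report fields are not documented on this page.
type Prediction = { preconId: string; predicted: number };
type Miss = { preconId: string; predicted: number; expected: number };

interface BracketReport {
  total: number;
  correct: number;
  accuracy: number; // fraction in [0, 1]
  misses: Miss[];
}

// Ground truth from the metrics table: every settled, post-cutoff precon is Bracket 2.
const EXPECTED_BRACKET = 2;

// Exact-match scoring: a prediction counts only if it equals the expected bracket.
function scoreExactMatch(predictions: readonly Prediction[]): BracketReport {
  const misses: Miss[] = [];
  for (const p of predictions) {
    if (p.predicted !== EXPECTED_BRACKET) {
      misses.push({ ...p, expected: EXPECTED_BRACKET });
    }
  }
  const total = predictions.length;
  const correct = total - misses.length;
  // Guard against an empty run so accuracy is never NaN.
  return { total, correct, accuracy: total === 0 ? 0 : correct / total, misses };
}

// Builds the dated report path described above (date taken in UTC).
function reportPath(day: Date): string {
  const stamp = day.toISOString().slice(0, 10); // YYYY-MM-DD
  return `data/audits/bracket-accuracy-${stamp}.json`;
}

// Usage with made-up predictions:
const report = scoreExactMatch([
  { preconId: "precon-a", predicted: 2 },
  { preconId: "precon-b", predicted: 3 },
]);
console.log(reportPath(new Date()), JSON.stringify(report, null, 2));
```

Keeping the misses in the report, rather than only the aggregate, lets a reader see exactly which precons were scored outside Bracket 2.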
The SpellRack repo is currently private. Once we open-source it (planned for Q4 2026), the script link will go live here. In the meantime, we publish the latest report contents on /accuracy itself, in the same JSON shape a third party running the script would produce.
The headline number on /accuracy is cached for 24 hours and regenerated on the first page request after the cache expires. A weekly automatic rerun on a cron worker is filed for Q2 2026. Until then, the benchmark also runs inline whenever the cache is missing, so the live page is never stale by more than a day.
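The caching behavior described here is a read-through cache with a 24-hour time-to-live. A minimal sketch, assuming the benchmark is recomputed before the response is served (the page does not say whether a stale value is ever served while a rerun is in progress), with `CacheEntry` and `readThrough` as hypothetical names:

```typescript
const TTL_MS = 24 * 60 * 60 * 1000; // 24 hours

// Hypothetical cache record: the value plus the time it was computed.
type CacheEntry<T> = { value: T; computedAt: number };

// Returns the cached entry while it is younger than the TTL. If the entry is
// missing or expired, runs the benchmark inline and returns a fresh entry.
function readThrough<T>(
  entry: CacheEntry<T> | undefined,
  nowMs: number,
  runBenchmark: () => T,
): CacheEntry<T> {
  if (entry !== undefined && nowMs - entry.computedAt < TTL_MS) {
    return entry;
  }
  return { value: runBenchmark(), computedAt: nowMs };
}

// Usage: the first request finds no cache and computes; a request one hour
// later reuses the cached entry without rerunning the benchmark.
const firstRequest = readThrough<number>(undefined, Date.now(), () => 0.9);
const laterRequest = readThrough(firstRequest, firstRequest.computedAt + 60 * 60 * 1000, () => 0.5);
console.log(laterRequest === firstRequest);
```

Under this assumption, any value a visitor sees was computed less than 24 hours before their request, regardless of how long the page went unvisited.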
Every published accuracy claim we're aware of in the Commander deckbuilding space:
| Tool | Published claim | Notes |
|---|---|---|
| ScrollVault | 88.9% accuracy | Karsten-anchored bracket scorer. Methodology published. |
| Archidekt | Not published | No public accuracy benchmark for their bracket detection. |
| Moxfield | Not published | No published bracket accuracy benchmark. |
| EDHREC deck stats | Not published | Provides aggregate stats; no bracket-detection benchmark. |
| TappedOut | Not published | No public bracket accuracy benchmark. |
| SpellRack (us) | See /accuracy for the live number | Live benchmark, reproducible, methodology published (this page). |
If you publish an accuracy benchmark for a competing tool and want to be listed here, email [email protected].