# The Proof Table

Five real inputs, run through the models you are choosing between and judged against a bar you write before any model runs. Copy this page, fill it once, and rerun it after every prompt, schema, or model change. It is the cheapest regression net a first build can own, and the habit that becomes evals in The Practice.

---

## How to pick the five

Use one input per kind of reality your feature will meet, and pull each one from a real user or a real record, never from your imagination.

- **The typical case.** The input you expect most days; a model that fails here fails everywhere.
- **The crowded case.** An input carrying far more than average: the long list, the packed page, the message with six asks in it.
- **The degraded case.** An input that arrives damaged: a blurry photo, heavy typos, formatting that broke in a copy-paste.
- **The near-empty case.** An input with almost nothing in it, where the honest output is a short answer or a question back.
- **The wrong-kind-of-input case.** Something the feature was never meant to handle, where a passing output declines it rather than inventing an answer.

Before the first run, write one sentence per row stating what a passing output must do. If you write the bar after seeing the outputs, you will bend it to fit them.

## The table

| Input (and its source) | Model A output ok? | Model B output ok? | Notes / the failure in one line |
| --- | --- | --- | --- |
| Typical: | | | |
| Crowded: | | | |
| Degraded: | | | |
| Near-empty: | | | |
| Wrong kind of input: | | | |

Run every input through both models, mark each cell yes or no against your written bar, and when a row fails, write the failure in one line a colleague could act on.

## The verdict

**The model that ships:** _____

**The one failure that matters most:** _____

**The fix that is not a bigger model** (a prompt rule, a value check, a fallback): _____

## The rerun log

Rerun all five rows after every change, then add a line here. A row that flips from pass to fail is a regression you caught before any user did.

| Date | What changed | Rows that flipped |
| --- | --- | --- |
| | | |
| | | |
| | | |

## When to grow the table

Once real users arrive, stop inventing inputs. The five strangest things they actually sent become rows six to ten, and the chapter "After the ship: watch real use and decide the next move" shows you how to read them out of your log.
