What Should a Pilot Prove Before You Run It?

The pilot is over and the room wants good news. The team lead clicks through three slides of green ticks. Adoption was strong. The feedback was warm. The thing worked.

Then someone senior asks the only question that matters. “So what does this actually prove?”

And the room goes quiet. Not because the pilot failed. Because nobody agreed, before it started, what a good result would be allowed to mean.

This happens constantly. A pilot does well, everyone relaxes, and a modest result quietly grows into a claim it cannot carry. By the time anyone notices, the rollout is funded and the gap is yours to explain.

A pilot is a decision instrument, not a small launch

Most teams treat a pilot as a small version of the real thing. Run it, see how it goes and scale it up if it goes well.

That framing is the whole problem. A pilot is not a rehearsal. It is a way to buy one specific piece of evidence before you commit real money, real headcount or real reputation.

A small launch asks “did it go well?” A decision instrument asks “what will we now do differently, and what evidence makes that safe?”

The difference shows up the moment a pilot succeeds. A small launch that goes well feels like permission to roll out. A decision instrument that goes well tells you exactly which claim you can stand behind and which one you still cannot.

A useful pilot is not one that makes everyone feel better. It is one that prevents people from claiming more than the evidence supports.

The four questions to answer before a pilot starts

Write the answers down before anyone touches the work. Not afterwards, when there is something to defend.

1. What must this pilot prove?

One claim. The single thing that, if true, changes a decision you are about to make. If you cannot write it in a sentence containing the word “whether”, you are not ready. “We want to explore the new intake process” is not a claim. “Whether the new intake process cuts triage time without adding errors” is.

2. What will this pilot not prove?

The list of things people will be tempted to say afterwards that your design cannot support. This is the hardest question and the most valuable one. Writing it now, while the result is unknown, is honest. Writing it after a disappointing result is just an excuse with better grammar.

3. What evidence would change the decision?

Set the bar before you see the data. Decide the number, the signal or the failure that would make you stop, narrow the claim or change course. If every possible result leads to “carry on regardless”, you are not running a pilot. You are running a launch with a softer name.

4. What conditions must be true before we trust the result?

The setup that has to hold for the result to mean anything. Real pricing. Representative users. Normal levels of support. Enough volume. A long enough run. Get these wrong and even a clean result is worthless.

The common proof traps

Most overclaiming comes from the same handful of mistakes. Each one is a place where a pilot feels successful and proves almost nothing useful.

Friendly participants do not prove demand. The three customers in your trial were the three who already liked you. Their enthusiasm is real and it tells you nothing about the buyer who has never heard your name.
Discounted pricing does not prove willingness to pay. Offer a heavy discount to fill the cohort and six of eight renew, and you have proved goodwill at a discount. Finance will hear “75 per cent retention” and build it into the plan. Renewal at a discount and retention at full price are not the same number.
High-touch support does not prove scale. If the person who built the product is answering questions in a private channel within minutes, you are testing that person, not the product.
Low volume does not prove operational readiness. A process that runs cleanly at twenty cases a week with one trained operator is a different process at two hundred cases a week with new joiners.
Untracked outcomes do not prove value. Logins and engagement are easy to measure. Whether escalations actually fell is harder, which is exactly why it gets skipped. Activity is not outcome.
Short pilots do not prove sustained adoption. Four weeks shows novelty. It cannot show whether anyone still uses the thing in week twelve when the newness has worn off.

Here is how it goes wrong in practice. A team pilots a new case-management tool. Twelve hand-picked users. Daily support from the people who built it. A three-week window. Adoption is excellent and everyone is pleased.

Then the proposal lands. Roll it out to four thousand users with a helpdesk that replies in two working days.

Every condition that made the pilot work has just been removed. The hand-picked users, the instant support and the short honeymoon are all gone. The pilot was real. The claim built on top of it is fiction.

What to write down before launch

Capture it on one page before the pilot starts. Call it a Pilot Decision Record. Four parts.

The main claim. The one thing this pilot is built to prove, in a single sentence.
Evidence boundaries. What this pilot will not prove, stated plainly so nobody has to discover it live in a steering group.
Pre-start conditions. The setup that must hold for the result to count. Pricing, participants, support, volume and duration.
Stakeholder wording. The two or three sentences you will actually say when someone asks what it proves. Agreed now, while you are calm, not improvised later when you are on the spot.
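
If you keep pilot records alongside other project files, the four parts above can also be held as a small structured record that flags what is missing before anyone signs off. What follows is a minimal sketch in Python, not the Pilot Readiness Review itself. The class name, the field names and the checks in gaps() are illustrative assumptions, and the example values are invented around the intake-process claim used earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PilotDecisionRecord:
    """One-page record agreed before the pilot starts (hypothetical field names)."""

    main_claim: str                        # the single claim, phrased with "whether"
    evidence_boundaries: list[str]         # what this pilot will not prove
    pre_start_conditions: dict[str, str]   # pricing, participants, support, volume, duration
    stakeholder_wording: str               # the two or three sentences agreed in advance

    def gaps(self) -> list[str]:
        """Return the reasons this record is not yet ready to sign off."""
        problems = []
        if "whether" not in self.main_claim.lower():
            problems.append("main claim is not phrased as a 'whether' statement")
        if not self.evidence_boundaries:
            problems.append("no evidence boundaries listed")
        # The five conditions named in the record; assumed here to be mandatory.
        required = {"pricing", "participants", "support", "volume", "duration"}
        missing = sorted(required - set(self.pre_start_conditions))
        if missing:
            problems.append("missing conditions: " + ", ".join(missing))
        if not self.stakeholder_wording.strip():
            problems.append("stakeholder wording not agreed")
        return problems


# Invented example values, for illustration only.
record = PilotDecisionRecord(
    main_claim="Whether the new intake process cuts triage time without adding errors",
    evidence_boundaries=[
        "Does not prove demand beyond the participating teams",
        "Does not prove the process holds at higher volume",
    ],
    pre_start_conditions={
        "pricing": "not applicable, internal process",
        "participants": "not hand-picked",
        "support": "standard helpdesk only",
    },
    stakeholder_wording="This pilot tests triage time and error rate only.",
)

for problem in record.gaps():
    print(problem)
```

In this example the record names only three of the five conditions, so the check reports that volume and duration still need to be agreed. The value is not in the code. It is in being unable to mark the record complete while a part is blank.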

Pilots are often waved through under pressure to show movement when the team's plate is already crowded, and that pressure is exactly when these boundaries get skipped. If that is your reality, making the trade-offs visible is a discipline in itself, and one worth practising deliberately.

Use the Pilot Readiness Review

If writing that record from a blank page feels like work, that is what the Pilot Readiness Review is for.

It asks a short set of structured questions about your pilot: which common pilot situation is closest to yours, the main claim it needs to prove and the conditions you are running under. Then it produces a short Pilot Decision Record with the four parts already drafted. What the pilot must prove. What it will not prove. The conditions to set before you start. And wording you can take into a stakeholder conversation.

It runs in your browser. Nothing is saved and nothing is sent anywhere, so you can think through a real pilot without putting any confidential detail at risk.

Use the Pilot Readiness Review to create a short Pilot Decision Record before you start.

The strongest part is the section most teams skip. What this pilot will not prove. Done before launch, it is the cheapest insurance you will buy all quarter.

The point is not to run fewer pilots

Pilots are good. Controlled, honest pilots are how careful teams avoid expensive mistakes and learn the things that spreadsheets cannot tell them.

So this is not an argument for caution for its own sake. The point is not to stop pilots. The point is to stop teams using a weak pilot to justify a stronger claim than it can support.

Decide what yours can prove before you run it. Write down what it cannot. Then, whatever the result, you will be ready for the only question that matters when the room goes quiet.
