Causal Inference & Experimentation
Your team ships a new onboarding flow and the test shows +12% signups on day three. Do you roll it out? Most product A/B tests are read wrong, stopped too early, under-powered, or shipped on a "win" too small to matter. Here's the discipline that turns a test into a decision you can defend, on a flow with a 12% baseline signup rate.
You can't choose the duration after you peek; you commit to it before the test starts. The smaller the effect you want to catch, the more traffic you need: the required sample size grows with the inverse square of the effect, so halving the detectable lift roughly quadruples the traffic. Pick the minimum lift worth shipping and enter your traffic; the power calculation gives the sample size and runtime (80% power, 5% false-positive rate).
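As a rough sketch of that power calculation, the snippet below uses the standard normal-approximation formula for comparing two proportions. The function names, the 10% relative lift, and the 2,000 visitors per day are illustrative assumptions, not figures from the test itself; only the 12% baseline, 80% power, and 5% false-positive rate come from the text.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per arm for a two-sided two-proportion test."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def runtime_days(n_per_arm, daily_visitors, arms=2):
    """Days needed when daily_visitors are split evenly across the arms."""
    return math.ceil(arms * n_per_arm / daily_visitors)


# Hypothetical inputs: 12% baseline (from the text), 10% relative lift,
# 2,000 visitors per day entering the flow.
n = sample_size_per_arm(baseline=0.12, relative_lift=0.10)
print("visitors per arm:", n)
print("runtime in days:", runtime_days(n, daily_visitors=2000))

# Halving the detectable lift needs roughly four times the traffic.
print("ratio:", sample_size_per_arm(0.12, 0.05) / n)
```

The runtime is fixed from these inputs before launch, which is what makes the commitment possible.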
Here's the trap that ruins more product tests than any other. These are A/A tests: both arms are identical, so the true effect is zero. Yet if you check every day and ship the moment p < 0.05, you'll declare a fake winner far more often than the nominal 5%. Each line is one A/A test's running p-value; the more times you look, the more chances you have to get unlucky.
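The effect of peeking can be reproduced with a small simulation. This is a minimal sketch, assuming a pooled two-proportion z-test checked once per day; the 14-day duration, 200 visitors per arm per day, and 400 simulated tests are made-up parameters chosen to keep it fast.

```python
import math
import random
from statistics import NormalDist


def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_aa(n_tests=400, days=14, daily_per_arm=200,
                rate=0.12, alpha=0.05, seed=0):
    """Return (false-positive rate with daily peeking, rate with one final look)."""
    rng = random.Random(seed)
    peeking_hits = 0
    final_hits = 0
    for _test in range(n_tests):
        conv_a = conv_b = n = 0
        crossed = False
        p = 1.0
        for _day in range(days):
            # Both arms draw from the same true rate: an A/A test.
            conv_a += sum(rng.random() < rate for _ in range(daily_per_arm))
            conv_b += sum(rng.random() < rate for _ in range(daily_per_arm))
            n += daily_per_arm
            p = p_value(conv_a, n, conv_b, n)
            crossed = crossed or p < alpha  # "ship the moment p < 0.05"
        peeking_hits += crossed
        final_hits += p < alpha  # only the pre-committed final look counts
    return peeking_hits / n_tests, final_hits / n_tests


if __name__ == "__main__":
    peeking, final = simulate_aa()
    print(f"false winners with daily peeking: {peeking:.1%}")
    print(f"false winners with one final look: {final:.1%}")
```

Because the final look is itself one of the daily checks, the peeking rate can never be lower than the single-look rate; the simulation only shows how much higher it gets.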
A test answers two different questions. Is the effect real (does the interval clear zero)? And is it big enough to matter (does it clear your ship bar)? A tiny, certain win and a huge, uncertain one are different decisions. Move the true effect and the sample size and watch the verdict change.
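The two questions can be kept separate in code. The sketch below builds a 95% Wald interval for the absolute difference in signup rates and checks it against zero and against a ship bar. Treating "clears the ship bar" as "the lower bound is at or above the bar" is an assumption, as are the 0.5-point bar and all the counts in the examples.

```python
import math
from statistics import NormalDist


def lift_interval(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Wald confidence interval for the absolute lift (treatment minus control)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    diff = p_b - p_a
    return diff - z * se, diff + z * se


def verdict(low, high, ship_bar):
    """Answer 'is it real?' and 'is it big enough?' separately."""
    is_real = low > 0               # interval clears zero
    clears_bar = low >= ship_bar    # assumption: lower bound must reach the bar
    if clears_bar:
        return "ship: real and big enough"
    if is_real:
        return "real, but not shown to clear the ship bar"
    return "inconclusive: interval includes zero"


SHIP_BAR = 0.005  # hypothetical: +0.5 percentage points of signup rate

# Tiny, certain win: 12.0% vs 12.2% on a million visitors per arm.
print(verdict(*lift_interval(120_000, 1_000_000, 122_000, 1_000_000), SHIP_BAR))
# Huge, uncertain win: 12% vs 15% on 500 visitors per arm.
print(verdict(*lift_interval(60, 500, 75, 500), SHIP_BAR))
# Large and well measured: 12% vs 13.5% on 20,000 visitors per arm.
print(verdict(*lift_interval(2_400, 20_000, 2_700, 20_000), SHIP_BAR))
```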
The last mile is business, not statistics: translate the lift (and its uncertainty) into the outcome leadership cares about, then make the call. Same test result as above; set how much traffic hits this flow in a year.
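One way to make that translation concrete is to multiply both ends of the lift interval by the annual traffic, so the uncertainty travels with the estimate. This is a sketch only: the function name, the interval of +0.8 to +2.2 percentage points, the one million annual visitors, and the value per signup are all hypothetical placeholders.

```python
def annual_impact(lift_low, lift_high, annual_visitors, value_per_signup=None):
    """Turn an absolute lift interval into a range of extra signups per year."""
    result = {
        "extra_signups_low": round(lift_low * annual_visitors),
        "extra_signups_high": round(lift_high * annual_visitors),
    }
    if value_per_signup is not None:
        # Optional: express the same range in money (hypothetical unit value).
        result["value_low"] = result["extra_signups_low"] * value_per_signup
        result["value_high"] = result["extra_signups_high"] * value_per_signup
    return result


# Hypothetical inputs: lift between +0.8 and +2.2 percentage points,
# one million visitors a year, 30 currency units per signup.
impact = annual_impact(0.008, 0.022, annual_visitors=1_000_000, value_per_signup=30)
for name, amount in impact.items():
    print(f"{name}: {amount:,}")
```

With these placeholder inputs the range is roughly 8,000 to 22,000 extra signups a year; reporting both ends keeps the decision honest about how much is still unknown.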
What a typical engagement looks like, end to end.