
How to Calculate A/B Test Statistical Significance

Determine if your A/B test results are statistically significant with our free A/B Test Calculator. Enter visitors and conversions for each variant to get the p-value and confidence level.


Steps

1. Enter control (A) data

Enter the number of visitors (or impressions) and the number of conversions for your control variant — the original, unchanged version. Conversions can be any goal event: purchases, sign-ups, clicks, or form completions.

2. Enter variant (B) data

Enter the same metrics for your test variant — the new version with the change you are testing. Both variants must run simultaneously (not sequentially) to avoid time-based confounding.

3. View conversion rates

The calculator shows the conversion rate for each variant (conversions / visitors × 100%) and the relative uplift (how much better or worse variant B performed as a percentage change from control).
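
For a concrete sense of the arithmetic behind this step, here is a minimal sketch in Python. The visitor and conversion counts are made up purely for illustration:

```python
# Minimal sketch of this step's arithmetic; the visitor and conversion counts are made up.
visitors_a, conversions_a = 10_000, 500   # control (A)
visitors_b, conversions_b = 10_000, 560   # variant (B)

rate_a = conversions_a / visitors_a               # 0.050 -> 5.0%
rate_b = conversions_b / visitors_b               # 0.056 -> 5.6%
relative_uplift = (rate_b - rate_a) / rate_a      # 0.12  -> +12% relative to control

print(f"A: {rate_a:.2%}  B: {rate_b:.2%}  uplift: {relative_uplift:+.1%}")
```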

4. Check statistical significance

The p-value and confidence level are shown. A p-value below 0.05 (95% confidence) is the conventional threshold for declaring statistical significance; it means a difference at least this large would occur by random chance less than 5% of the time if the two variants truly performed the same.
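
The page does not state the tool's exact formula; a common way to compute the p-value from these inputs is a two-sided two-proportion z-test with a pooled conversion rate. The sketch below assumes that method and reuses the illustrative numbers from the previous step:

```python
from math import sqrt, erf

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test with pooled variance; one common method,
    and not necessarily the exact formula this calculator uses."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via erf; two-sided p-value from the upper tail.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

z, p = two_proportion_p_value(500, 10_000, 560, 10_000)
print(f"z = {z:.2f}, p = {p:.3f}, confidence = {1 - p:.1%}")
```

With these made-up numbers the result lands just short of the 95% threshold (p around 0.06), which is exactly the situation the next step addresses.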

5. Interpret the result and decide

If the result is significant AND the uplift is practically meaningful (consider your minimum detectable effect), implement the winning variant. If not significant, continue testing until you reach the required sample size, or abandon the hypothesis.
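
The required sample size depends on your baseline rate, the minimum detectable effect you care about, and the significance and power levels you choose. As a rough guide (not this calculator's own logic), a simplified textbook estimate for a two-proportion test looks like this:

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(baseline_rate, mde_relative, alpha=0.05, power=0.80):
    """Rough per-variant sample size for a two-sided two-proportion test.
    mde_relative is the smallest relative lift worth detecting (0.10 = +10%).
    A simplified textbook formula; dedicated power calculators may differ slightly."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + mde_relative)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)           # e.g. 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = variance * (z_alpha + z_power) ** 2 / (p2 - p1) ** 2
    return ceil(n)

# Example: 5% baseline conversion rate, aiming to detect a +10% relative lift (5.0% -> 5.5%).
print(required_sample_size(0.05, 0.10))   # roughly 31,000 visitors per variant
```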

Understanding p-Values and Confidence Levels

A p-value is the probability of observing a difference as large as (or larger than) the one you observed, assuming the null hypothesis (no difference between A and B) is true. A p-value of 0.05 means that if A and B were actually identical, a difference at least this large would appear by chance about 5% of the time. The confidence level is simply one minus the significance threshold (1 - 0.05 = 95%): a test run at 95% confidence produces a false positive in roughly 5% of cases where no real difference exists. Common misconception: a 95% confidence level does NOT mean 'B is 95% better than A' or 'there is a 95% chance B will perform better in production'. It is a statement about how often the test would flag a difference when none exists, not about the size of the effect.
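
One way to make this definition concrete is to simulate many experiments in which the null hypothesis is true and count how often chance alone produces a gap at least as large as the one observed; that frequency approximates the p-value. This is only an illustration (the brute-force loop is slow but transparent), not what the calculator does internally, and the figures are the same made-up example as above:

```python
import random

# Illustrative observed data: A = 500/10,000 (5.0%) vs B = 560/10,000 (5.6%), a gap of 60 conversions.
n, observed_gap = 10_000, 560 - 500
pooled_rate = (500 + 560) / (n + n)   # the best single-rate estimate if A and B are truly identical

# Simulate experiments where the null hypothesis holds (both arms share pooled_rate) and count
# how often chance alone produces a conversion gap at least as large as the observed one.
random.seed(0)
trials, at_least_as_large = 2_000, 0
for _ in range(trials):
    conv_a = sum(random.random() < pooled_rate for _ in range(n))
    conv_b = sum(random.random() < pooled_rate for _ in range(n))
    if abs(conv_b - conv_a) >= observed_gap:
        at_least_as_large += 1

print(f"Simulated p-value: {at_least_as_large / trials:.3f}")   # approximates the analytic two-sided p-value
```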

Running Valid A/B Tests: Common Mistakes to Avoid

The most common A/B testing mistakes that invalidate results:

- Running A and B sequentially rather than simultaneously, so seasonal or day-of-week effects confound the results.
- Not splitting traffic randomly: if all mobile users see variant A and all desktop users see variant B, you are testing device type, not your change.
- Changing the test while it is running, for example by adding new traffic sources or editing the page for unrelated reasons.
- Testing multiple changes at once and attributing the result to a single change.
- Stopping the test as soon as the result happens to be significant ("peeking"); see the sketch after this list.
- Ignoring the novelty effect: users sometimes convert more on any change simply because it is new.

If you run multiple A/B tests at the same time, keep them on different page elements and never on overlapping user segments, so the tests cannot interact.
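
The peeking mistake is easy to demonstrate. The sketch below runs simulated A/A tests (no real difference exists), checks the p-value after every batch of visitors, and stops at the first significant reading; the false positive rate ends up far above the nominal 5%. The batch size, number of checks, and trial count are arbitrary choices for illustration:

```python
import random
from math import sqrt, erf

def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test with pooled variance (same approach as the earlier sketch)."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# A/A tests: both arms share the same 5% true rate, so every "significant" result is a false positive.
# Peeking after every batch and stopping at the first p < 0.05 inflates that false positive rate.
random.seed(1)
trials, checks, batch, rate = 1_000, 10, 1_000, 0.05
stopped_early = 0
for _ in range(trials):
    ca = cb = na = nb = 0
    for _ in range(checks):
        ca += sum(random.random() < rate for _ in range(batch))
        cb += sum(random.random() < rate for _ in range(batch))
        na += batch
        nb += batch
        if (ca + cb) > 0 and p_value(ca, na, cb, nb) < 0.05:
            stopped_early += 1   # a "winner" declared where no real difference exists
            break

print(f"Share of A/A tests declared significant when peeking: {stopped_early / trials:.1%}")  # well above 5%
```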
