Free A/B Test Calculator

Enter your control and variant data to instantly check if your A/B test results are statistically significant before shipping.

Check Your A/B Test Results

Total number of visitors in your control group

Number of conversions in your control group

Total number of visitors in your variant group

Number of conversions in your variant group

Higher confidence means stricter criteria for significance

Explore More Free Tools

Other free calculators to help you benchmark and grow.

How It Works

How to use this free A/B test significance calculator

Completely free, with no account or sign-up required. The calculator uses a two-proportion z-test to determine whether your variant beat your control with statistical confidence.

1. Enter your control data

Input the number of visitors and conversions for your control group (the original version). Pull these numbers from your testing platform or analytics tool.

2. Enter your variant data

Input the number of visitors and conversions for your variant group (the new version you are testing). Make sure both groups ran during the same time period.

3. Get your significance result

See whether your variant is a statistically significant winner, including the z-score, p-value, and confidence level. Know exactly when you can trust the result.

The Formula

How A/B test significance is calculated

This free A/B test calculator uses a two-proportion z-test to compare the conversion rates of your control and variant groups. Here is the full formula breakdown.

Two-Proportion Z-Test

p1 = conversions_A / visitors_A

p2 = conversions_B / visitors_B

p_pool = (conversions_A + conversions_B) / (visitors_A + visitors_B)

SE = sqrt( p_pool * (1 - p_pool) * (1/visitors_A + 1/visitors_B) )

z = (p2 - p1) / SE

p_value = 2 * (1 - Φ(|z|))   (two-tailed)

Example: Control 3.0% vs Variant 3.7%, z-score = 2.14, p-value = 0.032 → Significant at 95%
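To verify the numbers yourself, here is a minimal Python sketch of the same two-proportion z-test. The function name and the group sizes in the usage lines are illustrative assumptions (the example above does not state how many visitors were in each group); with about 6,000 visitors per group it reproduces roughly the z-score and p-value shown.

from statistics import NormalDist

def two_proportion_z_test(visitors_a, conversions_a, visitors_b, conversions_b):
    # Conversion rates for control (A) and variant (B)
    p1 = conversions_a / visitors_a
    p2 = conversions_b / visitors_b
    # Pooled rate and standard error under the null hypothesis of no difference
    p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = (p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b)) ** 0.5
    z = (p2 - p1) / se
    # Two-tailed p-value from the standard normal CDF
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Illustrative sample sizes: 3.0% vs 3.7% conversion
z, p = two_proportion_z_test(6000, 180, 6000, 222)
print(f"z = {z:.2f}, p = {p:.3f}")  # z ≈ 2.13, p ≈ 0.033

Using NormalDist from the Python standard library keeps the sketch dependency-free; no external statistics package is needed.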

Statistical significance tells you whether the difference between your control and variant is real or just noise caused by random chance. A result is “significant at 95% confidence” when there is less than a 5% probability of seeing a difference this large, or larger, purely by chance.

The z-score measures how far the observed difference is from zero in terms of standard deviations. A z-score above 1.96 (for a two-tailed test) corresponds to a p-value below 0.05, which means 95% confidence. The higher the z-score, the stronger the evidence that the variant is genuinely different from the control.

Sample size is critical. Small samples produce noisy data, and early results often reverse as more visitors enter the test. If you stop a test too early because the numbers “look good,” you risk making decisions based on random fluctuations. Always determine your required sample size before launching a test, and commit to running until you reach it.

Sample Size Guidelines

How many visitors do you need per variant?

The required sample size depends on how small a change you want to detect. Smaller effects require dramatically more traffic. Use this table to plan your test duration before you start.

Minimum Detectable Effect | Visitors Per Variant | Feasibility
1% | ~38,000 | High-traffic sites only
2% | ~10,000 | Most sites
5% | ~1,600 | Most sites
10% | ~400 | Any site
15% | ~200 | Any site
20% | ~100 | Any site

Sources: Evan Miller Sample Size Calculator; Optimizely (2026/2027). Based on a 95% confidence level and 80% statistical power with a baseline conversion rate of 3%.
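If you want to estimate sample size for your own baseline, the sketch below uses the standard two-proportion approximation at 95% confidence and 80% power. It is an approximation under assumed inputs, not the exact formula behind the table, so its outputs may not match the table row for row (the table's effect sizes and baseline may be defined differently).

import math
from statistics import NormalDist

def visitors_per_variant(baseline, mde_abs, confidence=0.95, power=0.80):
    # Critical z-values: two-tailed for significance, one-tailed for power
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Average of the baseline and target conversion rates
    p_bar = baseline + mde_abs / 2
    n = 2 * p_bar * (1 - p_bar) * (z_alpha + z_beta) ** 2 / mde_abs ** 2
    return math.ceil(n)

# Detect a lift from 3.0% to 3.6% (0.6 points, a 20% relative improvement)
print(visitors_per_variant(0.03, 0.006))  # ≈ 13,915 per variant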

What to A/B Test

The highest-impact elements to test on your site

Not all tests are created equal. Focus on elements that directly influence visitor decisions. Here are the eight most impactful elements to test, ranked by expected return.

Element | Expected Impact | Test Duration | Key Metric
Headlines | High | 2-4 weeks | Conversion rate
CTA buttons | High | 1-3 weeks | Click-through rate
Hero images | Medium-High | 2-4 weeks | Bounce rate
Form length | High | 2-4 weeks | Form completion rate
Pricing layout | Very High | 3-6 weeks | Revenue per visitor
Social proof placement | Medium | 2-3 weeks | Trust signals / conversions
Page layout | Medium-High | 3-4 weeks | Engagement / scroll depth
Color schemes | Low-Medium | 2-3 weeks | Click-through rate

Common A/B Testing Mistakes

Six mistakes that invalidate your A/B test results

Even well-designed tests fail when these common errors creep in. Avoid these pitfalls to ensure your results are trustworthy and actionable.

⏱️ Ending tests too early

Stopping a test as soon as you see a "winner" is the most common mistake in A/B testing. Early results are unreliable because small sample sizes produce wild fluctuations. A variant that looks 20% better after 200 visitors may show zero lift after 2,000. Always wait until you reach statistical significance and your pre-determined sample size.

#1 most common A/B testing mistake
🎯 Not having a hypothesis

Running random tests without a clear hypothesis wastes time and traffic. Every test should start with a specific prediction: "Changing the CTA from blue to green will increase clicks by 10% because green signals action." A hypothesis gives you a framework for interpreting results, win or lose, and builds institutional knowledge over time.

Tests with hypotheses are 2x more actionable
🔀 Testing too many variables at once

When you change the headline, image, CTA text, and button color all at the same time, you cannot isolate which change drove the result. If the variant wins, you do not know why. If it loses, you may have buried a winning element. Test one variable at a time unless you are running a proper multivariate test with enough traffic.

Isolate one variable per test
📱 Ignoring mobile vs desktop

A variant that wins on desktop may lose on mobile, and vice versa. Over 60% of web traffic is mobile, so aggregate results can hide critical device-level differences. Always segment your A/B test results by device type. A "winning" variant that only works on desktop may actually hurt overall performance.

60%+ of traffic is mobile
📅 Not accounting for seasonality

Running tests over weekends, holidays, or seasonal peaks introduces bias. Visitor behavior on a Monday morning is very different from a Saturday evening. A test that runs only during a sale period will not reflect normal performance. Always run tests for at least one full business cycle (typically 1-2 weeks minimum) to account for natural variation.

Run tests for at least 1 full business cycle
💰 Declaring winners based on revenue alone

Revenue is a lagging indicator with high variance. A single large purchase can skew results and make a losing variant look like a winner. Always pair revenue metrics with leading indicators like conversion rate, click-through rate, and engagement. Look at the full picture before calling a test.

One outlier purchase can skew results 50%+

A/B Testing Best Practices

8 tips for running better A/B tests

These tactics are used by high-performing optimization teams to get reliable, repeatable results from every test. All CommonNinja widgets mentioned below are free to start.

01. Start with high-impact pages

Focus your first A/B tests on pages with the most traffic and the highest revenue impact. Your homepage, top landing pages, and checkout flow offer the biggest return on testing effort. A 5% conversion lift on a page with 50,000 monthly visitors is worth far more than a 20% lift on a page with 500 visitors.

02. Test one variable at a time

Change only one element per test so you can attribute the result to a specific change. If you want to test both the headline and the CTA, run them as separate sequential tests. This discipline builds reliable data and helps you understand what actually drives conversions on your site.

03. Run tests for full business cycles

Let every test run for at least 7 days, ideally 14, to capture weekday and weekend behavior, morning and evening traffic, and any natural fluctuations. Ending a test mid-week or during a promotional period introduces bias that makes your results unreliable.

04. Use popups to test CTAs and offers

Popups are one of the fastest ways to A/B test messaging, offers, and CTAs without redesigning your entire page. Test different headlines, discount amounts, or lead magnets using exit-intent or timed popups. You can validate an offer with a popup before committing to a full page redesign.

Try Popup Builder →
05. Add social proof variants

Test pages with and without social proof elements like customer testimonials, review counts, and trust badges. Social proof consistently lifts conversion rates by 10-30% across industries. Try different formats: text reviews vs video testimonials, star ratings vs written quotes, and different placements on the page.

Try Testimonials →
06. Test urgency with countdown timers

Urgency is a powerful conversion lever, but it needs to be tested carefully. A countdown timer on a limited offer can boost conversions significantly, but fake urgency erodes trust. Test real deadline-based countdowns on your offers and promotions to measure the actual lift without damaging your brand credibility.

Try Countdown Timer →
07. Segment results by device

Always analyze your A/B test results separately for mobile, tablet, and desktop. A variant that wins overall may be losing on mobile, where the majority of your traffic comes from. Device-level segmentation reveals hidden insights that aggregate data obscures.

08. Test gamification elements like spin-to-win

Gamified experiences like spin-to-win wheels can dramatically increase email capture and engagement rates. Test a spinning wheel popup against a standard discount popup to see which drives more conversions. Gamification taps into loss aversion and curiosity, often outperforming static offers by 30% or more.

Try Spinning Wheel →

Metrics Glossary

Key A/B testing terms and what they mean

Understanding these five metrics is essential for interpreting your A/B test results correctly and communicating findings to your team.

Term | Definition | Formula / Threshold | When to Use
Statistical Significance | The probability that the observed difference between control and variant is not due to random chance. | Typically 95% confidence (p < 0.05) | Before declaring any A/B test winner or loser
Confidence Level | The percentage of times the test would produce the same conclusion if repeated with new samples. | 1 - alpha (usually 95%) | Setting up your test parameters before launch
P-Value | The probability of observing a result at least as extreme as the test result, assuming no real difference exists. | p < 0.05 for 95% confidence | Evaluating whether a result is statistically trustworthy
Z-Score | The number of standard deviations the observed difference is from zero (no effect). | z = (p2 - p1) / SE | Calculating significance in two-proportion tests
Minimum Detectable Effect | The smallest improvement you want your test to be able to detect reliably. | Set before the test based on business impact | Determining required sample size before launching a test

FAQ

What does statistical significance mean?
Statistical significance means the difference between your control and variant is unlikely to be caused by random chance. At 95% confidence, there is only a 5% probability that the observed difference happened by chance. This free calculator uses a two-proportion z-test to determine significance.

What confidence level should I use?
95% is the standard for most A/B tests. Use 99% for high-stakes decisions like pricing changes or checkout flow redesigns. 90% can work for low-risk tests like headline copy or button colors where speed matters more than certainty.

How many visitors do I need per variant?
It depends on the size of the effect you want to detect. To detect a 2% improvement with 95% confidence, you need roughly 10,000 visitors per variant. For a 5% improvement, around 1,600 per variant. Larger differences need fewer visitors to confirm.

Why is my test not reaching significance?
The most common reasons are insufficient sample size, testing a change that has too small an effect, or ending the test too early. Run your test for at least one full business cycle (usually 1-2 weeks) and make sure you have enough traffic to detect the expected lift.

Do I need an account to use this calculator?
No, it is completely free. No account or sign-up required.

What is the difference between confidence level and p-value?
The confidence level is the threshold you set before the test (e.g. 95%). The p-value is the actual probability calculated from your data. If the p-value is lower than your threshold (e.g. p < 0.05 for 95% confidence), the result is statistically significant.
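In code terms, the comparison is a single check (values illustrative):

alpha = 1 - 0.95               # confidence level sets the threshold
p_value = 0.032                # calculated from your test data
significant = p_value < alpha  # True here: significant at 95% confidence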
