In a previous post, I provided a downloadable A/B testing significance calculator (in Excel). In this post, I will provide a calculator that lets you estimate how many days you should run a test in order to obtain statistically significant results. But first, a disclaimer.
When someone asks how long they should run an A/B test, the ideal answer would be until eternity or until you get results (whichever is sooner). In an A/B test, you can never say with full confidence that you will get statistically significant results after running the test for X days. Instead, what you can say is that there is an 80% (or 95%, whichever you choose) probability of getting a statistically significant result (if a real difference indeed exists) after X days. But, of course, it may be the case that there is in fact no difference in performance between control and variation, so no matter how long you wait, you will never get a statistically significant result.
Download and use the calculator below to find out how many visitors you need to include in the test. There are 4 pieces of information that you need to enter:
Once you enter these 4 parameters, the calculator will work out how many visitors you need to test (for 80% and 95% probability of finding the result). You can stop the test once you have tested that many visitors, but you should never stop earlier than that; otherwise you may end up drawing the wrong conclusion.
Click below to download the calculator:
Download A/B testing duration calculator.
Please feel free to share the file with your friends and colleagues or post it on your blog / twitter.
PS: By the way, if you want to do quick calculations, we have a version of this calculator hosted on Google Docs (please make a copy of the Google Doc sheet into your own account before you make any changes to it).
Ah! The million-dollar calculator. Explaining how it works is beyond the scope of this post as it is too technical (maybe a separate post). But if you have the stomach for it, below is the gist of how we calculate the number of visitors needed to get significant results.
The graph above is taken from an excellent book called Statistical Rules of Thumb. Luckily, the chapter on estimating sample size is available to download freely [PDF]. Another excellent source to get more information on sample size estimation for A/B testing is Microsoft’s paper: Controlled Experiments on the Web: Survey and Practical Guide [PDF].
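For readers who prefer code to spreadsheets, here is a rough Python sketch of the rule-of-thumb calculation, n = k * (std / diff)^2, with k = 16 for 80% power and k = 26 for 95% power at a 95% confidence level. It is not the spreadsheet itself: the function names are made up for illustration, and it assumes that the variance of a conversion is p * (1 - p), that the difference to detect is the baseline rate times the relative improvement, and that all of your daily traffic is split evenly across the variations.

```python
import math

# Rule-of-thumb multipliers at a 95% confidence level:
# 16 for 80% power, 26 for 95% power.
MULTIPLIER = {0.80: 16, 0.95: 26}


def visitors_per_variation(conversion_rate, relative_lift, power=0.80):
    """Approximate visitors needed in each variation: k * (std / diff)^2."""
    # Assumption: the variance of a single visitor's conversion is p * (1 - p).
    variance = conversion_rate * (1 - conversion_rate)
    # Absolute difference to detect: baseline rate times the relative lift.
    diff = conversion_rate * relative_lift
    return MULTIPLIER[power] * variance / diff ** 2


def days_needed(conversion_rate, relative_lift, variations, daily_visitors, power=0.80):
    """Days to run, assuming traffic is split evenly across all variations."""
    total = visitors_per_variation(conversion_rate, relative_lift, power) * variations
    return math.ceil(total / daily_visitors)


# 20% conversion rate, 10% improvement, control + 1 variation, 500 visitors/day
print(days_needed(0.20, 0.10, 2, 500, power=0.95))
```

Under these assumptions, the example works out to 42 days at 95% power and 26 days at 80% power. Because the difference is squared in the denominator, halving the improvement you want to detect roughly quadruples the number of visitors required.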
Hope you like the calculator and related resources. Excited to know your feedback and comments!

Great addition to the significance calculator! Thanks guys.
Comment by Serge Doubinski — April 26, 2011 @ 11:51 pm
Very useful calculator, thanks! Minor note: maybe you should change the text from “expected” improvement in conversion to, say, “targeted” improvement in conversion? You don’t really know what to expect when you’re still setting up the test.
Comment by Ana — April 27, 2011 @ 12:11 am
@Ana: it is actually the change in conversion you want to detect. But I agree it can be phrased in a better manner.
Comment by Paras Chopra — April 27, 2011 @ 12:39 am
Is there a problem with the calculator?
I entered 1% in Expected Improvement in Conversion Rate, and it told me 19200 number of days for 85%.
I entered 50% in Expected Improvement in Conversion Rate, and it told me 8 days for 85%.
Isn’t it much easier to improve 1% in conversion rate than 50%? Why is the calculator showing otherwise?
Comment by tyler — April 27, 2011 @ 12:09 pm
Looking through the chapter you linked to, and the excel document, I can’t identify which sample size calculation you are using. Which one are you using?
Comment by Phillip — April 29, 2011 @ 12:19 am
@Phillip: it is actually the one which says 16*(std/diff)^2
@Tyler: if you want to detect 50% or more change, you will require less traffic as compared to the situation where you want to detect even 2% change in conversion.
Comment by Paras Chopra — April 29, 2011 @ 12:21 am
So it is a rearrangement of (2.3), using the substitutions in (2.4), and with power set to .95?
Comment by Phillip — April 29, 2011 @ 1:14 am
@Phillip: the power is 80%, the confidence level is 95%.
Comment by Paras Chopra — April 29, 2011 @ 1:17 am
In the Excel document, the formula uses 26 where I would have expected to see 16. The table at the top of page 30 (4 of 25 in the pdf) indicates that 26 is the number to use when power of .95 is desired. The entire table is for a confidence level of 95%.
Comment by Phillip — April 29, 2011 @ 1:31 am
@Phillip: there are two rows in the Excel file. 16 corresponds to 80% power. 26 corresponds to 95% power.
Comment by Paras Chopra — April 29, 2011 @ 1:45 am
Right, the upper formula (J8) uses 16 (corresponding to power of .80), the lower formula (J9) uses 26 (corresponding to power of .95).
Should (D7*D8) just be (D8)? It looks like this should be the difference between the means under the null and alternative hypothesis.
I’m struggling with this because it doesn’t give me anything close to the number I get using the standard sample size calculation (n=p(1-p)z^2/e^2). When I try using the formula (2.27) I get an answer that is closer to what I am expecting, but also very different from what I am expecting to see. So I’m trying to figure out where the differences are between these various methods of calculating the same thing.
It seems like the standard formula is simple enough that if we are using Excel to calculate the sample size, we may as well use it.
Thanks for the continued responses, they have been helpful.
Comment by Phillip — April 29, 2011 @ 2:26 am
Paras, I’m not quite understanding here. Why would it be harder to detect a small change (1%) over a huge change(50%)?
Comment by tyler — April 29, 2011 @ 6:02 am
@Tyler: Detecting a small change (1%) that is statistically significant requires a lot of traffic. That is because when you are collecting data, the conversion rate has a range: say 6% +/- 2%. To make this range narrower and to get a better estimate of the true conversion rate, you need more data. In essence, by saying you want to detect even a 1% change, you are saying that you want to detect a very small statistically significant difference, so the range has to be very, very small. Something like 6% +/- 0.001%.
When you want to detect a larger difference, it doesn't matter if your range is larger. So, if your control is 6% +/- 2% and your variation is 9% +/- 1%, you are still able to detect a huge 50% change.
Comment by Paras Chopra — April 29, 2011 @ 12:11 pm
@Phillip: It is D7*D8 because you are calculating delta here. So, it is actually Mean * % change in mean (this gives us the delta). You are trying to calculate only with mean which is not correct. I hope I have clarified. Feel free to reach me at paras@wingify.com
Comment by Paras Chopra — April 29, 2011 @ 12:14 pm
Hi Paras,
A question: when I use the tool with these numbers (500 visitors, 20% conversion, and 10% expected uplift), it takes 42 days to get significant results. When I use your significance tool and enter the equivalent numbers (5000 visitors and 1000 conversions for the control, 5000 visitors and 1100 conversions for the variation), there is already a significant result. Based on that tool, it would only take 10 days to complete the test. I noticed this because the GWO (Google) calculator gave me 10 days and your tool 42 days. I hope you get my point.
Comment by Jan de Vries — May 9, 2011 @ 10:46 pm
Hi Paras, can you explain why this tool gives me 42 days when I use these numbers (500 visits, 20% conversion, and an uplift of 10%), while Google's calculator gives me 10 days?
When I use the numbers in the VWO significance tool, I also get significant results after 10 days. I'm lost.
Comment by Jan de Vries — May 10, 2011 @ 1:48 pm
Hi, I ran the numbers through 2 duration calculators (the Google Website Optimizer calculator and yours) and got significantly different results. Can you please explain why?
Comment by jan — May 13, 2011 @ 3:54 pm
@Jan: Don’t know about GWO calculator (they may be using different confidence level and power).
Comment by Paras Chopra — May 13, 2011 @ 5:25 pm
If I’m only testing one change from the Control (such as Control has a headline & Test page doesn’t), is the number of variations in the test equal to 1 or 2?
Knowing this makes a big difference in testing days. It basically doubles them according to your formula.
Thank you again for your work, I feel smarter already!
Comment by Quan Pham — January 5, 2012 @ 8:51 am
@Quan: the number of variations will be two. One is control and one is the actual variation you are testing.
Comment by Paras Chopra — January 5, 2012 @ 1:26 pm
@Paras: Thank you for the quick reply, I greatly appreciate it!
Comment by Quan Pham — January 5, 2012 @ 11:31 pm