
Why Do Some Tests Reach Statistical Significance Quickly And Some Never Get Valid Results?


Reaching Statistical Significance

When a test disproves your hypothesis, it can be disappointing. While you learn something about your customers, you don’t get the thrill of the win. Still, losing is not nearly as disappointing as a test that never ends. So why do some tests reach statistical significance quickly and some never get valid results?

To understand why some tests never collect enough data, we have to understand the system we use to determine when a test wraps up. We call that system statistical significance. It ensures that we don’t make our decisions based on chance and false positives, but instead on relevant data. At Blue Acorn, we wait until we have 95% statistical significance, which means that if you run the same test 100 …show more content…

Optimizely has made answering both of these questions simple.

To calculate how many visitors must go through a test, you can use Optimizely’s Sample Size Calculator. All you have to do is input two simple variables.

Baseline Conversion Rate: This is the current conversion rate. You can find this number in your analytics.
Minimum Detectable Effect: The minimum relative change in conversion rate you would like to be able to detect. For example, with a 3% baseline, a 20% minimum detectable effect means detecting a rise to 3.6%. Big changes take less traffic/time to detect. Small changes are harder to tell apart from random spikes in the data, so the test needs more traffic/time to verify that the increase isn’t a fluke.
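
To see how these two inputs drive the visitor count, here is a minimal sketch that uses the classical fixed-sample formula for comparing two conversion rates. It is an illustration under stated assumptions, not Optimizely's method: Optimizely's calculator may use a different statistical approach, so its numbers need not match. The function name, its parameters, and the 3% / 20% example values are hypothetical; the 95% significance and 80% power defaults are the conventional values for these tests.

```python
import math
from statistics import NormalDist


def sample_size_per_variation(baseline_rate, relative_mde,
                              significance=0.95, power=0.80):
    """Approximate visitors needed per variation (two-sided z-test).

    Assumption: classical fixed-horizon formula for two proportions,
    not necessarily what any particular calculator implements.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)  # rate we want to be able to detect
    if not (0 < p1 < 1 and 0 < p2 < 1 and relative_mde != 0):
        raise ValueError("rates must lie strictly between 0 and 1, and MDE must be non-zero")

    alpha = 1 - significance
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # guards against false positives
    z_power = NormalDist().inv_cdf(power)          # guards against missed winners

    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


# Hypothetical example: 3% baseline, comparing a 20% and a 5% relative effect.
big_effect = sample_size_per_variation(0.03, 0.20)
small_effect = sample_size_per_variation(0.03, 0.05)
print(big_effect, small_effect)
```

Because the required sample grows with the inverse square of the difference between the two rates, the 5% effect needs many times more visitors than the 20% effect, which is why tests chasing small improvements can seem to never finish.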

You can also adjust two more variables, statistical power and statistical significance, although it is not necessary. Increasing the statistical power reduces the risk of missing out on a winner. Increasing statistical significance reduces the risk of accidentally picking a winner when one doesn't exist. However, increasing these numbers will increase the time it takes to gather a statistically significant result. For most tests, 80% for statistical power and 95% for statistical significance will work just …show more content…

If it’s longer than most of your other tests, consider performing another test that will finish in less time, unless the potential to increase revenue is worth the wait. Otherwise, come back to it when you’re low on testing ideas.

For a more detailed explanation of the Sample Size Calculator, check out Optimizely’s article “How Long to Run a Test.”

Enough visitors and still no difference

Sometimes, even though enough visitors have gone through the test to produce a statistically significant result, you don’t find a winner. That usually means the difference between the original and the variation is smaller than your minimum detectable effect. If you set your minimum detectable effect to a high percentage, you may want to recalculate with a smaller percentage and keep the test running, since a smaller effect needs more visitors to detect. However, if you needed a big improvement in your KPI to justify the costs of the variation, you may want to stop the test and move on to the next one.
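
As a rough illustration of what "no winner" looks like in the numbers, the sketch below runs a standard two-proportion z-test on raw counts and compares the p-value to the 5% threshold implied by 95% significance. This is a textbook check, assumed here for illustration, and not necessarily the calculation your testing tool performs; the function name and the visitor and conversion counts are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_p_value(conv_a, visitors_a, conv_b, visitors_b):
    """Two-sided p-value for the difference between two conversion rates."""
    rate_a = conv_a / visitors_a
    rate_b = conv_b / visitors_b
    # Pooled rate under the assumption that both versions convert equally.
    pooled = (conv_a + conv_b) / (visitors_a + visitors_b)
    if pooled in (0, 1):
        raise ValueError("cannot test when nobody or everybody converted")
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical counts: original vs. variation, 10,000 visitors each.
small_gap = two_proportion_p_value(300, 10_000, 310, 10_000)  # 3.0% vs 3.1%
large_gap = two_proportion_p_value(300, 10_000, 400, 10_000)  # 3.0% vs 4.0%

for label, p in (("small gap", small_gap), ("large gap", large_gap)):
    verdict = "significant" if p < 0.05 else "no winner yet"
    print(f"{label}: p = {p:.4f} -> {verdict}")
```

With the same traffic, the small gap stays above the threshold while the large gap clears it, which matches the situation described above: the visitors are there, but the true difference is smaller than the effect the test was sized to detect.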

Of course, the best way to get a statistically significant result is to develop a solid hypothesis. Hypotheses based on assumptions, pulled out of thin air, or simply copied from another site have a much higher risk of failing. Make sure you have substantial research that points to a possible improvement before testing. You’ll save yourself the trouble of a test that never
