
Statistical Power: What It Is and How to Achieve 80% Power


Learn what statistical power is, why 80% is the standard minimum, and which factors affect your study's ability to detect real effects.

What Is Statistical Power?

Statistical power is the probability that a study will correctly detect a real effect when one exists. Formally, power = 1 - β, where β is the Type II error rate (the probability of missing a real effect). A study with 80% power has an 80% chance of producing a statistically significant result if the hypothesized effect is real, and a 20% chance of missing it. Power is planned before data collection through a power analysis that considers your alpha level, expected effect size, sample size, and the statistical test you plan to use. An underpowered study (conventionally, one with power below 0.80) is problematic because it has a substantial chance of producing a null result even when something real is happening, wasting the time, money, and participant effort that went into the research.
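Because power is a probability, it can be estimated by brute force. The Python sketch below (an illustration under stated assumptions, not a prescribed method) simulates many two-group studies in which a real difference exists and counts how often a two-sided z-test comes out significant. It assumes normally distributed outcomes with a known, equal standard deviation; the function name `simulated_power` and the example numbers are hypothetical.

```python
import random
from statistics import NormalDist, fmean

def simulated_power(delta, sigma, n_per_group, alpha=0.05, n_sims=2000, seed=1):
    """Estimate power by simulating many two-group studies where the effect is real.

    Assumes normal outcomes with a known common SD (sigma) and analyses each
    simulated study with a two-sided two-sample z-test.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma * (2 / n_per_group) ** 0.5  # standard error of the difference in means
    hits = 0
    for _ in range(n_sims):
        control = [rng.gauss(0, sigma) for _ in range(n_per_group)]
        treated = [rng.gauss(delta, sigma) for _ in range(n_per_group)]
        z = (fmean(treated) - fmean(control)) / se
        if abs(z) > z_crit:
            hits += 1  # this simulated study detected the real effect
    return hits / n_sims  # share of significant studies estimates power (1 - beta)

print(simulated_power(delta=3, sigma=10, n_per_group=175))
```

With a true 3-point difference, σ = 10, and 175 per group, the share of significant results should land close to 0.80, and the share of misses estimates β. Shrinking `n_per_group` shows how quickly power falls away.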

Why Statistical Power Matters

Running an underpowered study is like fielding a survey with a broken questionnaire: you'll get data, but it won't reliably answer your question. In market research, low power means you might declare that two product concepts perform equally when one actually outperforms the other. You'd make the wrong business decision not because of bad data, but because you didn't collect enough of it. Power analysis is the tool that prevents this.

How Statistical Power Works

The Four Factors

Power depends on four interconnected factors. Fix three of them and choose a target power, and you can solve for the fourth (in practice, usually the sample size):

  1. Sample size (n): More data → more power. This is the factor researchers have the most control over.
  2. Effect size (d, r, or w): Larger effects are easier to detect. A huge difference between groups requires fewer observations to spot.
  3. Alpha level (α): A more lenient alpha (0.10 vs. 0.05) gives more power but increases the false positive risk.
  4. Variability (σ): Less noise in the data means the signal stands out more, increasing power. Standardized effect sizes such as Cohen's d (δ/σ) already build this factor in.
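One way to see these trade-offs is to compute power directly. The sketch below is a minimal Python illustration using the normal approximation for a two-sided, two-sample comparison of means; it assumes normally distributed outcomes and a known σ, and `approx_power` is a hypothetical helper name. Each printed line changes a single factor from the same baseline.

```python
from statistics import NormalDist

def approx_power(n_per_group, delta, sigma, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test (normal approximation).

    Ignores the negligible chance of a significant result in the wrong direction.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    d = abs(delta) / sigma                         # standardized effect size
    noncentrality = d * (n_per_group / 2) ** 0.5   # expected z-statistic
    return NormalDist().cdf(noncentrality - z_crit)

base = approx_power(n_per_group=175, delta=3, sigma=10)
print(f"baseline            {base:.3f}")
print(f"more data (n=300)   {approx_power(300, 3, 10):.3f}")
print(f"bigger effect (4)   {approx_power(175, 4, 10):.3f}")
print(f"alpha = 0.10        {approx_power(175, 3, 10, alpha=0.10):.3f}")
print(f"less noise (sd=8)   {approx_power(175, 3, 8):.3f}")
```

Every variant prints a higher value than the baseline of roughly 0.80, matching the direction described for each factor.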

Power Analysis Formula

For a two-sample t-test, the required sample size per group is approximately:

n = [(z_α/2 + z_β)² × 2σ²] / δ²

Where z_α/2 is the critical z-value for your alpha level in a two-sided test, z_β is the z-value for your desired power, σ is the standard deviation of the outcome (assumed equal in both groups), and δ is the minimum difference in means you want to detect, in the same units as σ.

For α = 0.05 (z = 1.96) and power = 0.80 (z = 0.84):

n = [(1.96 + 0.84)² × 2σ²] / δ² = (7.84 × 2σ²) / δ²

Worked Example

You want to detect a 3-point difference in satisfaction scores (measured on a 1-50 scale) between two customer segments. Historical data suggests σ = 10.

n = [(1.96 + 0.84)² × 2(10²)] / 3²

n = [7.84 × 200] / 9

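n = 1,568 / 9 ≈ 174.2, rounded up to 175 per group

You need about 175 customers per segment (350 total) for 80% power to detect a 3-point difference. Always round sample sizes up; rounding down leaves you slightly short of the target power.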
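The same calculation can be scripted so it is easy to rerun with different inputs. This is a minimal Python sketch of the formula above using only the standard library; `n_per_group` is a hypothetical name, and the z-values come from `statistics.NormalDist` rather than the rounded 1.96 and 0.84.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample comparison of means.

    Implements n = (z_alpha/2 + z_beta)^2 * 2 * sigma^2 / delta^2 and rounds up,
    since a fractional participant cannot be recruited.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power = 0.80
    n = (z_alpha + z_beta) ** 2 * 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)

print(n_per_group(delta=3, sigma=10))              # 80% power
print(n_per_group(delta=3, sigma=10, power=0.90))  # 90% power
```

Because it uses unrounded z-values and always rounds up, the sketch returns 175 per group for 80% power and, for the 90% case worked next, 234.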

What if you want 90% power? Replace z_β = 0.84 with z_β = 1.28:

n = [(1.96 + 1.28)² × 200] / 9 = [10.50 × 200] / 9 ≈ 233.3, rounded up to 234 per group

Going from 80% to 90% power increases the required sample by about 34%.
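The 34% figure does not depend on δ or σ: they cancel when you divide one sample size by the other, leaving only the z-values. A small Python sketch (function name hypothetical, normal approximation assumed) makes that explicit and extends it to 95% power.

```python
from statistics import NormalDist

def sample_size_ratio(power_new, power_ref=0.80, alpha=0.05):
    """Factor by which n grows when target power rises from power_ref to power_new.

    Under the normal-approximation formula, delta and sigma cancel, so the ratio
    depends only on alpha and the two power levels.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)
    return ((z_alpha + z(power_new)) / (z_alpha + z(power_ref))) ** 2

for target in (0.90, 0.95):
    extra = sample_size_ratio(target) - 1
    print(f"{target:.0%} power: {extra:.0%} more participants than 80% power")
```

At α = 0.05 (two-sided) it reports about 34% more participants for 90% power and about 66% more for 95% power.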

Effect Size Conventions

Cohen's conventions for common effect sizes:

Effect size            Small   Medium   Large
d (mean difference)    0.20    0.50     0.80
r (correlation)        0.10    0.30     0.50
w (chi-square)         0.10    0.30     0.50
f (ANOVA)              0.10    0.25     0.40

Required sample sizes per group for a two-sample t-test at α = 0.05, power = 0.80:

Effect size d     n per group
0.20 (small)      394
0.50 (medium)     64
0.80 (large)      26

Small effects require enormous samples. This is why defining a meaningful minimum effect size is critical.
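The table's values match exact t-test calculations, which run slightly higher than the plain normal-approximation formula. A common shortcut closes that gap by adding z_α/2² / 4 to the per-group result. The Python sketch below applies that correction; treat it as an approximation rather than an exact t-test calculation, and note that the helper name is hypothetical.

```python
import math
from statistics import NormalDist

def n_for_cohens_d(d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample t-test given Cohen's d.

    Normal-approximation formula plus a small-sample correction of z_alpha/2^2 / 4,
    which brings the result close to exact t-test values.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 / d ** 2 + z_alpha ** 2 / 4
    return math.ceil(n)

for d in (0.20, 0.50, 0.80):
    print(d, n_for_cohens_d(d))
```

For d = 0.20, 0.50, and 0.80 it returns 394, 64, and 26 per group, the same values as the table.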

Why 0.80?

Cohen proposed 0.80 as a conventional minimum for power, based on the argument that a 4:1 ratio of β to α (0.20 to 0.05) is reasonable for most research. In other words, a false positive is treated as roughly four times as serious as a false negative. Many funding agencies and review boards now expect power of at least 0.80.

However, for high-stakes decisions, 0.90 or 0.95 power may be warranted. At α = 0.05, 90% power requires about 34% more participants than 80% power, and 95% power about 66% more; that extra cost is often modest compared to the cost of missing a real effect.

Common Power Analysis Mistakes

Running power analysis after the study (post-hoc power): This is uninformative. Post-hoc (observed) power, computed from the effect the study actually found, is a direct mathematical transformation of the p-value: if p = 0.05, post-hoc power is approximately 50%. It tells you nothing you didn't already know from the p-value itself. Power analysis is only useful prospectively.
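The link between post-hoc power and the p-value can be shown in a few lines. This minimal Python sketch assumes a two-sided z-test and plugs the observed effect in as if it were the true effect, which is what a post-hoc power calculation does; `observed_power` is a hypothetical name.

```python
from statistics import NormalDist

def observed_power(p_value, alpha=0.05):
    """'Post-hoc' power of a two-sided z-test, computed only from the study's p-value.

    The p-value is converted back to the observed z-statistic, which is then treated
    as the true effect, so the result is a fixed function of p.
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    z_obs = nd.inv_cdf(1 - p_value / 2)
    # probability of exceeding the critical value in either direction
    return nd.cdf(z_obs - z_crit) + nd.cdf(-z_obs - z_crit)

for p in (0.01, 0.05, 0.20):
    print(p, round(observed_power(p), 3))
```

At p = 0.05 the result is essentially 0.50, and a smaller p-value always maps to a higher "power", so the number adds nothing beyond p.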

Using unrealistic effect sizes: Choosing a "large" effect size because it requires fewer participants, when the true effect is likely small, guarantees an underpowered study. Base your effect size on prior research, pilot data, or the smallest effect that would be practically meaningful.

Factors That Increase Power

Action                          Effect on power
Increase sample size            Increases
Use a larger alpha              Increases (but raises Type I error)
Use a one-tailed test           Increases (only in the predicted direction, which must be justified in advance)
Reduce measurement error        Increases
Use a within-subjects design    Increases (removes between-subject variance from the error term)
Use covariates (ANCOVA)         Increases (reduces error variance)
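Equivalently, each action lowers the sample size needed for a fixed level of power. The Python sketch below reruns the 3-point, σ = 10 satisfaction example under several of these changes. It is an approximation with hypothetical parameter names: the covariate adjustment is modeled by shrinking error variance by (1 - r²), where r is the covariate-outcome correlation, and the within-subjects row is left out because its gain depends on the correlation between repeated measures.

```python
import math
from statistics import NormalDist

def n_per_group_adjusted(delta, sigma, alpha=0.05, power=0.80, tails=2, covariate_r=0.0):
    """Per-group n for comparing two means (normal approximation), rounded up.

    tails=1 puts all of alpha in the predicted direction. covariate_r is the
    correlation between a baseline covariate and the outcome; adjusting for it is
    approximated by shrinking the error variance by a factor of (1 - r^2).
    """
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / tails)
    z_beta = nd.inv_cdf(power)
    error_var = sigma ** 2 * (1 - covariate_r ** 2)
    return math.ceil((z_alpha + z_beta) ** 2 * 2 * error_var / delta ** 2)

scenarios = {
    "baseline (two-tailed, alpha = 0.05)": n_per_group_adjusted(3, 10),
    "one-tailed test": n_per_group_adjusted(3, 10, tails=1),
    "alpha = 0.10 (two-tailed)": n_per_group_adjusted(3, 10, alpha=0.10),
    "less measurement error (sigma = 8)": n_per_group_adjusted(3, 8),
    "covariate with r = 0.5": n_per_group_adjusted(3, 10, covariate_r=0.5),
}
for label, n in scenarios.items():
    print(f"{label:<38}{n}")
```

Under these assumptions the baseline needs 175 per group, a one-tailed test or α = 0.10 needs 138, cutting σ to 8 needs 112, and a covariate with r = 0.5 needs 131.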

When to Conduct a Power Analysis

  • Before every study: power analysis should be part of your research design, not an afterthought
  • When planning A/B tests to determine how long the test needs to run
  • During grant or budget proposals to justify sample size requirements
  • When evaluating past research to understand whether null results might reflect inadequate power rather than absent effects
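For A/B tests, the required sample size translates directly into run time. This Python sketch uses one common normal-approximation sample size formula for comparing two proportions (unpooled variance) and assumes a 50/50 traffic split; the function name, conversion rates, and daily traffic are hypothetical.

```python
import math
from statistics import NormalDist

def ab_test_days(p_control, p_variant, daily_visitors, alpha=0.05, power=0.80):
    """Rough A/B test duration for comparing two conversion rates.

    Uses the unpooled normal-approximation sample size for two proportions and
    assumes traffic is split evenly between the two arms.
    """
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    z_beta = nd.inv_cdf(power)
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    n_per_arm = math.ceil((z_alpha + z_beta) ** 2 * variance / (p_variant - p_control) ** 2)
    days = math.ceil(2 * n_per_arm / daily_visitors)  # both arms share the traffic
    return n_per_arm, days

print(ab_test_days(p_control=0.10, p_variant=0.12, daily_visitors=1000))
```

With a 10% baseline conversion rate, a hoped-for 12%, and 1,000 visitors a day, it returns 3,839 visitors per arm and 8 days.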

Common Mistakes to Avoid

  • Skipping power analysis entirely and using whatever sample size is convenient; this leads to a mix of overpowered (wasteful) and underpowered (uninformative) studies
  • Confusing statistical significance with adequate power: an underpowered study can still produce a significant result by chance, but such results tend to overstate the true effect, and a significant result doesn't validate the study design retroactively
  • Ignoring the multiple-testing impact on power: Bonferroni and other corrections effectively reduce alpha for each individual test, which reduces power per test; account for this in your sample size calculation
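To gauge the size of that impact, the sketch below (Python, normal approximation, hypothetical function name) recomputes the 3-point, σ = 10 example with a Bonferroni-adjusted alpha of 0.05 / k for k comparisons.

```python
import math
from statistics import NormalDist

def n_per_group_bonferroni(delta, sigma, n_tests, family_alpha=0.05, power=0.80):
    """Per-group n when each of n_tests comparisons is tested at family_alpha / n_tests."""
    nd = NormalDist()
    alpha_per_test = family_alpha / n_tests       # Bonferroni-adjusted alpha
    z_alpha = nd.inv_cdf(1 - alpha_per_test / 2)  # two-sided critical value
    z_beta = nd.inv_cdf(power)
    return math.ceil((z_alpha + z_beta) ** 2 * 2 * sigma ** 2 / delta ** 2)

for k in (1, 3, 5):
    print(k, "tests:", n_per_group_bonferroni(delta=3, sigma=10, n_tests=k))
```

One test needs 175 per group; with five Bonferroni-corrected tests, keeping 80% power for each comparison takes 260 per group.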

How Quali-Fi Supports Power Analysis

Quali-Fi's Research plan ($1,061/month) includes built-in power analysis calculators for common designs: comparing means, comparing proportions, detecting correlations, and ANOVA. Enter your desired alpha, power, and expected effect size, and the platform returns the required sample size with visual tradeoff curves. For complex designs, the Intelligence tier provides custom power analyses from methodological consultants.


Frequently Asked Questions

Can a study have too much power?

Technically, more power is always better for detecting effects. But extremely high power (e.g., 99.9%) means you'll detect even trivially small effects that have no practical significance. If your sample is so large that a 0.1-point difference on a 100-point scale is "statistically significant," you're detecting a difference too small for anyone to act on. Balance power with effect size relevance.
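The sketch below illustrates the point with a two-sided z-test on summary statistics. The 0.1-point gap and the 10-point standard deviation are hypothetical values chosen for illustration, and the function name is an assumption.

```python
from statistics import NormalDist

def two_sample_z_p(diff, sigma, n_per_group):
    """Two-sided p-value for a difference in means, assuming a known common SD."""
    se = sigma * (2 / n_per_group) ** 0.5
    z = abs(diff) / se
    return 2 * (1 - NormalDist().cdf(z))

# Hypothetical: a 0.1-point gap on a 100-point scale with an SD of 10 points.
for n in (1_000, 50_000, 200_000):
    print(n, round(two_sample_z_p(diff=0.1, sigma=10, n_per_group=n), 4))
```

The same trivial gap is nowhere near significant with 1,000 per group but reaches p < 0.01 with 200,000 per group.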

What if I can't afford the sample size my power analysis requires?

You have three options: (1) accept lower power and acknowledge the study may be inconclusive, (2) increase your minimum detectable effect size (if you can only detect large effects, focus on situations where large effects are expected), or (3) reduce variability through better measurement, within-subjects designs, or covariates.
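Options (2) and (3) can be quantified by turning the sample-size formula around and solving for the smallest difference a fixed budget can detect. A minimal Python sketch, assuming the normal approximation and a two-sided test; the function name and the 100-per-group budget are hypothetical.

```python
from statistics import NormalDist

def minimum_detectable_difference(n_per_group, sigma, alpha=0.05, power=0.80):
    """Smallest true mean difference detectable with the given power at a fixed n.

    Rearranges n = (z_alpha/2 + z_beta)^2 * 2 * sigma^2 / delta^2 to solve for delta.
    """
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    z_beta = nd.inv_cdf(power)
    return (z_alpha + z_beta) * sigma * (2 / n_per_group) ** 0.5

print(round(minimum_detectable_difference(n_per_group=100, sigma=10), 2))
```

With 100 per group and σ = 10 the minimum detectable difference is about 4 points; reducing σ through better measurement or covariates shrinks it proportionally.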

Is 80% power really enough?

It depends on the stakes. At 80% power, you'll miss 1 in 5 real effects of the size you planned for. For exploratory research, that's acceptable. For decisions involving large financial commitments or irreversible actions, aim for 90% or higher. Moving from 80% to 90% power increases the required sample size by about 34% at α = 0.05 (two-sided), often a worthwhile investment.

