
Homogeneity of Variance: What It Is, Levene's Test, and Why It Matters for ANOVA


Learn what homogeneity of variance is, how to test it with Levene's test, and what to do when groups have unequal variances in t-tests and ANOVA.

What Is Homogeneity of Variance?

Homogeneity of variance (also called homoscedasticity) is the assumption that the groups being compared in a statistical test have approximately equal variances, that is, roughly equal spread in their data. If you're comparing satisfaction scores across three customer segments using ANOVA, homogeneity of variance means the variability of scores within each segment should be roughly the same. When this assumption holds, the standard versions of t-tests and ANOVA produce reliable results. When it's violated (called heteroscedasticity), those tests can give inaccurate p-values, leading you to either miss real effects or falsely detect effects that don't exist.

Why Homogeneity of Variance Matters

The standard t-test and ANOVA pool variances across groups to create a single estimate of within-group variability. If one group's variance is much larger than another's, the pooled estimate fits neither group: it overstates precision for the high-variance group and understates it for the low-variance group. The result is distorted test statistics and unreliable p-values. This isn't a theoretical concern: unequal variances combined with unequal group sizes can inflate the Type I error rate (false positives) well above the nominal 5% level.

How Homogeneity of Variance Works

Levene's Test

Levene's test is the most commonly used method to assess homogeneity of variance. It tests the null hypothesis that all groups have equal population variances.

How it works:

  1. For each observation, calculate the absolute deviation from its group's median (the original version of Levene's test uses the group mean instead): dᵢⱼ = |Xᵢⱼ - Medianⱼ|

  2. Run a one-way ANOVA on these absolute deviations

  3. If the F-statistic from this ANOVA is significant (p < 0.05), reject the assumption of equal variances

The median-based version (Brown-Forsythe variant) is preferred because it's more robust to non-normal data than the mean-based version.
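The three steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation: the function name brown_forsythe and the sample scores are made up for the example, and the sketch stops at the F-statistic and its degrees of freedom rather than computing a p-value.

```python
from statistics import mean, median


def brown_forsythe(groups):
    """Return (F, df_between, df_within) for the median-based Levene's test."""
    # Step 1: absolute deviation of each observation from its group median
    deviations = []
    for group in groups:
        center = median(group)
        deviations.append([abs(x - center) for x in group])

    # Step 2: one-way ANOVA on the absolute deviations
    k = len(deviations)
    n_total = sum(len(d) for d in deviations)
    grand_mean = sum(sum(d) for d in deviations) / n_total
    ss_between = sum(len(d) * (mean(d) - grand_mean) ** 2 for d in deviations)
    ss_within = sum((v - mean(d)) ** 2 for d in deviations for v in d)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores for illustration only (not data from this article)
tight = [5, 6, 6, 7, 6, 5, 7, 6]
also_tight = [6, 7, 6, 5, 6, 7, 6, 6]
spread = [1, 10, 2, 9, 3, 10, 1, 8]

f_stat, df1, df2 = brown_forsythe([tight, also_tight, spread])
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```

Step 3 compares this F against the F distribution with (df1, df2) degrees of freedom. In practice you would usually let a library handle the whole procedure; in SciPy, for example, scipy.stats.levene with center="median" runs the median-based version and returns the p-value directly.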

Worked Example

You're comparing purchase intent scores (1-10 scale) across three ad concepts:

        Concept A   Concept B   Concept C
Mean    6.2         6.5         6.8
SD      1.4         1.6         3.2
n       100         100         100

Concept C has a standard deviation more than twice that of Concept A. Running Levene's test produces F(2, 297) = 14.8, p < 0.001. The assumption of equal variances is violated.

Looking at the data, Concept C's higher variance makes sense: it's a polarizing ad that some respondents love and others dislike, while Concepts A and B generate more uniform responses.

What to Do When Variances Are Unequal

For two-group comparisons (t-test):

Use Welch's t-test instead of the standard (Student's) t-test. Welch's test doesn't assume equal variances; instead, it estimates each group's standard error separately and adjusts the degrees of freedom based on each group's variance and size. Most modern statistical software defaults to Welch's t-test or offers it as an option. There's essentially no downside to always using Welch's test: it performs nearly identically to Student's t-test when variances are equal and much better when they're not.
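To make the degrees-of-freedom adjustment concrete, here is a small standard-library sketch of Welch's t-statistic and the Welch-Satterthwaite degrees of freedom. The function name welch_t and the two sample lists are illustrative assumptions, not data from this article.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test."""
    # Squared standard error of each group mean, estimated separately
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t_stat = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t_stat, df


# Hypothetical data: a small, high-variance group and a larger, low-variance one
high_var = [2, 10, 4, 9, 1]
low_var = [6, 7, 6, 5, 6, 7, 6, 6, 5, 7, 6, 6]

t_stat, df = welch_t(high_var, low_var)
student_df = len(high_var) + len(low_var) - 2
print(f"Welch t = {t_stat:.2f}, df = {df:.1f} (Student's df would be {student_df})")
```

The adjusted degrees of freedom always fall between the smaller group's n - 1 and the pooled n₁ + n₂ - 2, shrinking toward the former as the variances diverge. If SciPy is available, scipy.stats.ttest_ind(a, b, equal_var=False) runs the same test and also returns the p-value.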

For multi-group comparisons (ANOVA):

Use Welch's ANOVA (also called Welch's F-test), which adjusts for unequal variances similarly to Welch's t-test. Alternatively, use the Brown-Forsythe F-test, which uses a different degrees-of-freedom correction. Both are available in R, SPSS, and most statistical packages.

For post-hoc comparisons after Welch's ANOVA, use Games-Howell post-hoc tests instead of Tukey's HSD, since Games-Howell doesn't assume equal variances.
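Welch's ANOVA can be computed from summary statistics alone, which makes it easy to sketch. The example below plugs in the means, SDs, and sample sizes from the worked example table; the function name welch_anova is an assumption for illustration. Note that this tests whether the group means differ, so its F is not comparable to the Levene's F reported in the worked example, which tests the variances.

```python
def welch_anova(means, sds, ns):
    """Return (F, df1, df2) for Welch's one-way ANOVA from summary statistics."""
    k = len(means)
    # Each group is weighted by the inverse of its squared standard error
    weights = [n / sd ** 2 for n, sd in zip(ns, sds)]
    w_total = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / w_total

    between = sum(
        w * (m - weighted_mean) ** 2 for w, m in zip(weights, means)
    ) / (k - 1)
    # lam captures how unevenly the weights are spread across groups
    lam = sum((1 - w / w_total) ** 2 / (n - 1) for w, n in zip(weights, ns))

    f_stat = between / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, df1, df2


# Summary statistics for Concepts A, B, and C from the worked example
f_stat, df1, df2 = welch_anova(
    means=[6.2, 6.5, 6.8], sds=[1.4, 1.6, 3.2], ns=[100, 100, 100]
)
print(f"Welch F({df1}, {df2:.1f}) = {f_stat:.2f}")
```

The denominator degrees of freedom come out well below the 297 that standard ANOVA would use, which is how the correction accounts for the noisier high-variance group. The p-value is the upper tail of the F distribution with (df1, df2) degrees of freedom.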

For regression:

Use heteroscedasticity-consistent standard errors (HC standard errors, also called robust standard errors or White's standard errors). These adjust the standard errors without changing the coefficient estimates.
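As a rough illustration of "same coefficients, different standard errors", the sketch below fits a simple one-predictor regression and computes the slope's standard error two ways: the classical formula and White's original HC0 estimator. The function name and the data are hypothetical, and HC0 is only the simplest member of the HC family; software often applies small-sample refinements such as HC3.

```python
from math import sqrt
from statistics import mean


def slope_with_standard_errors(x, y):
    """Return (slope, classical_se, hc0_se) for a simple linear regression."""
    x_bar, y_bar = mean(x), mean(y)
    dx = [xi - x_bar for xi in x]
    sxx = sum(d ** 2 for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

    # Classical SE: assumes all residuals share a single variance
    sigma2 = sum(e ** 2 for e in residuals) / (len(x) - 2)
    classical_se = sqrt(sigma2 / sxx)

    # HC0 SE: each squared residual enters with its own weight (x_i - x_bar)^2,
    # so no common residual variance is assumed
    hc0_se = sqrt(sum((d * e) ** 2 for d, e in zip(dx, residuals))) / sxx
    return slope, classical_se, hc0_se


# Hypothetical data in which the scatter around the line grows with x
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1.1, 1.9, 3.2, 3.7, 5.4, 5.3, 7.9, 6.8, 10.5, 8.4]

slope, classical_se, hc0_se = slope_with_standard_errors(x, y)
print(f"slope = {slope:.3f}, classical SE = {classical_se:.4f}, HC0 SE = {hc0_se:.4f}")
```

The slope is computed once and reported with either standard error; only the uncertainty estimate changes. In statsmodels, passing cov_type="HC3" (or another HC variant) to an OLS model's fit method requests robust standard errors for multi-predictor models.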

Rules of Thumb

How much variance inequality is too much? Common guidelines:

  • Ratio of largest to smallest variance < 2:1: usually not a problem
  • Ratio between 2:1 and 4:1: may be problematic, especially with unequal group sizes; test with Levene's
  • Ratio > 4:1: likely a problem; use Welch's test or a nonparametric alternative

The interaction between unequal variances and unequal group sizes matters most. If the larger group has the larger variance, the standard test becomes conservative (less likely to find significance). If the smaller group has the larger variance, the test becomes liberal (more likely to produce false positives).
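The ratio guidelines above are easy to check from group standard deviations. The sketch below applies them to the SDs from the worked example (1.4, 1.6, and 3.2); the function names are illustrative, and the cutoffs simply encode the rules of thumb listed in this section rather than any formal test.

```python
def variance_ratio(sds):
    """Ratio of the largest to the smallest group variance."""
    variances = [sd ** 2 for sd in sds]
    return max(variances) / min(variances)


def rule_of_thumb(ratio):
    """Map a variance ratio onto the guideline bands from this section."""
    if ratio < 2:
        return "usually not a problem"
    if ratio <= 4:
        return "may be problematic"
    return "likely a problem"


# SDs for Concepts A, B, and C from the worked example
ratio = variance_ratio([1.4, 1.6, 3.2])
print(f"variance ratio = {ratio:.2f}: {rule_of_thumb(ratio)}")
```

Note that the guidelines refer to variances, not standard deviations: Concept C's SD is a little over twice Concept A's, but the variance ratio is 10.24 / 1.96, or about 5.2, which puts the example in the "likely a problem" band.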

Other Tests for Equal Variances

Test                     Strengths                            Weaknesses
Levene's test (median)   Robust to non-normality              May lack power with small samples
Bartlett's test          More powerful when data are normal   Very sensitive to non-normality
F-test for two groups    Simple                               Extremely sensitive to non-normality
Fligner-Killeen test     Nonparametric, robust                Less commonly available in software

Levene's test (median-based) is the recommended default because it balances robustness and power.

When to Use Homogeneity of Variance Tests

  • Before running ANOVA to verify the equal variance assumption and decide whether standard or Welch's ANOVA is appropriate
  • Before independent-samples t-tests to choose between Student's and Welch's versions
  • In regression diagnostics to check for heteroscedasticity in residual plots
  • When comparing groups of very different sizes: this is when unequal variances cause the most damage to standard tests

Common Mistakes to Avoid

  • Treating a non-significant Levene's test as proof of equal variances: it just means you didn't detect a significant difference; with small samples, Levene's test has low power and might miss substantial variance differences
  • Using Bartlett's test with non-normal data: Bartlett's test is highly sensitive to non-normality and will reject equal variances simply because the data isn't normal, even if variances are actually equal
  • Abandoning parametric tests entirely when variances are unequal: Welch's corrections are straightforward and widely available; you don't need to switch to nonparametric methods just because Levene's test is significant

How Quali-Fi Supports Variance Testing

Quali-Fi's cross-tabulation module automatically runs Levene's test when comparing continuous metrics across groups. When unequal variances are detected, the platform switches to Welch-corrected tests and Games-Howell post-hoc comparisons, reporting the appropriate adjusted p-values without requiring manual intervention.


Frequently Asked Questions

Should I always run Levene's test before ANOVA?

It's good practice, but an even better approach is to always use Welch's ANOVA by default. Welch's ANOVA performs nearly as well as standard ANOVA when variances are equal and substantially better when they're not. Some statisticians argue that routinely using Welch's test eliminates the need for preliminary variance testing altogether.

What if Levene's test is significant but the variance ratio is small?

With large samples, Levene's test can detect trivially small differences in variance that don't meaningfully affect your analysis. If the ratio of the largest to smallest variance is under 2:1, the practical impact on standard tests is minimal, even if Levene's test is statistically significant. Use judgment: statistical significance doesn't always mean practical significance.

Does homogeneity of variance apply to paired/repeated-measures designs?

For repeated-measures ANOVA, the relevant assumption is sphericity (tested with Mauchly's test), which is about equal variances of the differences between conditions. It's a related but distinct concept. If sphericity is violated, use the Greenhouse-Geisser or Huynh-Feldt correction.
