What Is Homogeneity of Variance?
Homogeneity of variance (also called homoscedasticity) is the assumption that the groups being compared in a statistical test have approximately equal variances, that is, a similar spread in their data. If you're comparing satisfaction scores across three customer segments using ANOVA, homogeneity of variance means the variability of scores within each segment should be roughly the same. When this assumption holds, the standard versions of t-tests and ANOVA produce reliable results. When it's violated (called heteroscedasticity), those tests can give inaccurate p-values, leading you to either miss real effects or falsely detect effects that don't exist.
Why Homogeneity of Variance Matters
The standard t-test and ANOVA pool variances across groups to create a single estimate of within-group variability. If one group's variance is much larger than another's, the pooled estimate misrepresents both: it overestimates precision for the high-variance group and underestimates it for the low-variance group. The result is distorted test statistics and unreliable p-values. This isn't a theoretical concern: unequal variances combined with unequal group sizes can inflate the Type I error rate (false positives) well above the nominal 5% level.
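The inflation described above can be checked with a small simulation. The following is a minimal sketch, assuming NumPy and SciPy are available; the group sizes, standard deviations, and the helper name `type1_error_rate` are illustrative choices, not values from this article. Both groups are drawn with the same true mean, so every rejection is a false positive.

```python
import numpy as np
from scipy import stats

def type1_error_rate(n_a, sd_a, n_b, sd_b, n_sims=5000, alpha=0.05, seed=0):
    """Share of simulated null datasets where Student's t-test rejects at alpha."""
    rng = np.random.default_rng(seed)
    # Same true mean (0) in both groups: any "significant" result is a false positive.
    a = rng.normal(0.0, sd_a, size=(n_sims, n_a))
    b = rng.normal(0.0, sd_b, size=(n_sims, n_b))
    # equal_var=True selects the pooled-variance (Student's) t-test.
    p_values = stats.ttest_ind(a, b, axis=1, equal_var=True).pvalue
    return float(np.mean(p_values < alpha))

# Baseline: equal variances and equal group sizes.
equal_case = type1_error_rate(n_a=30, sd_a=1.0, n_b=30, sd_b=1.0)
# Problem case: the smaller group has the larger variance.
unequal_case = type1_error_rate(n_a=10, sd_a=3.0, n_b=50, sd_b=1.0)

print(f"equal variances, equal n:        {equal_case:.3f}")
print(f"small group has larger variance: {unequal_case:.3f}")
```

In the baseline the rejection rate should sit close to the nominal 5%, while in the second scenario it should land well above it.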
How Homogeneity of Variance Works
Levene's Test
Levene's test is the most commonly used method to assess homogeneity of variance. It tests the null hypothesis that all groups have equal population variances.
How it works:
1. For each observation, calculate the absolute deviation from its group's center (the median or, in the original version, the mean). For the median-based version: dᵢⱼ = |Xᵢⱼ - Medianⱼ|
2. Run a one-way ANOVA on these absolute deviations
3. If the F-statistic from this ANOVA is significant (p < 0.05), reject the assumption of equal variances
The median-based version (Brown-Forsythe variant) is preferred because it's more robust to non-normal data than the mean-based version.
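The steps above can be sketched in a few lines of Python. This is an illustrative sketch, assuming NumPy and SciPy; the helper name `levene_median` and the scores are made up for the example. It follows the procedure literally (absolute deviations from the group median, then a one-way ANOVA) and compares the result with SciPy's built-in `scipy.stats.levene`.

```python
import numpy as np
from scipy import stats

def levene_median(*groups):
    """Median-centered (Brown-Forsythe) Levene's test, computed step by step."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    # Step 1: absolute deviation of each observation from its group median.
    deviations = [np.abs(g - np.median(g)) for g in groups]
    # Step 2: one-way ANOVA on those absolute deviations.
    f_stat, p_value = stats.f_oneway(*deviations)
    return float(f_stat), float(p_value)

# Made-up illustrative scores, not data from this article.
a = [6, 7, 5, 6, 7, 6, 5, 7]
b = [6, 8, 5, 7, 6, 7, 5, 8]
c = [1, 10, 2, 9, 10, 3, 8, 10]

f_manual, p_manual = levene_median(a, b, c)
f_scipy, p_scipy = stats.levene(a, b, c, center="median")

# Step 3: compare the p-value with the chosen significance level.
reject_equal_variances = p_manual < 0.05

print(f"manual: F = {f_manual:.3f}, p = {p_manual:.4f}")
print(f"scipy:  F = {f_scipy:.3f}, p = {p_scipy:.4f}")
```

The two results should agree, since Levene's statistic is the ANOVA F-statistic computed on the absolute deviations. Passing `center="mean"` to `scipy.stats.levene` gives the mean-based version instead.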
Worked Example
You're comparing purchase intent scores (1-10 scale) across three ad concepts:
| | Concept A | Concept B | Concept C |
|---|---|---|---|
| Mean | 6.2 | 6.5 | 6.8 |
| SD | 1.4 | 1.6 | 3.2 |
| n | 100 | 100 | 100 |
Concept C has a standard deviation more than twice that of Concept A. Running Levene's test produces F(2, 297) = 14.8, p < 0.001. The assumption of equal variances is violated.
Looking at the data, Concept C's higher variance makes sense: it's a polarizing ad that some respondents love and others dislike, while Concepts A and B generate more uniform responses.
What to Do When Variances Are Unequal
For two-group comparisons (t-test):
Use Welch's t-test instead of the standard (Student's) t-test. Welch's test doesn't assume equal variances; it adjusts the degrees of freedom based on each group's variance and size. Most modern statistical software defaults to Welch's t-test or offers it as an option. There's essentially no downside to always using Welch's test: it performs nearly identically to Student's t-test when variances are equal and much better when they're not.
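To make the degrees-of-freedom adjustment concrete, here is a minimal sketch, assuming NumPy and SciPy. The helper `welch_t` and the simulated samples are illustrative, not from this article. It computes the Welch statistic and the Welch-Satterthwaite degrees of freedom by hand and checks them against `scipy.stats.ttest_ind` with `equal_var=False`.

```python
import numpy as np
from scipy import stats

def welch_t(a, b):
    """Welch's t-test by hand: returns (t, degrees of freedom, two-sided p)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    # Squared standard error of each group mean (no pooling across groups).
    va = a.var(ddof=1) / a.size
    vb = b.var(ddof=1) / b.size
    t_stat = (a.mean() - b.mean()) / np.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va**2 / (a.size - 1) + vb**2 / (b.size - 1))
    p_value = 2 * stats.t.sf(abs(t_stat), df)
    return float(t_stat), float(df), float(p_value)

# Simulated samples with unequal sizes and unequal spread (illustrative only).
rng = np.random.default_rng(42)
group_a = rng.normal(6.0, 1.0, size=40)
group_b = rng.normal(6.5, 3.0, size=15)

t_manual, df_manual, p_manual = welch_t(group_a, group_b)
welch = stats.ttest_ind(group_a, group_b, equal_var=False)    # Welch's t-test
student = stats.ttest_ind(group_a, group_b, equal_var=True)   # pooled-variance test

print(f"Welch (manual): t = {t_manual:.3f}, df = {df_manual:.1f}, p = {p_manual:.4f}")
print(f"Welch (scipy):  t = {welch.statistic:.3f}, p = {welch.pvalue:.4f}")
print(f"Student:        t = {student.statistic:.3f}, p = {student.pvalue:.4f}")
```

The Welch degrees of freedom always fall between the smaller group's n - 1 and the pooled n₁ + n₂ - 2, which is how the test discounts a noisy, small group.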
For multi-group comparisons (ANOVA):
Use Welch's ANOVA (also called Welch's F-test), which adjusts for unequal variances similarly to Welch's t-test. Alternatively, use the Brown-Forsythe F-test, which uses a different degrees-of-freedom correction. Both are available in R, SPSS, and most statistical packages.
For post-hoc comparisons after Welch's ANOVA, use Games-Howell post-hoc tests instead of Tukey's HSD, since Games-Howell doesn't assume equal variances.
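Welch's ANOVA is short enough to write out directly. The sketch below assumes NumPy and SciPy and uses the textbook Welch formulas; the function name `welch_anova` is hypothetical, and the samples are simulated from normal distributions loosely matching the worked example's means and SDs, so they are not the article's data and will not reproduce its Levene result.

```python
import numpy as np
from scipy import stats

def welch_anova(*groups):
    """Welch's one-way ANOVA; returns (F, df1, df2, p)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n = np.array([g.size for g in groups])
    means = np.array([g.mean() for g in groups])
    variances = np.array([g.var(ddof=1) for g in groups])

    # Precision weights: groups with noisier means count for less.
    w = n / variances
    grand_mean = np.sum(w * means) / np.sum(w)
    lam = np.sum((1 - w / np.sum(w)) ** 2 / (n - 1))

    numerator = np.sum(w * (means - grand_mean) ** 2) / (k - 1)
    denominator = 1 + 2 * (k - 2) / (k**2 - 1) * lam
    f_stat = numerator / denominator
    df1 = k - 1
    df2 = (k**2 - 1) / (3 * lam)   # adjusted denominator degrees of freedom
    p_value = stats.f.sf(f_stat, df1, df2)
    return float(f_stat), df1, float(df2), float(p_value)

# Simulated scores (illustrative only).
rng = np.random.default_rng(7)
concept_a = rng.normal(6.2, 1.4, size=100)
concept_b = rng.normal(6.5, 1.6, size=100)
concept_c = rng.normal(6.8, 3.2, size=100)

f_stat, df1, df2, p_value = welch_anova(concept_a, concept_b, concept_c)
print(f"Welch's F({df1}, {df2:.1f}) = {f_stat:.3f}, p = {p_value:.4f}")
```

With only two groups this reduces to Welch's t-test (F equals t squared), which is a convenient sanity check for the implementation.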
For regression:
Use heteroscedasticity-consistent standard errors (HC standard errors, also called robust standard errors or White's standard errors). These adjust the standard errors without changing the coefficient estimates.
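As a rough illustration of what "adjusting the standard errors without changing the coefficients" means, the sketch below computes White's original HC0 sandwich estimator with plain NumPy on simulated data. The data-generating setup is an assumption made for the example. In practice, statistical packages offer this as an option, often with small-sample refinements such as HC3.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1.0, 10.0, size=n)
# Error spread grows with x, so the errors are heteroscedastic by construction.
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.3 * x)

X = np.column_stack([np.ones(n), x])     # design matrix with an intercept column
xtx_inv = np.linalg.inv(X.T @ X)
beta = xtx_inv @ X.T @ y                 # OLS coefficients (same under either SE)
resid = y - X @ beta

# Classical standard errors: assume one common error variance.
sigma2 = resid @ resid / (n - X.shape[1])
se_classical = np.sqrt(np.diag(sigma2 * xtx_inv))

# HC0 sandwich estimator: (X'X)^-1 X' diag(e^2) X (X'X)^-1
meat = X.T @ (X * resid[:, None] ** 2)
se_hc0 = np.sqrt(np.diag(xtx_inv @ meat @ xtx_inv))

print("coefficients:     ", np.round(beta, 3))
print("classical SEs:    ", np.round(se_classical, 4))
print("HC0 (White's) SEs:", np.round(se_hc0, 4))
```

Only the standard errors (and therefore the t-statistics and p-values) differ between the two approaches; the fitted line is identical.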
Rules of Thumb
How much variance inequality is too much? Common guidelines:
- Ratio of largest to smallest variance < 2:1: usually not a problem
- Ratio between 2:1 and 4:1: may be problematic, especially with unequal group sizes; test with Levene's
- Ratio > 4:1: likely a problem; use Welch's test or a nonparametric alternative
The interaction between unequal variances and unequal group sizes matters most. If the larger group has the larger variance, the standard test becomes conservative (less likely to find significance). If the smaller group has the larger variance, the test becomes liberal (more likely to produce false positives).
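The ratio guidelines above are easy to apply mechanically. Here is a small sketch in plain Python (the function names are illustrative) that applies them to the standard deviations from the worked example. Note that the thresholds refer to variances, so the SDs are squared first.

```python
def variance_ratio(sds):
    """Ratio of the largest to the smallest variance, given standard deviations."""
    variances = [sd ** 2 for sd in sds]
    return max(variances) / min(variances)

def rule_of_thumb(ratio):
    """Map a variance ratio onto the guideline bands listed above."""
    if ratio < 2:
        return "usually not a problem"
    if ratio <= 4:
        return "may be problematic; test with Levene's"
    return "likely a problem; use Welch's test or a nonparametric alternative"

# SDs for Concepts A, B, and C from the worked example.
ratio = variance_ratio([1.4, 1.6, 3.2])
print(f"{ratio:.2f}: {rule_of_thumb(ratio)}")
# → 5.22: likely a problem; use Welch's test or a nonparametric alternative
```

An SD that is a little more than twice as large (3.2 vs. 1.4) corresponds to a variance ratio above 5:1, which is why the worked example falls in the highest band.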
Other Tests for Equal Variances
| Test | Strengths | Weaknesses |
|---|---|---|
| Levene's test (median) | Robust to non-normality | May lack power with small samples |
| Bartlett's test | More powerful when data is normal | Very sensitive to non-normality |
| F-test for two groups | Simple | Extremely sensitive to non-normality |
| Fligner-Killeen test | Nonparametric, robust | Less commonly available in software |
Levene's test (median-based) is the recommended default because it balances robustness and power.
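The trade-offs in the table can be seen in a quick simulation. This is a sketch, assuming NumPy and SciPy. The sample size, number of simulations, and the choice of a Laplace distribution are arbitrary illustrative settings. All three groups come from the same heavy-tailed distribution, so their variances are truly equal and every rejection is a false alarm.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, n_per_group, alpha = 1000, 50, 0.05
rejections = {"levene (median)": 0, "bartlett": 0, "fligner-killeen": 0}

for _ in range(n_sims):
    # Three groups from one Laplace distribution: equal variances, heavy tails.
    g1, g2, g3 = rng.laplace(size=(3, n_per_group))
    rejections["levene (median)"] += stats.levene(g1, g2, g3, center="median").pvalue < alpha
    rejections["bartlett"] += stats.bartlett(g1, g2, g3).pvalue < alpha
    rejections["fligner-killeen"] += stats.fligner(g1, g2, g3).pvalue < alpha

# False-alarm rate of each test; a well-behaved test stays near alpha.
rates = {name: count / n_sims for name, count in rejections.items()}
for name, rate in rates.items():
    print(f"{name:16s} rejects equal variances in {rate:.1%} of simulations")
```

Bartlett's test should reject far more often than the nominal 5% here, even though the variances are equal, while the median-based Levene's test should stay near the nominal level.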
When to Use Homogeneity of Variance Tests
- Before running ANOVA to verify the equal variance assumption and decide whether standard or Welch's ANOVA is appropriate
- Before independent-samples t-tests to choose between Student's and Welch's versions
- In regression diagnostics to check for heteroscedasticity in residual plots
- When comparing groups of very different sizes: this is when unequal variances cause the most damage to standard tests
Common Mistakes to Avoid
- Treating a non-significant Levene's test as proof of equal variances: it just means you didn't detect a significant difference; with small samples, Levene's test has low power and might miss substantial variance differences
- Using Bartlett's test with non-normal data: Bartlett's test is highly sensitive to non-normality and will reject equal variances simply because the data isn't normal, even if variances are actually equal
- Abandoning parametric tests entirely when variances are unequal: Welch's corrections are straightforward and widely available; you don't need to switch to nonparametric methods just because Levene's test is significant
How Quali-Fi Supports Variance Testing
Quali-Fi's cross-tabulation module automatically runs Levene's test when comparing continuous metrics across groups. When unequal variances are detected, the platform switches to Welch-corrected tests and Games-Howell post-hoc comparisons, reporting the appropriate adjusted p-values without requiring manual intervention.
Frequently Asked Questions
Should I always run Levene's test before ANOVA?
It's good practice, but an even better approach is to always use Welch's ANOVA by default. Welch's ANOVA performs nearly as well as standard ANOVA when variances are equal and substantially better when they're not. Some statisticians argue that routinely using Welch's test eliminates the need for preliminary variance testing altogether.
What if Levene's test is significant but the variance ratio is small?
With large samples, Levene's test can detect trivially small differences in variance that don't meaningfully affect your analysis. If the ratio of the largest to smallest variance is under 2:1, the practical impact on standard tests is minimal, even if Levene's test is statistically significant. Use judgment: statistical significance doesn't always mean practical significance.
Does homogeneity of variance apply to paired/repeated-measures designs?
For repeated-measures ANOVA, the relevant assumption is sphericity (tested with Mauchly's test), which is about equal variances of the differences between conditions. It's a related but distinct concept. If sphericity is violated, use the Greenhouse-Geisser or Huynh-Feldt correction.