Post-Hoc Tests: Tukey, Bonferroni, and Scheffé Compared

Learn what post-hoc tests are, how Tukey, Bonferroni, and Scheffé methods compare, and when to use each for pairwise group comparisons.

What Are Post-Hoc Tests?

Post-hoc tests are follow-up pairwise comparison procedures used after an ANOVA reveals a significant overall difference among three or more group means. ANOVA tells you that at least one group differs from the others, but it doesn't tell you which specific groups are different. Post-hoc tests fill that gap by comparing every pair of groups while controlling the familywise error rate: the probability of finding at least one false positive (Type I error) across all comparisons. The term "post-hoc" (Latin for "after this") reflects the fact that these comparisons are conducted after the omnibus ANOVA test rather than planned in advance. The three most common post-hoc procedures are Tukey's HSD, Bonferroni, and Scheffé, each with different strengths and appropriate use cases.

Why Post-Hoc Tests Matter

Comparing four groups requires six pairwise tests. If each test is run at α = 0.05 with no correction, and the tests are treated as independent, the probability of at least one false positive is 1 - 0.95^6, or about 26%. With ten groups, you'd make 45 comparisons and the false-positive risk reaches about 90%. Post-hoc tests hold this familywise error rate at your chosen alpha level, so a significant difference between two groups is far less likely to be an artifact of running many tests.
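The arithmetic behind those percentages is easy to reproduce. The sketch below is only an illustration: the function name is made up for this example, and it assumes the pairwise tests are independent, which is a simplification (pairwise comparisons that share a group are correlated).

```python
from math import comb

def familywise_error_rate(k_groups, alpha=0.05):
    """Return (number of pairwise comparisons, chance of at least one false positive).

    Assumes independent tests, which is a simplification.
    """
    m = comb(k_groups, 2)          # k(k-1)/2 distinct pairs
    fwer = 1 - (1 - alpha) ** m    # complement of "no false positive in any test"
    return m, fwer

for k in (4, 10):
    m, fwer = familywise_error_rate(k)
    print(f"{k} groups: {m} comparisons, familywise error rate about {fwer:.1%}")
```

The risk grows quickly because the number of pairs grows roughly with the square of the number of groups.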

How Post-Hoc Tests Work

Tukey's HSD (Honestly Significant Difference)

Tukey's HSD is the most popular post-hoc test for comparing all possible pairs of means. It uses the studentized range distribution to calculate a critical difference:

HSD = q × √(MS_within / n)

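Where q is the studentized range critical value (looked up in a studentized range table for the chosen α, the number of groups, and the error degrees of freedom), MS_within is the within-group mean square (the error term) from the ANOVA, and n is the number of observations per group. The formula as written assumes equal group sizes.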

Two means are significantly different if their absolute difference exceeds the HSD value.
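As a minimal sketch of that decision rule, the snippet below computes the HSD threshold and checks two pairs of means. All numbers are illustrative assumptions: three groups of 20 (57 error degrees of freedom), MS_within = 50, and q = 3.40 as an approximate table value for α = 0.05. The function names are hypothetical, not from any library.

```python
from math import sqrt

def tukey_hsd_threshold(q_crit, ms_within, n_per_group):
    """HSD = q * sqrt(MS_within / n), valid for equal group sizes."""
    return q_crit * sqrt(ms_within / n_per_group)

def means_differ(mean_a, mean_b, hsd):
    """Two means differ significantly if their absolute gap exceeds HSD."""
    return abs(mean_a - mean_b) > hsd

# Assumed inputs: q_crit is an approximate table lookup, not computed here.
q_crit = 3.40
hsd = tukey_hsd_threshold(q_crit, ms_within=50.0, n_per_group=20)

print(f"HSD threshold: {hsd:.2f}")
print("71 vs 64 differ:", means_differ(71.0, 64.0, hsd))  # gap of 7
print("71 vs 68 differ:", means_differ(71.0, 68.0, hsd))  # gap of 3
```

Taking q as an input keeps the sketch dependency-free; in practice a statistics package would supply the critical value and adjusted p-values directly.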

Best for: Comparing all pairs when group sizes are equal or approximately equal. It's the default choice in most research.

Bonferroni Correction

The Bonferroni method divides the alpha level by the number of comparisons. If you're making 6 comparisons at α = 0.05:

Adjusted α = 0.05 / 6 = 0.0083

Each pairwise comparison (using a standard t-test) must meet this stricter threshold to be declared significant.
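The same rule can be sketched in a few lines. The p-values below are hypothetical unadjusted results for six pairwise t-tests among four groups labelled A to D; they are made up purely to show the mechanics.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison threshold: alpha divided by the number of comparisons."""
    return alpha / n_comparisons

def bonferroni_significant(p_values, alpha=0.05):
    """Flag each raw p-value against alpha / m, where m = len(p_values)."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return {pair: p <= threshold for pair, p in p_values.items()}

# Hypothetical raw (unadjusted) p-values, for illustration only.
raw_p = {
    "A vs B": 0.004,
    "A vs C": 0.020,
    "A vs D": 0.0005,
    "B vs C": 0.300,
    "B vs D": 0.012,
    "C vs D": 0.650,
}

print(f"Adjusted alpha: {bonferroni_alpha(0.05, len(raw_p)):.4f}")
for pair, significant in bonferroni_significant(raw_p).items():
    print(pair, "significant" if significant else "not significant")
```

Note that "A vs C" and "B vs D" would pass an uncorrected 0.05 threshold but fail the adjusted one.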

Best for: Situations where you're making a small number of planned or post-hoc comparisons. It's more conservative than Tukey when the number of comparisons is large.

Scheffé's Method

Scheffé's test is the most conservative of the three but also the most flexible. It controls the familywise error rate for all possible contrasts: not only pairwise comparisons, but also complex comparisons like "is the average of groups A and B different from group C?"

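The critical value is based on the F-distribution:

Critical value = (k - 1) × F_critical

Where k is the number of groups and F_critical is the critical value of the F-distribution used in the ANOVA, with k - 1 and N - k degrees of freedom (N is the total sample size). A contrast is significant if its own F statistic exceeds this critical value.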
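To make the "average of A and B versus C" contrast concrete, here is a minimal sketch under stated assumptions: three groups of 30 with means 70, 74, and 65, MS_within = 100, and F_critical = 3.10 as an approximate table value for α = 0.05 with 2 and 87 degrees of freedom. The function name and all inputs are illustrative.

```python
def scheffe_contrast(means, ns, coeffs, ms_within, f_crit):
    """Test one contrast against Scheffé's criterion.

    Returns (contrast F statistic, Scheffé critical value, significant?).
    """
    if abs(sum(coeffs)) > 1e-9:
        raise ValueError("contrast coefficients must sum to zero")
    k = len(means)
    # Contrast estimate: weighted sum of the group means.
    estimate = sum(c * m for c, m in zip(coeffs, means))
    # Contrast sum of squares (1 df): estimate^2 / sum(c_i^2 / n_i).
    ss_contrast = estimate ** 2 / sum(c * c / n for c, n in zip(coeffs, ns))
    f_contrast = ss_contrast / ms_within
    critical = (k - 1) * f_crit
    return f_contrast, critical, f_contrast > critical

# Assumed example: is the average of A and B different from C?
f_stat, critical, significant = scheffe_contrast(
    means=[70.0, 74.0, 65.0],
    ns=[30, 30, 30],
    coeffs=[0.5, 0.5, -1.0],
    ms_within=100.0,
    f_crit=3.10,  # approximate table lookup, an assumption
)
print(f"Contrast F = {f_stat:.2f}, critical value = {critical:.2f}, significant: {significant}")
```

A pairwise comparison is just the special case with coefficients such as (1, -1, 0), which is why Scheffé's criterion covers pairs as well but with less power than Tukey.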

Best for: When you want to test complex contrasts (not just pairs) or when the comparison wasn't planned before data collection.

Worked Example

You compare satisfaction scores across four service levels (Basic, Standard, Premium, Enterprise) with 40 customers each. ANOVA is significant: F(3, 156) = 12.4, p < 0.001.

Group means: Basic = 62, Standard = 68, Premium = 74, Enterprise = 76. MS_within = 120.

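Tukey's HSD results (α = 0.05). With q ≈ 3.67 for four groups and 156 error degrees of freedom, HSD ≈ 3.67 × √(120 / 40) ≈ 6.36, so two means must differ by more than about 6.4 points to be significant:

| Comparison | Difference | Significant? |
|---|---|---|
| Basic vs. Standard | 6 | No (p > 0.05) |
| Basic vs. Premium | 12 | Yes (p < 0.001) |
| Basic vs. Enterprise | 14 | Yes (p < 0.001) |
| Standard vs. Premium | 6 | No (p > 0.05) |
| Standard vs. Enterprise | 8 | Yes (p < 0.01) |
| Premium vs. Enterprise | 2 | No (p > 0.05) |

Enterprise scores significantly higher than both Basic and Standard, and Premium scores significantly higher than Basic. The 6-point gaps between adjacent tiers (Basic vs. Standard, Standard vs. Premium) fall just short of the threshold, and Premium and Enterprise don't differ significantly from each other.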
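The pairwise decisions can be rechecked from the summary statistics alone. This sketch uses the group means, n = 40, and MS_within = 120 given above; the one outside assumption is q ≈ 3.67, an approximate studentized range table value for α = 0.05, four groups, and 156 error degrees of freedom.

```python
from itertools import combinations
from math import sqrt

means = {"Basic": 62, "Standard": 68, "Premium": 74, "Enterprise": 76}
n_per_group = 40
ms_within = 120
q_crit = 3.67  # assumed table value: alpha = 0.05, k = 4, df = 156

# Minimum absolute difference needed for significance.
hsd = q_crit * sqrt(ms_within / n_per_group)
print(f"HSD threshold: {hsd:.2f}")

results = {}
for a, b in combinations(means, 2):
    diff = abs(means[a] - means[b])
    results[(a, b)] = (diff, diff > hsd)
    verdict = "significant" if diff > hsd else "not significant"
    print(f"{a} vs. {b}: difference {diff}, {verdict}")
```

The check only gives a yes/no decision at α = 0.05; exact adjusted p-values require the studentized range distribution, which statistical software provides.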

Comparison Table

| Feature | Tukey HSD | Bonferroni | Scheffé |
|---|---|---|---|
| Type of comparisons | All pairwise | Selected or all pairwise | All possible contrasts |
| Conservatism | Moderate | Moderate to high | Most conservative |
| Equal group sizes required? | Ideal, but variants exist for unequal sizes | No | No |
| Power for pairwise comparisons | Highest | Good with few comparisons | Lowest |
| Complex contrasts? | No | No | Yes |

Other Post-Hoc Options

  • Games-Howell: Use when group variances are unequal (doesn't assume homogeneity of variance)
  • Dunnett's test: Use when comparing multiple treatment groups against a single control group
  • Fisher's LSD: The least conservative option; it controls the familywise error rate only when used after a significant omnibus F with exactly three groups

When to Use Post-Hoc Tests

  • After a significant one-way ANOVA to determine which specific groups differ
  • Concept testing with three or more concepts to identify which outperform and which underperform
  • Segment comparison when you've identified four or more customer segments and need to know which differ on key metrics
  • Experimental designs comparing multiple treatment conditions on a continuous outcome
  • Pricing tier analysis to determine which price points produce significantly different willingness-to-pay or purchase intent

Common Mistakes to Avoid

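  • Running post-hoc tests without a significant omnibus ANOVA: in this workflow the pairwise comparisons are a follow-up to a significant overall F-test, and Fisher's LSD in particular depends on that significant F for its error control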
  • Choosing the most liberal test to get significant results: pick your test based on your design and assumptions, not the output
  • Forgetting to check equal-variance assumptions: Tukey, Scheffé, and Bonferroni with pooled-variance t-tests all assume homogeneity of variance; switch to Games-Howell if variances differ substantially

How Quali-Fi Supports Post-Hoc Comparisons

Quali-Fi's platform automatically applies appropriate post-hoc tests when cross-tabulations or segment comparisons involve three or more groups, flagging significant pairwise differences directly in the results table. The Research plan ($1,061/month) lets you choose between Tukey, Bonferroni, and other correction methods based on your design.

Frequently Asked Questions

Should I use Tukey or Bonferroni?

Use Tukey when comparing all pairs of means: it's designed for that purpose and has more statistical power than Bonferroni in that setting. Use Bonferroni when you're only testing a few specific comparisons out of many possible pairs, or when you need a simple, widely understood correction method.

Can I use post-hoc tests with non-parametric analyses?

Yes. After a significant Kruskal-Wallis test, you'd use Dunn's test (with Bonferroni correction) for pairwise comparisons. After a significant Friedman test, you'd use Wilcoxon signed-rank tests with Bonferroni correction.

Do I need post-hoc tests if I only have two groups?

No. With only two groups, the ANOVA (or t-test) result directly tells you which group is higher. Post-hoc tests are only needed with three or more groups.
