Post-Hoc Tests: Tukey, Bonferroni, and Scheffé Compared

What Are Post-Hoc Tests?

Post-hoc tests are follow-up comparison procedures used after an ANOVA reveals a significant overall difference among three or more group means. ANOVA tells you that at least one group differs from the others, but not which specific groups differ. Post-hoc tests fill that gap by comparing pairs of groups (or, with Scheffé's method, more complex contrasts) while controlling the familywise Type I error rate: the probability of at least one false positive across all comparisons. The term "post-hoc" (Latin for "after this") reflects the fact that these comparisons are conducted after the omnibus ANOVA test rather than planned in advance. The three most common post-hoc procedures are Tukey's HSD, Bonferroni, and Scheffé, each with different strengths and appropriate use cases.

Why Post-Hoc Tests Matter

Without a correction, comparing four groups requires six pairwise tests, each at α = 0.05. If those tests were independent, the probability of at least one false positive would be 1 - 0.95^6, or about 26%. With ten groups you would make 45 comparisons, and the false-positive risk would exceed 90%. Pairwise tests that share groups are correlated, so the exact figures are somewhat lower, but the inflation is still large. Post-hoc tests keep this familywise error rate at or below your chosen alpha level, so a significant difference you report between two groups is unlikely to be an artifact of running many tests.
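The 26% and 90% figures come from 1 - (1 - α)^m with m = k(k - 1) / 2 pairwise tests, which assumes the tests are independent. A minimal Python sketch of that arithmetic (the function names are illustrative):

```python
def pairwise_count(k: int) -> int:
    """Number of distinct pairs among k groups: k(k - 1) / 2."""
    return k * (k - 1) // 2


def familywise_error(alpha: float, m: int) -> float:
    """P(at least one false positive) across m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m


if __name__ == "__main__":
    for k in (3, 4, 5, 10):
        m = pairwise_count(k)
        print(f"{k} groups -> {m} comparisons, FWER = {familywise_error(0.05, m):.1%}")
```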

How Post-Hoc Tests Work

Tukey's HSD (Honestly Significant Difference)

Tukey's HSD is the most popular post-hoc test for comparing all possible pairs of means. It uses the studentized range distribution to calculate a critical difference:

HSD = q × √(MS_within / n)

Where q is the studentized range critical value (from Tukey's table, based on the number of groups and error degrees of freedom), MS_within is the mean square error from the ANOVA, and n is the sample size per group.

Two means are significantly different if their absolute difference exceeds the HSD value.

Best for: Comparing all pairs when group sizes are equal or approximately equal (the Tukey-Kramer variant handles unequal sizes). It is the most common default for all-pairwise comparisons.
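A minimal sketch of the HSD decision rule in Python, assuming you already have q from a studentized range table (it is passed in, not computed). It uses the Tukey-Kramer form, q × √(MS_within / 2 × (1/n_i + 1/n_j)), which reduces to q × √(MS_within / n) when both groups have n observations. The group means, sizes, MS_within, and q = 3.46 in the example are hypothetical.

```python
import math
from itertools import combinations


def tukey_kramer_threshold(q: float, ms_within: float, n_i: int, n_j: int) -> float:
    """Critical difference for one pair; equals q * sqrt(MS_within / n) when n_i == n_j."""
    return q * math.sqrt(ms_within / 2 * (1 / n_i + 1 / n_j))


def tukey_decisions(means: dict, sizes: dict, ms_within: float, q: float) -> dict:
    """Map each pair to (absolute difference, critical difference, significant?)."""
    results = {}
    for a, b in combinations(means, 2):
        diff = abs(means[a] - means[b])
        threshold = tukey_kramer_threshold(q, ms_within, sizes[a], sizes[b])
        results[(a, b)] = (diff, threshold, diff > threshold)
    return results


if __name__ == "__main__":
    # Hypothetical summary statistics; q = 3.46 is an assumed table value.
    means = {"A": 10.0, "B": 13.0, "C": 17.0}
    sizes = {"A": 12, "B": 12, "C": 15}
    for (a, b), (diff, hsd, sig) in tukey_decisions(means, sizes, 16.0, 3.46).items():
        print(f"{a} vs. {b}: diff = {diff:.1f}, critical = {hsd:.2f}, significant = {sig}")
```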

Bonferroni Correction

The Bonferroni method divides the alpha level by the number of comparisons. If you're making 6 comparisons at α = 0.05:

Adjusted α = 0.05 / 6 = 0.0083

Each pairwise comparison (using a standard t-test) must meet this stricter threshold to be declared significant.

Best for: Situations where you're making a small number of planned or post-hoc comparisons. When you test all pairs, it is more conservative than Tukey, and the gap widens as the number of comparisons grows.
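A minimal sketch of the correction applied to raw pairwise p-values. Comparing each p with α / m is equivalent to comparing the Bonferroni-adjusted p-value, min(m × p, 1), with α. The function name and the example p-values are hypothetical.

```python
def bonferroni(p_values: list, alpha: float = 0.05) -> list:
    """Return (adjusted p-value, reject?) for each raw p-value."""
    m = len(p_values)
    per_test_alpha = alpha / m          # e.g. 0.05 / 6 = 0.0083 for six comparisons
    results = []
    for p in p_values:
        adjusted_p = min(p * m, 1.0)    # adjusted p-values are capped at 1
        results.append((adjusted_p, p <= per_test_alpha))
    return results


if __name__ == "__main__":
    raw = [0.001, 0.004, 0.012, 0.030, 0.200, 0.650]  # hypothetical pairwise p-values
    for p, (adj, reject) in zip(raw, bonferroni(raw)):
        print(f"raw p = {p:.3f}, adjusted p = {adj:.3f}, significant = {reject}")
```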

Scheffé's Method

Scheffé's test is the most conservative of the three but also the most flexible. It controls the familywise error rate across all possible contrasts: not just pairwise comparisons but also complex comparisons such as "is the average of groups A and B different from group C?"

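A contrast is declared significant when its F statistic exceeds a critical value based on the F-distribution:

Critical value = (k - 1) × F_critical

Where k is the number of groups and F_critical is the α-level critical value of F with (k - 1, N - k) degrees of freedom, the same reference distribution as the omnibus ANOVA (N is the total sample size). For a contrast with coefficients c_i that sum to zero, the statistic is F = (Σ c_i × mean_i)² / (MS_within × Σ c_i² / n_i).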

Best for: When you want to test complex contrasts (not just pairs) or when the comparison wasn't planned before data collection.
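A minimal sketch of the Scheffé test for one contrast, ψ = Σ c_i × mean_i with coefficients that sum to zero, tested with F = ψ² / (MS_within × Σ c_i² / n_i) against (k - 1) × F_critical. F_critical is passed in as an assumption rather than computed; 3.35 is approximately the α = 0.05 value of F with (2, 27) degrees of freedom, matching three hypothetical groups of 10. The example asks whether the average of groups A and B differs from group C.

```python
def scheffe_contrast(means, sizes, coefs, ms_within, f_crit):
    """Test one contrast; f_crit is F(alpha; k - 1, N - k), supplied by the caller."""
    if abs(sum(coefs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    k = len(means)
    psi_hat = sum(c * m for c, m in zip(coefs, means))          # estimated contrast
    se_sq = ms_within * sum(c * c / n for c, n in zip(coefs, sizes))
    f_contrast = psi_hat ** 2 / se_sq
    critical = (k - 1) * f_crit                                   # Scheffe critical value
    return psi_hat, f_contrast, critical, f_contrast > critical


if __name__ == "__main__":
    # Hypothetical data: is the average of A and B different from C?
    psi, f, crit, sig = scheffe_contrast(
        means=[20.0, 22.0, 27.0], sizes=[10, 10, 10],
        coefs=[0.5, 0.5, -1.0], ms_within=25.0, f_crit=3.35,  # assumed F(0.05; 2, 27)
    )
    print(f"psi = {psi:.2f}, F = {f:.2f}, critical = {crit:.2f}, significant = {sig}")
```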

Worked Example

You compare satisfaction scores across four service levels (Basic, Standard, Premium, Enterprise) with 40 customers each. ANOVA is significant: F(3, 156) = 13.3, p < 0.001.

Group means: Basic = 62, Standard = 68, Premium = 74, Enterprise = 76. MS_within = 120.
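Because every group has the same n, the ANOVA F statistic can be recomputed from the summary figures alone: SS_between = n × Σ(mean_i - grand mean)², MS_between = SS_between / (k - 1), and F = MS_between / MS_within. A minimal sketch using the worked example's inputs (the function name is illustrative):

```python
def anova_f_from_summary(means, n, ms_within):
    """One-way ANOVA F from group means, a common group size n, and MS_within."""
    k = len(means)
    grand_mean = sum(means) / k                 # equal n, so the unweighted mean works
    ss_between = n * sum((m - grand_mean) ** 2 for m in means)
    ms_between = ss_between / (k - 1)
    df = (k - 1, k * n - k)                     # (between, within) degrees of freedom
    return ms_between / ms_within, df


if __name__ == "__main__":
    f, df = anova_f_from_summary([62, 68, 74, 76], n=40, ms_within=120)
    print(f"F{df} = {f:.2f}")
```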

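Tukey's HSD results (α = 0.05). With q ≈ 3.67 for 4 groups and 156 error degrees of freedom, HSD ≈ 3.67 × √(120 / 40) ≈ 6.4, so two means differ significantly only if they are more than about 6.4 points apart:

Comparison	Difference	Significant?
Basic vs. Standard	6	No
Basic vs. Premium	12	Yes
Basic vs. Enterprise	14	Yes
Standard vs. Premium	6	No
Standard vs. Enterprise	8	Yes
Premium vs. Enterprise	2	No

Basic scores significantly lower than both Premium and Enterprise, and Standard scores significantly lower than Enterprise. Neighboring tiers (Basic vs. Standard, Standard vs. Premium, Premium vs. Enterprise) do not differ significantly.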
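To see how the three procedures compare on these data, the sketch below computes each method's critical difference for a pair of means (equal n) and applies it to all six pairs. The critical values are assumptions approximated from standard tables for α = 0.05, 4 groups, and 156 error df (q ≈ 3.67, Bonferroni t ≈ 2.67 for six comparisons, F ≈ 2.66); look up or compute exact values for your own design.

```python
import math
from itertools import combinations

# Assumed critical values, valid only for alpha = 0.05, k = 4 groups, 156 error df,
# approximated from standard tables; they are not computed here.
Q_TUKEY = 3.67     # studentized range q(0.05; 4, 156)
T_BONF = 2.67      # t(156) cutting off 0.05 / (2 * 6) in the upper tail: 6 comparisons
F_SCHEFFE = 2.66   # F(0.05; 3, 156)


def critical_differences(k: int, n: int, ms_within: float) -> dict:
    """Smallest absolute mean difference each method calls significant (equal n)."""
    se_diff = math.sqrt(2 * ms_within / n)  # standard error of a difference in means
    return {
        "Tukey": Q_TUKEY * math.sqrt(ms_within / n),
        "Bonferroni": T_BONF * se_diff,
        "Scheffe": math.sqrt((k - 1) * F_SCHEFFE) * se_diff,
    }


def compare_methods(means: dict, n: int, ms_within: float):
    """Return the critical differences and a per-pair significance flag for each method."""
    crit = critical_differences(len(means), n, ms_within)
    table = {}
    for (a, ma), (b, mb) in combinations(means.items(), 2):
        diff = abs(ma - mb)
        table[f"{a} vs. {b}"] = {method: diff > cd for method, cd in crit.items()}
    return crit, table


if __name__ == "__main__":
    means = {"Basic": 62, "Standard": 68, "Premium": 74, "Enterprise": 76}
    crit, table = compare_methods(means, n=40, ms_within=120)
    for method, cd in crit.items():
        print(f"{method}: critical difference = {cd:.2f}")
    for pair, flags in table.items():
        print(pair, flags)
```

On these inputs the critical differences are about 6.36 (Tukey), 6.54 (Bonferroni), and 6.92 (Scheffé), the ordering of conservatism described in the comparison table; here all three methods happen to reach the same decisions for every pair.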

Comparison Table

Feature	Tukey HSD	Bonferroni	Scheffé
Type of comparisons	All pairwise	Selected or all pairwise	All possible contrasts
Conservatism	Moderate	Moderate to high	Most conservative
Equal group sizes required?	Ideal; the Tukey-Kramer variant handles unequal sizes	No	No
Power for pairwise	Highest	Good with few comparisons	Lowest
Complex contrasts?	No	No	Yes

When to Use Post-Hoc Tests

After a significant one-way ANOVA to determine which specific groups differ
Concept testing with three or more concepts to identify which outperform and which underperform
Segment comparison when you've identified four or more customer segments and need to know which differ on key metrics
Experimental designs comparing multiple treatment conditions on a continuous outcome
Pricing tier analysis to determine which price points produce significantly different willingness-to-pay or purchase intent

Common Mistakes to Avoid

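Running post-hoc tests without a significant omnibus ANOVA: the conventional workflow gates them on a significant overall F-test, and Fisher's LSD depends on that gate for error control (Tukey's HSD and Bonferroni control the familywise error rate on their own, so if you run them regardless, report that choice)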
Choosing the most liberal test to get significant results: pick your test based on your design and assumptions, not the output
Forgetting to check equal-variance assumptions: Tukey and Bonferroni assume homogeneity of variance; switch to Games-Howell if variances differ substantially
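When variances differ, Games-Howell gives each pair its own standard error, √(s_i²/n_i + s_j²/n_j), and Welch-Satterthwaite degrees of freedom, then compares |t| × √2 with the studentized range critical value q(α; k, df). A minimal sketch for one pair, assuming you look up that critical value separately; the function name and the group summaries in the example are hypothetical.

```python
import math


def games_howell_pair(mean_i, var_i, n_i, mean_j, var_j, n_j):
    """Return (q statistic, Welch-Satterthwaite df) for one Games-Howell comparison.

    Compare q with the studentized range critical value q(alpha; k, df),
    which this sketch does not compute.
    """
    a = var_i / n_i                      # squared standard error of mean i
    b = var_j / n_j                      # squared standard error of mean j
    t = (mean_i - mean_j) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n_i - 1) + b ** 2 / (n_j - 1))
    return abs(t) * math.sqrt(2), df


if __name__ == "__main__":
    # Hypothetical groups with unequal variances (25 vs. 100) and 20 observations each.
    q, df = games_howell_pair(50.0, 25.0, 20, 56.0, 100.0, 20)
    print(f"q = {q:.3f} on {df:.1f} df")
```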

How Quali-Fi Supports Post-Hoc Comparisons

Quali-Fi's platform automatically applies appropriate post-hoc tests when cross-tabulations or segment comparisons involve three or more groups, flagging significant pairwise differences directly in the results table. The Research plan ($1,061/month) lets you choose between Tukey, Bonferroni, and other correction methods based on your design.

Frequently Asked Questions

Should I use Tukey or Bonferroni?

Use Tukey when comparing all pairs of means: it's designed for that and has more statistical power. Use Bonferroni when you're only testing a few specific comparisons out of many possible pairs, or when you need a simple, widely understood correction method.

Can I use post-hoc tests with non-parametric analyses?

Yes. After a significant Kruskal-Wallis test, you'd use Dunn's test (with Bonferroni correction) for pairwise comparisons. After a significant Friedman test, you'd use Wilcoxon signed-rank tests with Bonferroni correction.
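A minimal standard-library sketch of Dunn's test: all observations are ranked together (ties get average ranks), each pair's mean-rank difference is divided by its standard error, including the usual tie correction, and two-sided normal p-values are multiplied by the number of comparisons (Bonferroni). The function names are illustrative; for real analyses a maintained statistics package is the safer choice.

```python
from collections import Counter
from itertools import combinations
from statistics import NormalDist


def average_ranks(values):
    """Ranks 1..N, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1          # positions are 0-based, ranks 1-based
        for idx in order[pos:end + 1]:
            ranks[idx] = avg
        pos = end + 1
    return ranks


def dunn_bonferroni(groups):
    """Map each pair to (z, Bonferroni-adjusted two-sided p) from pooled ranks."""
    names = list(groups)
    pooled = [x for name in names for x in groups[name]]
    labels = [name for name in names for _ in groups[name]]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n_total * (n_total + 1) / 12 - ties / (12 * (n_total - 1))
    mean_rank = {
        name: sum(r for r, lab in zip(ranks, labels) if lab == name) / len(groups[name])
        for name in names
    }
    m = len(names) * (len(names) - 1) // 2     # number of pairwise comparisons
    out = {}
    for a, b in combinations(names, 2):
        se = (variance * (1 / len(groups[a]) + 1 / len(groups[b]))) ** 0.5
        z = (mean_rank[a] - mean_rank[b]) / se
        p = 2 * (1 - NormalDist().cdf(abs(z)))
        out[(a, b)] = (z, min(p * m, 1.0))
    return out


if __name__ == "__main__":
    data = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}  # hypothetical scores
    for pair, (z, p_adj) in dunn_bonferroni(data).items():
        print(pair, f"z = {z:.3f}, adjusted p = {p_adj:.4f}")
```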

Do I need post-hoc tests if I only have two groups?

No. With only two groups, a significant ANOVA (or t-test) already tells you that the two means differ, and the sample means show which group is higher. Post-hoc tests are only needed with three or more groups.
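For two groups, the one-way ANOVA F statistic equals the square of the pooled-variance t statistic, so the two tests give the same p-value. A small sketch that checks this identity on made-up data (the function names are illustrative):

```python
import math
from statistics import mean, variance


def one_way_f(groups):
    """One-way ANOVA F statistic from raw observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def pooled_t(x, y):
    """Two-sample t statistic with a pooled variance estimate."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))


if __name__ == "__main__":
    x, y = [4, 5, 6, 7], [6, 8, 9, 11]   # made-up scores for two groups
    print(f"F = {one_way_f([x, y]):.4f}, t^2 = {pooled_t(x, y) ** 2:.4f}")
```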

Related Guides

Bonferroni Correction: Formula, Examples, and When to Use It

MANOVA: What It Is and When to Use It vs. Separate ANOVAs

ANCOVA: What It Is and How Covariate Adjustment Works

Kruskal-Wallis Test: Nonparametric One-Way Comparison

Alpha Level: Setting Significance Thresholds in Research
