What Are Post-Hoc Tests?
Post-hoc tests are follow-up comparison procedures run after an ANOVA finds a significant overall difference among three or more group means. A significant ANOVA tells you that the group means are not all equal, but it doesn't tell you which specific groups differ. Post-hoc tests fill that gap by comparing pairs of groups while controlling the familywise Type I error rate: the probability of at least one false positive across all comparisons. The term "post-hoc" (Latin for "after this") reflects the fact that these comparisons follow the omnibus ANOVA test rather than being specified in advance. The three most common post-hoc procedures are Tukey's HSD, Bonferroni, and Scheffé, each with different strengths and appropriate use cases.
Why Post-Hoc Tests Matter
Comparing four groups takes six pairwise tests. Run each at α = 0.05 with no correction and the probability of at least one false positive rises to about 26% (1 - 0.95^6, treating the tests as independent). With ten groups, you'd make 45 comparisons and the false-positive risk exceeds 90%. Post-hoc tests hold this familywise error rate at your chosen alpha level, so a significant difference between two groups is far less likely to be an artifact of running many tests.
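The arithmetic behind these figures can be sketched in a few lines of Python. The sketch assumes the pairwise tests are independent, which is a simplification (comparisons that share a group are correlated), so the results are approximations. The function names are illustrative.

```python
from math import comb


def n_pairwise(k: int) -> int:
    """Number of distinct pairs among k groups: k * (k - 1) / 2."""
    return comb(k, 2)


def familywise_error(k: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across all pairwise tests,
    assuming each test is independent and run at the same alpha."""
    m = n_pairwise(k)
    return 1 - (1 - alpha) ** m


for k in (3, 4, 10):
    print(f"{k} groups: {n_pairwise(k)} comparisons, "
          f"familywise error = {familywise_error(k):.1%}")
```

The number of comparisons grows quadratically with the number of groups, which is why the uncorrected error rate climbs so quickly.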
How Post-Hoc Tests Work
Tukey's HSD (Honestly Significant Difference)
Tukey's HSD is the most popular post-hoc test for comparing all possible pairs of means. It uses the studentized range distribution to calculate a critical difference:
HSD = q × √(MS_within / n)
Where q is the studentized range critical value (from Tukey's table, based on the number of groups and error degrees of freedom), MS_within is the mean square error from the ANOVA, and n is the sample size per group.
Two means are significantly different if their absolute difference exceeds the HSD value.
Best for: Comparing all pairs when group sizes are equal or approximately equal. It's the default choice in most research.
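As a rough illustration, the HSD formula can be applied like this. The group means, sample size, and MS_within below are hypothetical, and the q value of 3.40 is assumed to have been read from a studentized range table for α = 0.05, three groups, and 60 error degrees of freedom. The sketch covers only the equal-group-size case.

```python
from itertools import combinations
from math import sqrt


def tukey_hsd(q_crit: float, ms_within: float, n_per_group: int) -> float:
    """Critical difference: HSD = q * sqrt(MS_within / n)."""
    return q_crit * sqrt(ms_within / n_per_group)


def significant_pairs(means: dict, hsd: float) -> dict:
    """Map each pair of group names to True if |difference| exceeds HSD."""
    return {
        (a, b): abs(means[a] - means[b]) > hsd
        for a, b in combinations(means, 2)
    }


# Hypothetical inputs, not taken from any real study.
means = {"A": 70.0, "B": 74.0, "C": 80.0}
n_per_group = 21   # equal group sizes, so error df = 3 * 21 - 3 = 60
ms_within = 100.0  # mean square error from the ANOVA table
q_crit = 3.40      # assumed table value: alpha = 0.05, 3 groups, 60 df

hsd = tukey_hsd(q_crit, ms_within, n_per_group)
print(f"HSD = {hsd:.2f}")
for pair, flag in significant_pairs(means, hsd).items():
    print(pair, "significant" if flag else "not significant")
```

If SciPy is installed, `scipy.stats.studentized_range.ppf(0.95, k, df)` should return the critical q directly, which avoids the table lookup.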
Bonferroni Correction
The Bonferroni method divides the alpha level by the number of comparisons. If you're making 6 comparisons at α = 0.05:
Adjusted α = 0.05 / 6 = 0.0083
Each pairwise comparison (using a standard t-test) must meet this stricter threshold to be declared significant.
Best for: Situations where you're making a small number of planned or post-hoc comparisons. It's more conservative than Tukey when the number of comparisons is large.
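A minimal sketch of the Bonferroni rule, using made-up unadjusted p-values for six pairwise t-tests. The comparison labels and p-values are hypothetical, and the helper names are illustrative.

```python
def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Per-comparison threshold: alpha divided by the number of tests."""
    return alpha / n_comparisons


def bonferroni_decisions(p_values: dict, alpha: float = 0.05) -> dict:
    """Flag each comparison whose unadjusted p-value meets the
    Bonferroni-adjusted threshold."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return {name: p <= threshold for name, p in p_values.items()}


# Hypothetical unadjusted p-values from six pairwise t-tests.
raw_p = {
    "A vs B": 0.030,
    "A vs C": 0.004,
    "A vs D": 0.0005,
    "B vs C": 0.020,
    "B vs D": 0.007,
    "C vs D": 0.400,
}

print(f"Adjusted alpha = {bonferroni_alpha(0.05, len(raw_p)):.4f}")
for name, flag in bonferroni_decisions(raw_p).items():
    print(name, "significant" if flag else "not significant")
```

In this example, p-values of 0.030 and 0.020 would pass an uncorrected 0.05 threshold but fail the adjusted one. An equivalent approach is to multiply each p-value by the number of comparisons (capped at 1) and compare it to the original α.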
Scheffé's Method
Scheffé's test is the most conservative of the three but also the most flexible. It controls the familywise error rate for all possible contrasts: not only pairwise comparisons but also complex ones such as "is the average of groups A and B different from group C?"
The critical value is based on the F-distribution:
Critical value = (k - 1) × F_critical
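Where k is the number of groups and F_critical is the critical value of the F-distribution with k - 1 and N - k degrees of freedom (the same degrees of freedom as the omnibus ANOVA, with N the total sample size). A contrast is significant if its F statistic exceeds this critical value.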
Best for: When you want to test complex contrasts (not just pairs) or when the comparison wasn't planned before data collection.
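The sketch below shows one way to test a complex contrast against the Scheffé critical value. The means, group sizes, and MS_within are hypothetical. F_critical = 3.15 is assumed to come from an F table for α = 0.05 with 2 and 60 degrees of freedom. The contrast F statistic uses the standard form (Σ c_i × mean_i)² / (MS_within × Σ c_i² / n_i), which is not spelled out in the text above.

```python
def scheffe_critical(k: int, f_crit: float) -> float:
    """Scheffé critical value: (k - 1) * F_critical."""
    return (k - 1) * f_crit


def contrast_f(means, coeffs, ns, ms_within: float) -> float:
    """F statistic for a linear contrast whose coefficients sum to zero."""
    if abs(sum(coeffs)) > 1e-9:
        raise ValueError("contrast coefficients must sum to zero")
    estimate = sum(c * m for c, m in zip(coeffs, means))
    scale = ms_within * sum(c ** 2 / n for c, n in zip(coeffs, ns))
    return estimate ** 2 / scale


# Hypothetical inputs: three groups of 21, so error df = 60.
means = [70.0, 74.0, 80.0]   # groups A, B, C
ns = [21, 21, 21]
ms_within = 100.0
f_crit = 3.15                # assumed table value: alpha = 0.05, df = (2, 60)

# "Is the average of groups A and B different from group C?"
coeffs = [0.5, 0.5, -1.0]

f_stat = contrast_f(means, coeffs, ns, ms_within)
critical = scheffe_critical(len(means), f_crit)
print(f"Contrast F = {f_stat:.2f}, Scheffé critical value = {critical:.2f}")
print("significant" if f_stat > critical else "not significant")
```

Because the same critical value covers every possible contrast, it can be applied to contrasts chosen after looking at the data, at the cost of lower power for simple pairwise comparisons.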
Worked Example
You compare satisfaction scores across four service levels (Basic, Standard, Premium, Enterprise) with 40 customers each. ANOVA is significant: F(3, 156) = 12.4, p < 0.001.
Group means: Basic = 62, Standard = 68, Premium = 74, Enterprise = 76. MS_within = 120.
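Tukey's HSD results, using q ≈ 3.67 (α = 0.05, four groups, 156 error degrees of freedom), which gives HSD ≈ 3.67 × √(120 / 40) ≈ 6.36:
| Comparison | Diff | Significant at α = 0.05? |
|---|---|---|
| Basic vs. Standard | 6 | No (6 < 6.36) |
| Basic vs. Premium | 12 | Yes |
| Basic vs. Enterprise | 14 | Yes |
| Standard vs. Premium | 6 | No (6 < 6.36) |
| Standard vs. Enterprise | 8 | Yes |
| Premium vs. Enterprise | 2 | No |
Basic differs significantly from both Premium and Enterprise, and Standard differs from Enterprise. Adjacent tiers (Basic vs. Standard, Standard vs. Premium, Premium vs. Enterprise) do not differ significantly, because their differences fall below the HSD threshold.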
Comparison Table
| Feature | Tukey HSD | Bonferroni | Scheffé |
|---|---|---|---|
| Type of comparisons | All pairwise | Selected or all pairwise | All possible contrasts |
| Conservatism | Moderate | Moderate to high | Most conservative |
| Equal group sizes required? | Ideal; the Tukey-Kramer variant handles unequal sizes | No | No |
| Power for pairwise | Highest | Good with few comparisons | Lowest |
| Complex contrasts? | No | Only if specified in advance | Yes |
Other Post-Hoc Options
- Games-Howell: Use when group variances are unequal (doesn't assume homogeneity of variance)
- Dunnett's test: Use when comparing multiple treatment groups against a single control group
- Fisher's LSD: The least conservative, only appropriate after a significant omnibus F with exactly three groups
When to Use Post-Hoc Tests
- After a significant one-way ANOVA to determine which specific groups differ
- Concept testing with three or more concepts to identify which outperform and which underperform
- Segment comparison when you've identified four or more customer segments and need to know which differ on key metrics
- Experimental designs comparing multiple treatment conditions on a continuous outcome
- Pricing tier analysis to determine which price points produce significantly different willingness-to-pay or purchase intent
Common Mistakes to Avoid
- Running post-hoc tests without a significant omnibus ANOVA: the post-hoc procedure assumes the overall F-test was significant first
- Choosing the most liberal test to get significant results: pick your test based on your design and assumptions, not the output
- Forgetting to check equal-variance assumptions: Tukey and Bonferroni assume homogeneity of variance; switch to Games-Howell if variances differ substantially
How Quali-Fi Supports Post-Hoc Comparisons
Quali-Fi's platform automatically applies appropriate post-hoc tests when cross-tabulations or segment comparisons involve three or more groups, flagging significant pairwise differences directly in the results table. The Research plan ($1,061/month) lets you choose between Tukey, Bonferroni, and other correction methods based on your design.
Frequently Asked Questions
Should I use Tukey or Bonferroni?
Use Tukey when comparing all pairs of means. It's designed for that case and has more statistical power there. Use Bonferroni when you're only testing a few specific comparisons out of many possible pairs, or when you need a simple, widely understood correction method.
Can I use post-hoc tests with non-parametric analyses?
Yes. After a significant Kruskal-Wallis test, you'd use Dunn's test (with Bonferroni correction) for pairwise comparisons. After a significant Friedman test, you'd use Wilcoxon signed-rank tests with Bonferroni correction.
Do I need post-hoc tests if I only have two groups?
No. With only two groups there is just one comparison, so the ANOVA (or t-test) result already tells you whether the groups differ, and the group means show which is higher. Post-hoc tests are only needed with three or more groups.