What Is the Kruskal-Wallis Test?
The Kruskal-Wallis test is a nonparametric statistical test that compares three or more independent groups to determine whether their distributions differ. It's the nonparametric equivalent of one-way ANOVA, used when the dependent variable is ordinal or when the assumption of normality required by ANOVA isn't met. Like other rank-based tests, it works by ranking all observations from all groups together and then testing whether the average ranks differ significantly across groups. The test produces an H statistic that, under the null hypothesis, approximately follows a chi-square distribution. In market research, the Kruskal-Wallis test is common when comparing customer segments, demographic groups, or experimental conditions on Likert-scale ratings or other ordinal outcomes where parametric assumptions are questionable.
Why the Kruskal-Wallis Test Matters
Comparing three or more groups is one of the most frequent analyses in research: segment comparisons, multi-cell experiments, regional breakdowns. When your data is ordinal, heavily skewed, or drawn from small groups, one-way ANOVA can produce misleading results. The Kruskal-Wallis test provides a valid alternative that doesn't require normality or equal variances, giving you trustworthy conclusions even with messy real-world data.
How the Kruskal-Wallis Test Works
The Procedure
- Combine all observations from all groups
- Rank them from lowest to highest (tied values get the average rank)
- Calculate the average rank for each group
- Compute the H statistic, which measures how much the group rank means deviate from the overall average rank
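The ranking steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration rather than a reference implementation; the function names and the toy data are made up for the example.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def mean_rank_by_group(groups):
    """groups maps a group label to its list of scores."""
    labels, pooled = [], []
    for label, scores in groups.items():
        labels += [label] * len(scores)
        pooled += scores
    ranks = average_ranks(pooled)  # rank all observations together
    totals = {label: 0.0 for label in groups}
    for label, rank in zip(labels, ranks):
        totals[label] += rank
    return {label: totals[label] / len(groups[label]) for label in groups}


toy = {"a": [1, 2, 2], "b": [3, 4, 4]}  # hypothetical data
print(mean_rank_by_group(toy))  # {'a': 2.0, 'b': 5.0}
```

Ranking the pooled data once, rather than ranking within each group, is what makes the group mean ranks comparable.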
The Formula
H = [12 / (N(N + 1))] × Σ(R²_j / n_j) - 3(N + 1)
Where N is the total number of observations, k is the number of groups, n_j is the sample size for group j, and R_j is the sum of ranks in group j.
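H is compared to the chi-square distribution with k - 1 degrees of freedom. The chi-square approximation is generally considered adequate when each group has at least five observations; for smaller groups, exact tables or permutation methods are preferable.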
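As a rough sketch, the formula translates directly into Python once the rank sums are known. The function name `h_statistic` is illustrative, and this version deliberately omits any tie correction so that it mirrors the formula as written.

```python
import math


def h_statistic(rank_sums, sizes):
    """Kruskal-Wallis H from per-group rank sums R_j and sample sizes n_j (no tie correction)."""
    n_total = sum(sizes)
    # Sanity check: the ranks 1..N always sum to N(N + 1) / 2.
    if not math.isclose(sum(rank_sums), n_total * (n_total + 1) / 2):
        raise ValueError("rank sums do not add up to N(N + 1) / 2")
    weighted = sum(r * r / n for r, n in zip(rank_sums, sizes))
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical case: three groups of 3 with no overlap at all (ranks 1-3, 4-6, 7-9).
print(round(h_statistic([6, 15, 24], [3, 3, 3]), 2))  # 7.2
```

The result would then be compared against a chi-square critical value with k - 1 degrees of freedom, where k is `len(sizes)`.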
Worked Example
You compare satisfaction ratings (1-7 scale) across three customer service channels: phone (n = 6), chat (n = 6), and email (n = 6). N = 18.
| Phone | Chat | Email |
|---|---|---|
| 5 | 6 | 3 |
| 4 | 7 | 4 |
| 6 | 5 | 2 |
| 5 | 6 | 3 |
| 3 | 7 | 4 |
| 4 | 6 | 3 |
Ranked data (1-18):
All scores sorted: 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7. Phone contributes {5, 4, 6, 5, 3, 4}, chat contributes {6, 7, 5, 6, 7, 6}, and email contributes {3, 4, 2, 3, 4, 3}, for 18 scores in total.
Assigning average ranks (tied scores share the mean of the ranks they occupy):
| Score | Count | Ranks | Average Rank |
|---|---|---|---|
| 2 | 1 | 1 | 1.0 |
| 3 | 4 | 2-5 | 3.5 |
| 4 | 4 | 6-9 | 7.5 |
| 5 | 3 | 10-12 | 11.0 |
| 6 | 4 | 13-16 | 14.5 |
| 7 | 2 | 17-18 | 17.5 |
R_phone = 11 + 7.5 + 14.5 + 11 + 3.5 + 7.5 = 55
R_chat = 14.5 + 17.5 + 11 + 14.5 + 17.5 + 14.5 = 89.5
R_email = 3.5 + 7.5 + 1 + 3.5 + 7.5 + 3.5 = 26.5
Check: 55 + 89.5 + 26.5 = 171 = 18(19)/2 = 171. Correct.
H = [12 / (18 × 19)] × [(55²/6) + (89.5²/6) + (26.5²/6)] - 3(19)
H = [12/342] × [504.2 + 1335.0 + 117.0] - 57
H = 0.03509 × 1956.2 - 57 = 68.64 - 57 = 11.64 ≈ 11.6
With df = 2, the critical chi-square at α = 0.05 is 5.99. Since H = 11.6 > 5.99, the three channels produce significantly different satisfaction ratings (p < 0.01).
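As a cross-check on the hand calculation, here is a minimal standard-library Python sketch that reproduces the rank sums and H for the three channels. It also computes a tie-corrected H, dividing by 1 - Σ(t³ - t) / (N³ - N) where t is the size of each group of tied scores; the hand calculation omits this step, while statistical packages typically apply it. Variable names are illustrative.

```python
import math
from collections import Counter

groups = {
    "phone": [5, 4, 6, 5, 3, 4],
    "chat": [6, 7, 5, 6, 7, 6],
    "email": [3, 4, 2, 3, 4, 3],
}

pooled = [score for scores in groups.values() for score in scores]
n_total = len(pooled)
counts = Counter(pooled)  # how many times each distinct score occurs

# Average rank per distinct score: the mean of the 1-based positions it occupies.
avg_rank, start = {}, 1
for score in sorted(counts):
    avg_rank[score] = start + (counts[score] - 1) / 2
    start += counts[score]

rank_sums = {g: sum(avg_rank[s] for s in scores) for g, scores in groups.items()}

h = 12 / (n_total * (n_total + 1)) * sum(
    rank_sums[g] ** 2 / len(groups[g]) for g in groups
) - 3 * (n_total + 1)

# Tie correction factor (always <= 1, so the corrected H is >= the uncorrected H).
tie_factor = 1 - sum(t**3 - t for t in counts.values()) / (n_total**3 - n_total)
h_corrected = h / tie_factor

# With df = 2 (three groups), the chi-square upper-tail probability is exactly exp(-x / 2).
p_value = math.exp(-h_corrected / 2)

print(rank_sums)
print(round(h, 2), round(h_corrected, 2), round(p_value, 4))
```

The uncorrected H matches the 11.6 computed by hand. The tie-corrected value is slightly larger (about 12.1), with p of roughly 0.002, so the conclusion is unchanged.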
Follow-Up Tests
Use pairwise Mann-Whitney U tests with Bonferroni correction (or Dunn's test) to determine which groups differ. With 3 groups, that's 3 comparisons at adjusted α = 0.05/3 = 0.017.
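For illustration, the sketch below implements the Dunn's test route (not pairwise Mann-Whitney U) with a Bonferroni adjustment, using only the Python standard library. It assumes the usual large-sample z approximation with a tie-corrected variance; the function name `dunn_bonferroni` is hypothetical, and the data are the three channels from the worked example.

```python
import math
from collections import Counter
from itertools import combinations


def dunn_bonferroni(groups):
    """Return {(a, b): (z, bonferroni_adjusted_p)} for every pair of groups."""
    labels = list(groups)
    pooled = [s for g in labels for s in groups[g]]
    n_total = len(pooled)
    counts = Counter(pooled)

    # Average rank for each distinct score in the pooled data.
    avg_rank, start = {}, 1
    for score in sorted(counts):
        avg_rank[score] = start + (counts[score] - 1) / 2
        start += counts[score]

    mean_rank = {g: sum(avg_rank[s] for s in groups[g]) / len(groups[g]) for g in labels}

    # Variance term shared by all pairs, reduced to account for ties.
    tie_term = sum(t**3 - t for t in counts.values()) / (12 * (n_total - 1))
    base_var = n_total * (n_total + 1) / 12 - tie_term

    pairs = list(combinations(labels, 2))
    results = {}
    for a, b in pairs:
        se = math.sqrt(base_var * (1 / len(groups[a]) + 1 / len(groups[b])))
        z = (mean_rank[a] - mean_rank[b]) / se
        p_raw = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        results[(a, b)] = (z, min(1.0, p_raw * len(pairs)))  # Bonferroni: multiply by number of pairs
    return results


channels = {
    "phone": [5, 4, 6, 5, 3, 4],
    "chat": [6, 7, 5, 6, 7, 6],
    "email": [3, 4, 2, 3, 4, 3],
}
for pair, (z, p_adj) in dunn_bonferroni(channels).items():
    print(pair, round(z, 2), round(p_adj, 4))
```

Multiplying each raw p-value by the number of comparisons and testing at 0.05 is equivalent to testing raw p-values at 0.05/3. On these data, only the chat versus email comparison should remain significant after adjustment.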
When to Use the Kruskal-Wallis Test
- Comparing three or more independent groups on ordinal or non-normal continuous data
- Segment analysis with Likert-scale outcomes when groups are unequal in size or data is skewed
- Small group sizes where normality assumptions for ANOVA are questionable
- Survey data where response options are limited and distributions are lumpy
- Replacing one-way ANOVA when diagnostic checks reveal non-normal residuals or unequal variances
Common Mistakes to Avoid
- Using it for paired/repeated measures: for related groups, use the Friedman test instead
- Stopping at the omnibus test: a significant H statistic means at least one group differs, but you need post-hoc comparisons to identify which ones
- Assuming it tests medians: like the Mann-Whitney U, it tests distributional differences, which equals a median test only when group distributions have the same shape
How Quali-Fi Supports Nonparametric Group Comparisons
Quali-Fi's Research plan ($1,061/month) offers the Kruskal-Wallis test as a standard option for multi-group comparisons, with automated post-hoc testing and Bonferroni correction. The platform selects the appropriate test based on your data's characteristics and presents results in clear comparison tables.
Frequently Asked Questions
How is the Kruskal-Wallis test different from one-way ANOVA?
One-way ANOVA compares group means and assumes normally distributed data with equal variances. The Kruskal-Wallis test compares rank distributions and makes no normality assumption. When ANOVA assumptions are met, ANOVA is more powerful. When they're not, Kruskal-Wallis is more reliable.
Can I use the Kruskal-Wallis test with only two groups?
Technically yes, and it gives the same result as the Mann-Whitney U test. But with only two groups, most researchers use the Mann-Whitney directly since no follow-up comparisons are needed.
What effect size should I report?
Epsilon-squared (ε²) = H / [(N² - 1) / (N + 1)], which simplifies to H / (N - 1), is one option. More commonly, eta-squared based on ranks (η²_H) = (H - k + 1) / (N - k) is reported. Values around 0.01, 0.06, and 0.14 correspond to small, medium, and large effects.
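Both effect sizes are one-line calculations. The sketch below applies them to the worked example (H = 11.64 without tie correction, N = 18, k = 3); the function names are illustrative.

```python
def epsilon_squared(h, n_total):
    """Epsilon-squared: H / ((N^2 - 1) / (N + 1)), which reduces to H / (N - 1)."""
    return h / (n_total - 1)


def eta_squared_h(h, k, n_total):
    """Eta-squared based on ranks: (H - k + 1) / (N - k)."""
    return (h - k + 1) / (n_total - k)


h, k, n_total = 11.64, 3, 18  # values from the worked example
print(round(epsilon_squared(h, n_total), 3))   # 0.685
print(round(eta_squared_h(h, k, n_total), 3))  # 0.643
```

Both values are far above the 0.14 benchmark, which fits the near-complete separation between the chat and email ratings.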