What Is the Friedman Test?
The Friedman test is a nonparametric statistical test used to detect differences across three or more related groups, typically repeated measures from the same participants. It's the nonparametric alternative to repeated-measures ANOVA, designed for situations where the dependent variable is ordinal or where the normality assumption required by parametric tests isn't met. Instead of comparing raw means, the Friedman test ranks the scores within each participant across conditions, then tests whether the average ranks differ significantly across conditions. It was developed by economist Milton Friedman (yes, that Milton Friedman) in 1937. In market research, you'd use it when the same respondents rate or rank multiple items, such as evaluating three product concepts in sequence, and the data is ordinal or heavily skewed.
Why the Friedman Test Matters
Repeated-measures designs are efficient because each participant serves as their own control, which removes between-participant variability from the comparison. But repeated-measures ANOVA assumes approximately normally distributed data, interval-level measurement, and sphericity. When you're working with Likert-scale ratings, ranks, or satisfaction tiers, the Friedman test gives you a valid way to test for differences without imposing parametric assumptions on data that don't meet them.
How the Friedman Test Works
The Procedure
- For each participant, rank the scores across the k conditions (1 = lowest, k = highest). Tied ranks receive the average of the ranks they would have occupied.
- Sum the ranks for each condition across all participants.
- Calculate the Friedman test statistic.
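The ranking and summing steps can be sketched in a few lines of Python. This is a minimal illustration that assumes the data are a list of rows (one per participant) with one score per condition; `rank_within_row` and `rank_sums` are hypothetical helper names, not functions from any library.

```python
def rank_within_row(scores):
    """Rank one participant's scores across conditions (1 = lowest).

    Tied scores receive the average of the ranks they would have occupied:
    rank = 1 + (number of smaller scores) + (number of other equal scores) / 2.
    """
    return [
        1 + sum(s < x for s in scores) + (sum(s == x for s in scores) - 1) / 2
        for x in scores
    ]


def rank_sums(data):
    """Rank within each row (participant), then sum the ranks per column (condition)."""
    ranked = [rank_within_row(row) for row in data]
    return [sum(column) for column in zip(*ranked)]


print(rank_within_row([4, 4, 7]))  # [1.5, 1.5, 3.0]
```

A quick sanity check: the rank sums always total nk(k + 1)/2, because each participant contributes 1 + 2 + ... + k.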
The Formula
χ²_F = [12 / (nk(k + 1))] × ΣR²_j - 3n(k + 1)
Where n is the number of participants, k is the number of conditions, and R_j is the sum of ranks for condition j.
For reasonably large samples, the test statistic approximately follows a chi-square distribution with k - 1 degrees of freedom.
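The formula and its chi-square reference distribution translate directly into code. The sketch below assumes you already have the rank sums; `friedman_statistic` and `chi2_sf` are hypothetical names, and `chi2_sf` uses the closed-form chi-square tail probability for integer degrees of freedom so that only Python's standard library is needed. In practice, a statistics package would normally supply the p-value.

```python
import math


def friedman_statistic(rank_sums, n):
    """Uncorrected Friedman statistic from the per-condition rank sums."""
    k = len(rank_sums)
    return 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    y = x / 2
    if df % 2 == 0:
        # Even df: exp(-y) * sum over i < df/2 of y^i / i!
        return math.exp(-y) * sum(y**i / math.factorial(i) for i in range(df // 2))
    # Odd df: start from the df = 1 tail, erfc(sqrt(y)), and add the remaining terms.
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return total


stat = friedman_statistic([15, 10, 23], n=8)  # 8 participants, 3 conditions
p_value = chi2_sf(stat, df=3 - 1)
```

The rank sums used here are the ones from the worked example below, so `stat` reproduces its hand-calculated value of 10.75.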
Worked Example
Eight customers taste-test three flavors of a beverage and rate each on a 1-7 scale:
| Customer | Flavor A | Flavor B | Flavor C |
|---|---|---|---|
| 1 | 5 | 3 | 6 |
| 2 | 4 | 4 | 7 |
| 3 | 6 | 2 | 5 |
| 4 | 3 | 5 | 6 |
| 5 | 5 | 3 | 7 |
| 6 | 4 | 4 | 5 |
| 7 | 6 | 1 | 7 |
| 8 | 5 | 3 | 6 |
Step 1. Rank within each customer:
| Customer | Rank A | Rank B | Rank C |
|---|---|---|---|
| 1 | 2 | 1 | 3 |
| 2 | 1.5 | 1.5 | 3 |
| 3 | 3 | 1 | 2 |
| 4 | 1 | 2 | 3 |
| 5 | 2 | 1 | 3 |
| 6 | 1.5 | 1.5 | 3 |
| 7 | 2 | 1 | 3 |
| 8 | 2 | 1 | 3 |
Step 2. Sum ranks per condition:
R_A = 15, R_B = 10, R_C = 23
Step 3. Calculate:
χ²_F = [12 / (8 × 3 × 4)] × (15² + 10² + 23²) - 3(8)(4)
χ²_F = [12 / 96] × (225 + 100 + 529) - 96
χ²_F = 0.125 × 854 - 96 = 106.75 - 96 = 10.75
With df = 2, the critical chi-square at α = 0.05 is 5.99. Since 10.75 > 5.99 (p ≈ 0.005), we reject the null hypothesis that the three flavors are rated equally: the ratings differ significantly across flavors.
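To double-check the hand calculation, the sketch below recomputes Steps 1-3 in plain Python. It assumes df = 2 (three flavors), where the chi-square upper tail reduces to exp(-x/2); that shortcut does not hold for other degrees of freedom. `rank_within_row` is a hypothetical helper.

```python
import math

# Taste-test ratings: one row per customer, columns are Flavor A, B, C.
ratings = [
    [5, 3, 6], [4, 4, 7], [6, 2, 5], [3, 5, 6],
    [5, 3, 7], [4, 4, 5], [6, 1, 7], [5, 3, 6],
]


def rank_within_row(scores):
    """Average ranks within one customer's ratings (1 = lowest)."""
    return [1 + sum(s < x for s in scores) + (sum(s == x for s in scores) - 1) / 2
            for x in scores]


n, k = len(ratings), len(ratings[0])
ranks = [rank_within_row(row) for row in ratings]                          # Step 1
R = [sum(col) for col in zip(*ranks)]                                      # Step 2
chi2_f = 12 / (n * k * (k + 1)) * sum(r**2 for r in R) - 3 * n * (k + 1)   # Step 3

# Only for df = k - 1 = 2: the chi-square upper tail is exp(-x / 2), so the
# 5% critical value is -2 ln(0.05) and the p-value is exp(-chi2_f / 2).
critical = -2 * math.log(0.05)
p_value = math.exp(-chi2_f / 2)

print(R)                                                        # [15.0, 10.0, 23.0]
print(round(chi2_f, 2), round(critical, 2), round(p_value, 4))  # 10.75 5.99 0.0046
```

Statistical software run on the same data will usually report a slightly larger statistic, because most packages apply the tie correction mentioned under Common Mistakes to Avoid.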
Follow-Up Tests
A significant Friedman test tells you that at least one condition differs, but not which ones. Use pairwise Wilcoxon signed-rank tests with a Bonferroni correction to identify specific differences. With k conditions there are k(k - 1)/2 pairs, so three conditions mean 3 comparisons, each tested at α = 0.05/3 ≈ 0.0167.
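As a rough sketch of this follow-up procedure, the code below runs the three pairwise Wilcoxon signed-rank tests on the taste-test ratings and compares each p-value with the Bonferroni-adjusted α. It rests on a stated assumption: it uses the large-sample normal approximation (dropping zero differences and adjusting the variance for tied differences), which is crude with only eight customers, so exact p-values from statistical software are preferable in a real analysis. `wilcoxon_signed_rank_approx` is a hypothetical helper, not a library function.

```python
import math
from itertools import combinations


def wilcoxon_signed_rank_approx(x, y):
    """Two-sided Wilcoxon signed-rank test using the normal approximation.

    Zero differences are dropped, the variance is adjusted for tied |differences|,
    and no continuity correction is applied. Returns (W+, p-value).
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    m = len(d)
    if m == 0:
        return 0.0, 1.0  # no nonzero differences: no evidence of a shift
    abs_d = [abs(v) for v in d]
    # Average ranks of the absolute differences (1 = smallest).
    ranks = [1 + sum(a < v for a in abs_d) + (sum(a == v for a in abs_d) - 1) / 2
             for v in abs_d]
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    mean = m * (m + 1) / 4
    tie_adj = sum(t**3 - t for t in (abs_d.count(v) for v in set(abs_d))) / 48
    var = m * (m + 1) * (2 * m + 1) / 24 - tie_adj
    z = (w_plus - mean) / math.sqrt(var)
    return w_plus, math.erfc(abs(z) / math.sqrt(2))


# Taste-test ratings by flavor (columns of the table above), customers 1-8.
flavors = {
    "A": [5, 4, 6, 3, 5, 4, 6, 5],
    "B": [3, 4, 2, 5, 3, 4, 1, 3],
    "C": [6, 7, 5, 6, 7, 5, 7, 6],
}
pairs = list(combinations(flavors, 2))  # (A, B), (A, C), (B, C)
alpha = 0.05 / len(pairs)               # Bonferroni-adjusted alpha, about 0.0167
results = {}
for a, b in pairs:
    w_plus, p = wilcoxon_signed_rank_approx(flavors[a], flavors[b])
    results[(a, b)] = p
    print(f"{a} vs {b}: W+ = {w_plus}, p = {p:.4f}, significant: {p < alpha}")
```

Under this approximation, only Flavor B vs. Flavor C falls below the adjusted α (p ≈ 0.011). Flavor A vs. Flavor C (p ≈ 0.031) would pass an unadjusted 0.05 threshold but not the Bonferroni-corrected one. Exact p-values from software may differ somewhat at this sample size.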
Friedman Test vs. Repeated-Measures ANOVA
| Feature | Friedman Test | Repeated-Measures ANOVA |
|---|---|---|
| Data level | Ordinal or non-normal continuous | Interval/ratio, approximately normal |
| Uses | Ranks | Raw scores |
| Sphericity assumption | Not required | Required (or corrected) |
| Power with normal data | Lower | Higher |
| Sample size | Small samples acceptable | Larger samples preferred |
| Effect size | Kendall's W | Partial eta-squared |
If your data is continuous and reasonably normal, repeated-measures ANOVA is more powerful. If the data is ordinal, heavily skewed, or comes from small samples where normality is questionable, the Friedman test is the safer choice.
When to Use the Friedman Test
- Taste tests or concept evaluations where the same respondents rate multiple options on ordinal scales
- Before-during-after designs with three or more time points and non-normal data
- Ranking tasks where participants rank items rather than rating them on a continuous scale
- Small sample sizes where you can't confidently assume normality
- Likert-scale data when you're treating the scale as ordinal rather than interval
Common Mistakes to Avoid
- Using the Friedman test for independent groups: it's for related (repeated) measures only; use Kruskal-Wallis for independent groups
- Skipping post-hoc comparisons after a significant result: the omnibus test doesn't tell you where the differences are
- Ignoring ties: when participants give the same score to two or more conditions, the basic formula needs a tie-correction factor, and the adjustment grows as ties become more common; most statistical software handles this automatically
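The tie correction divides the basic statistic by c = 1 - Σ(t³ - t) / (nk(k² - 1)), where t is the size of each group of tied scores within a participant and the sum runs over all such groups. To our understanding, this is the form applied by R's `friedman.test` and SciPy's `scipy.stats.friedmanchisquare`. The sketch below, with the hypothetical helper `friedman_with_tie_correction`, shows its effect on the taste-test data, where customers 2 and 6 each have one tied pair.

```python
def friedman_with_tie_correction(data):
    """Return (uncorrected, tie-corrected) Friedman statistics for complete data."""
    n, k = len(data), len(data[0])
    # Average ranks within each participant (1 = lowest).
    ranks = [[1 + sum(s < x for s in row) + (sum(s == x for s in row) - 1) / 2
              for x in row] for row in data]
    rank_sums = [sum(col) for col in zip(*ranks)]
    chi2_f = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    # Sum t^3 - t over every group of tied scores within each participant;
    # untied scores have t = 1 and contribute nothing.
    ties = sum(t**3 - t for row in data for t in (row.count(v) for v in set(row)))
    c = 1 - ties / (n * k * (k * k - 1))
    if c == 0:
        raise ValueError("every participant gave identical scores to all conditions")
    return chi2_f, chi2_f / c


ratings = [[5, 3, 6], [4, 4, 7], [6, 2, 5], [3, 5, 6],
           [5, 3, 7], [4, 4, 5], [6, 1, 7], [5, 3, 6]]
basic, corrected = friedman_with_tie_correction(ratings)
print(round(basic, 2), round(corrected, 2))  # 10.75 11.47
```

Because c is at most 1, the correction can only increase the statistic, so an uncorrected hand calculation is slightly conservative.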
How Quali-Fi Supports Nonparametric Analysis
Quali-Fi's Research plan ($1,061/month) includes nonparametric testing options for repeated-measures designs, automatically selecting the appropriate test based on your data structure and measurement level. The platform handles tied ranks and provides follow-up pairwise comparisons with Bonferroni correction.
Frequently Asked Questions
How many participants do I need for the Friedman test?
With three conditions, a minimum of about 10-12 participants is often cited, though more is always better. For small samples (n < 10), you should use exact p-values rather than the chi-square approximation, which most statistical software can provide.
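To show what an exact p-value means here, the sketch below computes one by brute force over the permutation distribution. Under the null hypothesis, every ordering of ranks within a participant is equally likely, so the null distribution of the rank sums can be built up one participant at a time. `exact_friedman_p` is a hypothetical helper that is practical only for small n and k, and it conditions on the observed tie pattern; statistical software may use tables or other exact methods that give slightly different answers when ties are present.

```python
from itertools import permutations


def exact_friedman_p(data):
    """Exact permutation p-value for the Friedman test on complete data (small n, k)."""

    def doubled_ranks(row):
        # Average ranks times 2, so tied ranks such as 1.5 stay integers.
        return [2 + 2 * sum(s < x for s in row) + (sum(s == x for s in row) - 1)
                for x in row]

    ranked = [doubled_ranks(row) for row in data]
    # For fixed n and k the statistic increases with the sum of squared rank sums,
    # so comparing that sum is equivalent to comparing the statistic.
    observed = sum(total**2 for total in map(sum, zip(*ranked)))

    k = len(data[0])
    dist = {(0,) * k: 1}  # rank-sum vector -> number of equally likely ways to reach it
    for r in ranked:
        orderings = list(permutations(r))  # all k! orderings (duplicates kept if tied)
        new_dist = {}
        for sums, count in dist.items():
            for perm in orderings:
                key = tuple(s + v for s, v in zip(sums, perm))
                new_dist[key] = new_dist.get(key, 0) + count
        dist = new_dist

    extreme = sum(c for sums, c in dist.items() if sum(s * s for s in sums) >= observed)
    return extreme / sum(dist.values())


ratings = [[5, 3, 6], [4, 4, 7], [6, 2, 5], [3, 5, 6],
           [5, 3, 7], [4, 4, 5], [6, 1, 7], [5, 3, 6]]
print(exact_friedman_p(ratings))
```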
Can the Friedman test handle missing data?
Standard implementations require complete data: each participant must have scores for all conditions. If a participant is missing one condition, they're typically excluded entirely. Some software offers extensions that handle missing data, such as the Skillings-Mack test, but the safest approach is to address missingness before running the test.
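Excluding incomplete participants (listwise deletion) before running the test can be sketched as follows, assuming missing scores are recorded as `None`; `complete_cases` is a hypothetical helper. It also returns how many participants were dropped, which is usually worth reporting alongside the result.

```python
def complete_cases(data):
    """Keep only participants (rows) with a score for every condition."""
    kept = [row for row in data if all(score is not None for score in row)]
    return kept, len(data) - len(kept)


ratings = [[5, 3, 6], [4, None, 7], [6, 2, 5]]  # None marks a missing score
kept, dropped = complete_cases(ratings)
print(kept, dropped)  # [[5, 3, 6], [6, 2, 5]] 1
```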
What's the effect size for the Friedman test?
Kendall's W (coefficient of concordance) is the standard effect size. It ranges from 0 (no agreement in rankings) to 1 (perfect agreement). W = χ²_F / (n(k-1)). Values around 0.1 are small, 0.3 medium, and 0.5+ large.
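Applied to the taste-test example, the formula gives W = 10.75 / (8 × 2) ≈ 0.67, a large effect by the benchmarks above. The sketch below (hypothetical helper names) computes W and attaches those rough labels. It uses the uncorrected statistic, so a tie-corrected χ²_F would give a somewhat larger W.

```python
def kendalls_w(chi2_f, n, k):
    """Kendall's W from the Friedman statistic: W = chi2_F / (n(k - 1))."""
    return chi2_f / (n * (k - 1))


def describe_w(w):
    """Label W with the rough benchmarks 0.1 (small), 0.3 (medium), 0.5+ (large)."""
    if w >= 0.5:
        return "large"
    if w >= 0.3:
        return "medium"
    if w >= 0.1:
        return "small"
    return "below small"


w = kendalls_w(10.75, n=8, k=3)  # statistic from the taste-test example
print(round(w, 3), describe_w(w))  # 0.672 large
```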