MaxDiff Data Analysis: Applied Walkthrough

Learn how to analyze MaxDiff data, interpret preference scores, and apply results to product and marketing decisions with a step-by-step walkthrough.

What Is MaxDiff Data Analysis?

MaxDiff data analysis is the process of converting raw best-worst choice data from a Maximum Difference Scaling exercise into preference scores that rank items by relative importance, appeal, or priority. In each MaxDiff task, respondents select the best and worst item from a subset, and the analysis transforms those selections into a ratio-scaled score for every item tested. The most common estimation methods are hierarchical Bayesian (HB) analysis for individual-level scores and aggregate logit for sample-level estimates. Likert-scale data lets respondents rate every item "high." MaxDiff instead forces discrimination between items, producing a clear rank order with meaningful distances between items.

Why MaxDiff Data Analysis Matters

Raw MaxDiff choice data (which item was selected as best and which as worst in each task) isn't directly interpretable. The analysis step is what produces the preference scores that drive decisions. When the analysis is done correctly and the scores are rescaled to probabilities, the output is a ratio scale: an item scoring 20 is twice as preferred as an item scoring 10. That precision supports confident prioritization calls, because feature A genuinely matters twice as much as feature B, not just "more." Mishandling the analysis undermines this. Tallying raw best-minus-worst counts instead of running proper estimation, for example, produces ordinal rankings without meaningful distances and sacrifices the method's key advantage.

How MaxDiff Data Analysis Works

From Choices to Utility Scores

Each respondent's series of best-worst choices feeds into a statistical model that estimates a utility score for every item. This is typically a multinomial logit model, estimated either in aggregate or with HB. HB estimation is preferred because it produces individual-level scores, allowing you to examine how preferences vary across segments. The model estimates the probability that each item would be chosen as "best" or "worst" given the other items in the set. It then backs out the latent utility values most consistent with the observed choices.
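The choice model described above can be sketched in plain Python. This is a minimal aggregate (pooled) logit fit, not HB, and several pieces are assumptions for illustration:

- It treats "best" as a logit choice among the shown items and "worst" as a logit choice among the remaining items using negated utilities. This is a sequential best-worst specification, and software packages differ in how they code the worst pick.
- It fits the utilities with plain gradient ascent.
- The function names and the simulated "true" utilities are hypothetical.

HB keeps the same kind of choice likelihood but estimates a separate utility vector per respondent, drawn from a shared population distribution.

```python
import math
import random

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def add_task_gradient(u, shown, best, worst, grad):
    """Add one task's log-likelihood gradient to `grad` (sequential best-worst logit).

    Best is a logit choice among all shown items; worst is a logit choice among
    the remaining items, using negated utilities.
    """
    p_best = softmax([u[i] for i in shown])
    for p, item in zip(p_best, shown):
        grad[item] += (1.0 if item == best else 0.0) - p
    rest = [i for i in shown if i != best]
    p_worst = softmax([-u[i] for i in rest])
    for p, item in zip(p_worst, rest):
        grad[item] += p - (1.0 if item == worst else 0.0)

def fit_aggregate_logit(tasks, n_items, lr=0.5, iters=200):
    """Pooled fit by gradient ascent on the mean log-likelihood; returns zero-centered utilities."""
    u = [0.0] * n_items
    for _ in range(iters):
        grad = [0.0] * n_items
        for shown, best, worst in tasks:
            add_task_gradient(u, shown, best, worst, grad)
        u = [ui + lr * g / len(tasks) for ui, g in zip(u, grad)]
        mean_u = sum(u) / n_items
        u = [ui - mean_u for ui in u]  # only utility differences are identified
    return u

def simulate_tasks(true_u, n_tasks, set_size, rng):
    """Simulate best-worst answers from the same model (hypothetical data)."""
    tasks = []
    for _ in range(n_tasks):
        shown = rng.sample(range(len(true_u)), set_size)
        best = rng.choices(shown, weights=softmax([true_u[i] for i in shown]))[0]
        rest = [i for i in shown if i != best]
        worst = rng.choices(rest, weights=softmax([-true_u[i] for i in rest]))[0]
        tasks.append((shown, best, worst))
    return tasks

rng = random.Random(7)
true_u = [1.5, 0.5, 0.0, -0.5, -1.5]  # hypothetical "true" utilities for 5 items
tasks = simulate_tasks(true_u, n_tasks=1000, set_size=3, rng=rng)
estimated = fit_aggregate_logit(tasks, n_items=5)
print([round(x, 2) for x in estimated])
```

Only differences between utilities are identified, so the fit re-centers the estimates at zero after each step. That arbitrary origin is one reason raw utilities are rescaled before reporting.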

Rescaling for Interpretation

Raw HB utilities are on an arbitrary scale. Most practitioners rescale them for easier interpretation. The two common approaches are probability scaling (rescaling so scores sum to 100% and represent the probability of each item being chosen as most important from the full set) and zero-anchored interval scaling (centering at zero, where positive items are above-average preference and negative items are below-average). Probability scaling is more intuitive for stakeholders because "Feature A has a 15% share of preference" communicates immediately. Choose your scaling method before reporting and stick with it.
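Both rescalings can be sketched in a few lines. The sketch assumes you already have raw utilities per respondent, for example exported from HB software. The helper names and utility values are hypothetical. It also follows a common practice, assumed here, of rescaling each respondent's utilities first and then averaging across respondents.

```python
import math

def probability_scale(utilities):
    """Share of preference (%): softmax of one respondent's utilities over the full item set."""
    m = max(utilities)
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [100.0 * e / total for e in exps]

def zero_centered(utilities):
    """Interval scaling: subtract the mean so 0 marks average preference."""
    mean_u = sum(utilities) / len(utilities)
    return [u - mean_u for u in utilities]

def average_rescaled(respondents, rescale):
    """Rescale each respondent's utilities first, then average item by item."""
    rescaled = [rescale(u) for u in respondents]
    n = len(rescaled)
    return [sum(col) / n for col in zip(*rescaled)]

# Hypothetical raw HB utilities for two respondents and four items.
raw = [[2.1, 1.4, 0.3, -0.8],
       [0.9, 1.8, -0.2, -1.1]]
shares = average_rescaled(raw, probability_scale)
centered = average_rescaled(raw, zero_centered)
print([round(s, 1) for s in shares])
print([round(c, 2) for c in centered])
```

Adding the same constant to every raw utility leaves both rescaled versions unchanged, which shows why the raw origin carries no meaning. Some software also adjusts the probability transform for the number of items shown per task before normalizing. Confirm which convention your tool reports before comparing studies.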

Analyzing Aggregate Results

Start with the full-sample rank order. Plot items from highest to lowest score and look for natural breakpoints. In a 15-item feature prioritization study, you might see three tiers:

- a clear top tier (items 1-4, with scores above 10%)
- a middle tier (items 5-9, between 5% and 10%)
- a bottom tier (items 10-15, below 5%)

The gaps between tiers are often more important than the exact rank within a tier. A 0.3-point difference between the items ranked 2nd and 3rd likely isn't meaningful, but a 4-point gap between the 4th and 5th items indicates a genuine preference boundary.
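A simple way to surface the breakpoints described above is to sort the scores and split wherever the drop to the next item meets a chosen threshold. The scores, item labels, and 3-point threshold below are hypothetical. The threshold is a judgment call, and a visible gap is not a significance test, so check the confidence intervals before treating a boundary as real.

```python
def find_tiers(scores, min_gap):
    """Rank items by score; start a new tier wherever the drop to the next item is >= min_gap points."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return []
    tiers = [[ranked[0][0]]]
    for (_, prev_score), (name, score) in zip(ranked, ranked[1:]):
        if prev_score - score >= min_gap:
            tiers.append([])  # large drop: open a new tier
        tiers[-1].append(name)
    return tiers

# Hypothetical probability-scaled scores (summing to 100) for ten items.
scores = {"A": 16.0, "B": 15.7, "C": 15.0, "D": 14.1, "E": 9.8,
          "F": 9.2, "G": 8.7, "H": 4.3, "I": 3.9, "J": 3.3}
print(find_tiers(scores, min_gap=3.0))
```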

Segmenting the Results

Because HB produces individual-level scores, you can cut the data by any respondent attribute. Compare preference scores across demographics, usage segments, or customer tiers. If your top feature overall is "24/7 support" but it ranks 8th among enterprise customers (who already have dedicated account managers), your product roadmap priorities differ by segment. Run t-tests or ANOVA on the individual-level scores to test whether segment differences are statistically significant rather than relying on visual inspection of means.
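The test itself can be sketched with the standard library. The sketch computes Welch's t statistic, which does not assume equal variances across segments, and the Welch-Satterthwaite degrees of freedom. As a simplifying assumption, it approximates the p-value with a normal distribution. That approximation is reasonable for segments with hundreds of respondents but too optimistic for small ones. The segment scores are simulated, not real data. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the exact t-based p-value, and `scipy.stats.f_oneway` handles the ANOVA case for three or more segments.

```python
import math
import random
from statistics import NormalDist, mean, variance

def welch_t_test(a, b):
    """Welch's t statistic, Welch-Satterthwaite df, and a normal-approximation p-value."""
    va = variance(a) / len(a)  # squared standard error of mean(a); variance() uses n - 1
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    p_approx = 2 * NormalDist().cdf(-abs(t))  # two-sided; large-sample approximation
    return t, df, p_approx

# Simulated individual-level scores for one item in two segments.
rng = random.Random(3)
segment_a = [rng.gauss(15.0, 5.0) for _ in range(200)]
segment_b = [rng.gauss(8.0, 5.0) for _ in range(250)]
t, df, p = welch_t_test(segment_a, segment_b)
print(f"t = {t:.2f}, df = {df:.1f}, approx. two-sided p = {p:.2g}")
```

Comparing every item across segments means running many tests at once, so consider a multiple-comparison correction before declaring differences significant.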

A Worked Example

A B2B software company tested 12 potential product features using MaxDiff with 450 decision-makers. HB estimation produced individual-level scores, rescaled to probability format (summing to 100%). The top 3 features were: custom reporting (12.4%), API integrations (11.8%), and single sign-on (10.1%). The bottom 3 were: dark mode (3.2%), gamification elements (2.8%), and social sharing (2.1%).

Segment analysis revealed that companies with 50+ employees valued API integrations at 15.2% (vs. 8.1% for smaller companies), while companies under 50 employees prioritized ease-of-setup at 13.4% (vs. 6.7% for larger companies). The product team created two onboarding paths based on company size and deprioritized the bottom-tier features entirely, saving an estimated 400 development hours.
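The example doesn't say how its overall scores were aggregated. If each one is the mean of respondents' individual probability-scaled scores (an assumption), it is a respondent-weighted average of the two segment means, which allows a back-of-envelope check. The sketch below solves for the segment share implied by the API-integration figures and applies that share to ease-of-setup. The helper names are hypothetical. Because the inputs are rounded, the results are approximate, and the implied share is an inference rather than a number the study reported.

```python
def blended_score(large_score, small_score, large_share):
    """Overall score as a respondent-weighted average of two segment means."""
    return large_share * large_score + (1 - large_share) * small_score

def implied_large_share(overall, large_score, small_score):
    """Solve overall = w * large + (1 - w) * small for the share w."""
    return (overall - small_score) / (large_score - small_score)

# Reported (rounded) API-integration scores: overall, 50+ employees, under 50.
share_large = implied_large_share(overall=11.8, large_score=15.2, small_score=8.1)
# Apply that share to the reported ease-of-setup segment scores.
ease_overall = blended_score(large_score=6.7, small_score=13.4, large_share=share_large)
print(f"implied 50+ share = {share_large:.2f}, ease-of-setup overall = {ease_overall:.1f}%")
```

The implied share of companies with 50+ employees comes out at about 0.52, and the blended ease-of-setup score at about 9.9%, just below single sign-on's 10.1%. That is consistent with ease-of-setup missing the overall top 3 despite leading among smaller companies, which is exactly the kind of pattern aggregate results hide.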

Combining MaxDiff with Other Data

MaxDiff scores can serve as predictor variables in regression models. If you've also collected satisfaction scores, NPS, or purchase intent, regressing those outcomes on individual-level MaxDiff utilities shows which preference items actually predict business outcomes. An item might rank high in stated preference but show no relationship to satisfaction, suggesting respondents think they want it but it doesn't actually move the needle.
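The regression can be sketched with ordinary least squares solved through the normal equations. The data below are hypothetical. Each row holds one respondent's utilities for two items, and `y` stands in for an outcome such as a satisfaction score. Here `y` is constructed as an exact linear function so the recovered coefficients are easy to check.

One caution applies whichever tool you use. Zero-centered and probability-scaled scores sum to a constant for every respondent, so entering all items alongside an intercept makes the predictors perfectly collinear. Drop one item or regress on a subset, as this sketch assumes.

```python
def ols(X, y):
    """OLS with an intercept via the normal equations; returns [intercept, b1, b2, ...]."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    k = len(rows[0])
    # Augmented system [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    scale = max(abs(v) for row in A for v in row[:k])
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(A[i][col]))
        if abs(A[pivot][col]) <= 1e-10 * scale:
            raise ValueError("collinear predictors: drop one item's score")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical utilities for two items (one row per respondent) and an outcome.
X = [[0.5, -1.0], [1.2, 0.3], [-0.7, 0.8], [0.1, -0.4], [-1.5, 1.1], [0.9, 0.0]]
y = [2 + 3 * x1 - 1 * x2 for x1, x2 in X]
coef = ols(X, y)
print([round(c, 3) for c in coef])
```

For standard errors and p-values on the coefficients, use a statistics package such as statsmodels rather than this bare solver.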

When to Use MaxDiff Data Analysis

  • Feature prioritization: ranking 8-30 potential features, messages, or benefits by relative importance to guide product or marketing strategy
  • Message testing: determining which value propositions resonate most strongly relative to alternatives
  • Brand attribute ranking: identifying which attributes differentiate your brand most in consumers' minds
  • Employee priority assessment: ranking workplace improvements, benefits, or policy changes by employee preference
  • Innovation pipeline triage: prioritizing which product ideas to pursue based on relative consumer appeal

Common Mistakes

  • Using raw best-minus-worst counts instead of proper HB or logit estimation produces ordinal rankings without meaningful score distances, losing the ratio-scale advantage that makes MaxDiff superior to simple ranking
  • Testing items at different levels of abstraction (mixing specific features like "dark mode" with broad benefits like "better user experience") produces misleading comparisons because abstract items tend to score higher
  • Ignoring individual-level variation by reporting only aggregate scores even when an item's individual-level scores are bimodal. A bimodal distribution signals distinct preference segments, and averaging them produces a misleading middle value
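One way to screen for the bimodal distributions mentioned in the last point is the bimodality coefficient, computed per item on individual-level scores. The sketch uses the large-sample form, (skewness^2 + 1) / kurtosis with raw (not excess) kurtosis. The finite-sample version adds small-sample corrections. The 5/9 cutoff, which is the coefficient of a uniform distribution, is a common heuristic rather than a formal test. Heavy skew can also push the coefficient up, so follow any flag with a histogram. The example score lists are hypothetical.

```python
def bimodality_coefficient(scores):
    """Large-sample bimodality coefficient: (skewness^2 + 1) / kurtosis."""
    n = len(scores)
    mu = sum(scores) / n
    m2 = sum((x - mu) ** 2 for x in scores) / n  # central moments, population form
    if m2 == 0:
        raise ValueError("all scores are identical")
    m3 = sum((x - mu) ** 3 for x in scores) / n
    m4 = sum((x - mu) ** 4 for x in scores) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # raw kurtosis: 3 for a normal, 1.8 for a uniform
    return (skewness ** 2 + 1) / kurtosis

BIMODAL_CUTOFF = 5 / 9  # coefficient of a uniform distribution

split_item = [-2.0] * 50 + [2.0] * 50                   # two opposed camps
consensus_item = [0.0] * 80 + [-1.0] * 10 + [1.0] * 10  # most respondents near average
for name, s in [("split", split_item), ("consensus", consensus_item)]:
    bc = bimodality_coefficient(s)
    print(name, round(bc, 3), "flag" if bc > BIMODAL_CUTOFF else "ok")
```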

How Quali-Fi Supports MaxDiff Data Analysis

Quali-Fi's Research plan includes MaxDiff as a built-in question type with automated HB estimation and preference score calculation. The platform displays ranked results with confidence intervals, supports segment-level comparisons, and exports individual-level utility scores for advanced analysis. You can run a complete MaxDiff study from design through analysis without leaving the platform.

Frequently Asked Questions

How many items can I test in a MaxDiff study?

MaxDiff works well for 8-30 items. Below 8, a simple ranking question may suffice. Above 30, respondent burden increases without proportional gains in precision. For very large item sets (40+), consider splitting into blocks or using an anchored MaxDiff design that allows comparison across blocks.
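The burden point is easy to quantify. Assume a common design rule of thumb, not taken from this guide: show each item about three times per respondent. The number of tasks then grows linearly with the item count. The helper name is hypothetical.

```python
import math

def tasks_needed(n_items, items_per_task, exposures=3):
    """Tasks per respondent so each item is shown about `exposures` times (rule of thumb)."""
    return math.ceil(n_items * exposures / items_per_task)

for n in (12, 30, 40):
    print(n, "items ->", tasks_needed(n, items_per_task=5), "tasks")
```

With 5 items per task, 12 items need 8 tasks, 30 items need 18, and 40 items need 24. That growth is why splitting very large sets into blocks becomes attractive.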

What's the difference between MaxDiff and ranking questions?

Ranking forces respondents to order all items at once, which becomes cognitively difficult beyond 7-8 items. MaxDiff breaks the task into manageable subsets (typically 4-5 items per task) and produces ratio-scaled scores with known statistical properties. Rankings produce only ordinal data with no information about the magnitude of preference differences.

Can MaxDiff tell me if respondents like or dislike an item in absolute terms?

Standard MaxDiff only measures relative preference. An item scoring highest might still be something respondents dislike if all options are unappealing. Anchored MaxDiff adds a reference point (e.g., "none of these is important") that provides an absolute threshold, allowing you to distinguish "best of a bad set" from "genuinely important."
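Once the anchor has a utility on the same scale as the items, the classification is a simple threshold comparison. This sketch assumes the estimation step has already produced that anchor utility. How the anchor enters the model depends on the anchoring method used. The item names and values are hypothetical.

```python
def classify_against_anchor(utilities, anchor="ANCHOR"):
    """Split items into those above and those at-or-below the anchor's utility."""
    threshold = utilities[anchor]
    above = [k for k, u in utilities.items() if k != anchor and u > threshold]
    below = [k for k, u in utilities.items() if k != anchor and u <= threshold]
    return above, below

utilities = {"Feature A": 1.2, "Feature B": 0.4, "Feature C": -0.3,
             "Feature D": -1.1, "ANCHOR": 0.0}
above, below = classify_against_anchor(utilities)
print("Above anchor:", above)
print("Below anchor:", below)
```

If every item falls below the anchor, the set is a "best of a bad set" case. The top-ranked item still wins the relative comparison but clears no absolute bar.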


Run MaxDiff studies with built-in analysis -- try Quali-Fi free for 14 days.
