
MaxDiff Question Type: Survey Implementation Guide


Learn how MaxDiff questions work in surveys, how to implement best-worst scaling tasks, and best practices for item count, set design, and respondent experience.

What Is a MaxDiff Question Type?

A MaxDiff question (short for Maximum Difference Scaling, and also called best-worst scaling) is a survey format that shows respondents subsets of items, typically 4-5 at a time, and asks them to select the best (most important, most appealing, most preferred) and the worst (least important, least appealing, least preferred) item in each set. Across multiple sets, every item appears several times in different combinations, and the pattern of best and worst selections yields a score for every item on a common scale, ranking the full list from most to least preferred. Because MaxDiff forces trade-offs that rating scales don't, it discriminates more cleanly between items and avoids the scale-use biases (such as acquiescence and extreme response tendency) that plague Likert-style questions.

Why MaxDiff Questions Matter

When you ask respondents to rate 15 features on a 1-5 importance scale, most features end up rated 4 or 5. Respondents don't discriminate because the question format doesn't require them to. MaxDiff solves this by making every selection a relative judgment: choosing one item as best implicitly means it's more important than the others in that set. The result is a clear rank order with meaningful distance between items, not a cluster of high scores that tells you nothing about priorities. For product managers, marketers, and strategists who need to know what matters most, MaxDiff delivers the answer that rating scales obscure.

How MaxDiff Questions Work

The Respondent Task

Each MaxDiff question (called a "set" or "task") shows 4-5 items from the total list. The respondent selects one item as the best (or most important/appealing, depending on the study framing) and one as the worst (or least important/appealing). That's it: two clicks per task.

For example, in a feature prioritization study with 15 features:

Set 1: Which feature is MOST important and which is LEAST important to you?

  • Real-time collaboration
  • Offline access
  • Custom reporting
  • Mobile app
  • API integrations

The respondent picks one as most important and one as least important. They then see another set with a different combination of features. After completing all sets, the analysis produces a utility score for each feature based on how often it was chosen as best versus worst across all sets.
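
To make the task concrete, here is a minimal Python sketch of how one completed set might be recorded. The MaxDiffTask class and the picks shown are hypothetical illustrations, not any platform's data format; the only rules it enforces are the ones implied by the task itself: both picks come from the set shown, and they must be different items.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaxDiffTask:
    """One MaxDiff set: the items shown and the respondent's two picks."""

    items: tuple[str, ...]
    best: str
    worst: str

    def __post_init__(self) -> None:
        # Both picks must be items that were actually shown in this set.
        if self.best not in self.items or self.worst not in self.items:
            raise ValueError("best and worst must be items shown in the set")
        # One item cannot be both the most and the least preferred.
        if self.best == self.worst:
            raise ValueError("best and worst must be different items")


# Hypothetical response to Set 1 above.
task = MaxDiffTask(
    items=(
        "Real-time collaboration",
        "Offline access",
        "Custom reporting",
        "Mobile app",
        "API integrations",
    ),
    best="Offline access",
    worst="Mobile app",
)
```

A respondent's full exercise is then just a list of these records, one per set, which is all the analysis methods below need as input.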

Experimental Design

The design determines which items appear together in each set and how many sets each respondent sees. Good designs ensure:

Balanced frequency: Each item appears the same number of times across all sets. If "offline access" appears in 8 sets but "API integrations" appears in only 4, the estimate for API integrations rests on half as much data, and the two scores won't be equally reliable.

Balanced pairing: Each pair of items appears together roughly the same number of times. This prevents confounding: if two items always appear together, you can't separate their individual effects.

Sufficient repetition: Each item must appear enough times for stable estimation. A common guideline is for each item to appear 3-5 times per respondent.

Most MaxDiff software generates optimal designs automatically. For a study with 15 items shown 4 per set, a typical design has 10-12 sets per respondent, with each item appearing about 3 times.
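
A rough way to see balanced frequency and pairing in practice is the random-search sketch below. It assumes a deliberately simple strategy: concatenate independent shuffles of the item list so every item appears equally often, then keep the candidate whose pair co-occurrence counts are most even. Real MaxDiff software uses more sophisticated design algorithms, and build_design and pair_counts are hypothetical names.

```python
import random
from collections import Counter
from itertools import combinations


def pair_counts(sets, items):
    """How often each pair of items appears together (0 if never)."""
    counts = Counter()
    for s in sets:
        for pair in combinations(sorted(s), 2):
            counts[pair] += 1
    return {pair: counts[pair] for pair in combinations(sorted(items), 2)}


def build_design(items, per_set, appearances, attempts=500, seed=0):
    """Random-search sketch of a frequency-balanced MaxDiff design."""
    if (len(items) * appearances) % per_set:
        raise ValueError("len(items) * appearances must be divisible by per_set")
    rng = random.Random(seed)
    best_sets, best_spread = None, None
    for _ in range(attempts):
        # Concatenating independent shuffles gives every item the same frequency.
        pool = []
        for _ in range(appearances):
            block = list(items)
            rng.shuffle(block)
            pool.extend(block)
        sets = [pool[i:i + per_set] for i in range(0, len(pool), per_set)]
        # Reject candidates that show the same item twice in one set.
        if any(len(set(s)) < per_set for s in sets):
            continue
        # Prefer the candidate whose pair co-occurrence counts are most even.
        counts = pair_counts(sets, items).values()
        spread = max(counts) - min(counts)
        if best_spread is None or spread < best_spread:
            best_sets, best_spread = sets, spread
    if best_sets is None:
        raise RuntimeError("no valid design found; try more attempts")
    return best_sets


items = [f"Item {i:02d}" for i in range(1, 21)]
design = build_design(items, per_set=5, appearances=3)
frequency = Counter(item for s in design for item in s)
```

With 20 items, 5 per set, and 3 appearances each, this returns 12 sets in which every item appears exactly 3 times. Pair balance can only be approximate within one respondent: 12 sets of 5 contain 120 pair slots, fewer than the 190 possible pairs, so some pairs never meet.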

Item Count and Set Size

Total items: MaxDiff works well with 8-30 items. Below 8, there aren't enough items to create meaningful trade-offs (just use a simple ranking question). Above 30, the number of sets needed for balanced estimation makes the exercise too long for respondents.

Items per set: 4-5 items per set is standard. Three items per set is possible but produces less information per task. Six or more items per set increases cognitive load significantly.
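
One way to quantify "information per task" is to count the pairwise comparisons a single best/worst choice reveals: the best item beats every other item shown, and every remaining item beats the worst. The sketch below (implied_pairs is a hypothetical name) counts those implied pairs for different set sizes.

```python
def implied_pairs(shown, best, worst):
    """(winner, loser) orderings implied by one best/worst choice."""
    # The best item beats every other item in the set...
    pairs = {(best, other) for other in shown if other != best}
    # ...and every item other than best and worst beats the worst item.
    pairs |= {(other, worst) for other in shown if other not in (best, worst)}
    return pairs


for k in range(3, 7):
    shown = [f"item{i}" for i in range(k)]
    known = len(implied_pairs(shown, best=shown[0], worst=shown[-1]))
    print(f"{k} items per set: {known} of {k * (k - 1) // 2} pairs")
# 3 items per set: 3 of 3 pairs
# 4 items per set: 5 of 6 pairs
# 5 items per set: 7 of 10 pairs
# 6 items per set: 9 of 15 pairs
```

Three items give a complete ordering of the set but only 3 comparisons per task. Each extra item adds two comparisons, but the share of the set left unordered grows, while the respondent has more to read and weigh.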

Number of sets: Typically 8-15 sets per respondent. The formula is roughly (number of items × 3) / items per set, where 3 is the target number of appearances per item. For 20 items shown 5 per set, that's 12 sets. Each set takes about 15-20 seconds, so a 12-set MaxDiff exercise runs about 3-4 minutes.
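
The rule of thumb for set count and timing, written out as code. Rounding up to a whole number of sets and the default of 3 appearances per item are assumptions of this sketch; the timing uses the 15-20 seconds per set quoted above.

```python
import math


def recommended_sets(n_items, per_set, appearances=3):
    """(items × appearances) / items per set, rounded up to whole sets."""
    return math.ceil(n_items * appearances / per_set)


def duration_minutes(n_sets, seconds_per_set):
    """Expected time for the MaxDiff exercise in minutes."""
    return n_sets * seconds_per_set / 60


n_sets = recommended_sets(20, 5)     # 12
low = duration_minutes(n_sets, 15)   # 3.0
high = duration_minutes(n_sets, 20)  # 4.0
print(recommended_sets(15, 4))       # 12 (11.25 rounded up)
```

The 15-item, 4-per-set case lands at 12 sets, the top of the 10-12 range mentioned for that design.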

Analysis Methods

Counting analysis is the simplest approach. For each item, calculate the percentage of the times it was shown that it was chosen as best, minus the percentage of times it was chosen as worst. This produces a score from -100 to +100. It's easy to compute and explain but doesn't produce individual-level estimates.
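
A minimal sketch of counting analysis, assuming each response is stored as (items shown, best pick, worst pick) and that percentages are taken relative to the number of times an item was shown. The function name and the three responses are hypothetical.

```python
from collections import Counter


def best_minus_worst(responses):
    """Counting score per item: % of exposures chosen best minus % chosen worst."""
    shown, best, worst = Counter(), Counter(), Counter()
    for items, best_pick, worst_pick in responses:
        shown.update(items)
        best[best_pick] += 1
        worst[worst_pick] += 1
    return {item: 100 * (best[item] - worst[item]) / shown[item] for item in shown}


# Hypothetical responses: (items shown, best pick, worst pick).
responses = [
    (("A", "B", "C", "D"), "A", "D"),
    (("A", "B", "C", "E"), "A", "C"),
    (("B", "C", "D", "E"), "B", "D"),
]
scores = best_minus_worst(responses)
# A: 100.0, B: about 33.3, C: about -33.3, D: -100.0, E: 0.0
```

An item picked best every time it appears scores +100 and one picked worst every time scores -100, which is where the -100 to +100 range comes from.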

Hierarchical Bayes (HB) estimation is the standard for rigorous MaxDiff analysis. HB produces individual-level utility scores for each respondent, enabling segmentation, cross-tabulation, and individual-level importance profiles. It handles unbalanced designs and missing data gracefully.
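
Hierarchical Bayes is too involved for a short example, but its building block, a logit model of best and worst choices, fits in a few lines. The sketch below swaps in a simpler technique: an aggregate (pooled) multinomial logit fitted by gradient ascent, which yields one set of utilities for the whole sample rather than one per respondent as HB does. It treats the worst pick as a choice from the same set with negated utilities, a common simplification, and adds a small ridge penalty (an assumption of this sketch) so items that are always picked best or worst keep finite utilities. fit_maxdiff_logit is a hypothetical name.

```python
import math


def fit_maxdiff_logit(responses, items, lr=0.5, iters=2000, ridge=0.01):
    """Aggregate best-worst logit fitted by gradient ascent (not HB)."""
    u = {i: 0.0 for i in items}
    for _ in range(iters):
        grad = {i: 0.0 for i in items}
        for shown, best, worst in responses:
            exp_best = {i: math.exp(u[i]) for i in shown}
            exp_worst = {i: math.exp(-u[i]) for i in shown}
            z_best, z_worst = sum(exp_best.values()), sum(exp_worst.values())
            for i in shown:
                # d/du_i of log P(best): indicator minus choice probability.
                grad[i] += (i == best) - exp_best[i] / z_best
                # d/du_i of log P(worst) under negated utilities.
                grad[i] += exp_worst[i] / z_worst - (i == worst)
        for i in items:
            # Average over tasks, then apply the ridge penalty's gradient.
            u[i] += lr * (grad[i] / len(responses) - ridge * u[i])
        # Utilities are identified only up to a constant, so center at zero.
        mean = sum(u.values()) / len(u)
        u = {i: v - mean for i, v in u.items()}
    return u


# Hypothetical responses: (items shown, best pick, worst pick).
responses = [
    (("A", "B", "C", "D"), "A", "D"),
    (("A", "B", "C", "E"), "A", "C"),
    (("B", "C", "D", "E"), "B", "D"),
]
utilities = fit_maxdiff_logit(responses, items=["A", "B", "C", "D", "E"])
ranking = sorted(utilities, key=utilities.get, reverse=True)
```

HB extends this idea by giving each respondent their own utilities, drawn from a shared population distribution, which is what makes the individual-level profiles and segmentation possible.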

Anchored MaxDiff adds a follow-up question after the MaxDiff exercise asking respondents to indicate which items meet some threshold (e.g., "which of these would you actually use?"). This converts the relative MaxDiff scores into an absolute scale, distinguishing between "most important of a list of important things" and "most important of a list of unimportant things."
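
One common way to use the follow-up answers is to add the threshold as an extra pseudo-item: each answer becomes a pairwise comparison between an item and the anchor, the anchor is estimated alongside the real items, and all utilities are then shifted so the anchor sits at zero. The sketch below shows only the coding and rescaling steps; the estimation in between is assumed to happen elsewhere, and the ANCHOR name, function names, and numbers are hypothetical.

```python
ANCHOR = "ANCHOR"  # hypothetical name for the threshold pseudo-item


def anchor_pairs(shown, meets_threshold):
    """Code follow-up answers as (winner, loser) pairs against the anchor."""
    return [
        (item, ANCHOR) if item in meets_threshold else (ANCHOR, item)
        for item in shown
    ]


def rescale_to_anchor(utilities):
    """Shift utilities so the anchor sits at 0; positive items clear the bar."""
    anchor = utilities[ANCHOR]
    return {item: u - anchor for item, u in utilities.items() if item != ANCHOR}


# Follow-up: "Which of these would you actually use?" (hypothetical answer).
pairs = anchor_pairs(["Offline access", "Mobile app"], {"Offline access"})

# Hypothetical utilities estimated with the anchor included as an extra item.
rescaled = rescale_to_anchor({"Offline access": 1.2, "Mobile app": 0.4, ANCHOR: 0.6})
# Offline access ends up positive (above the threshold), Mobile app negative.
```

After rescaling, the sign of each score carries the absolute information: an item can rank highly in the MaxDiff exercise yet still fall below zero if few respondents said it meets the threshold.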

Writing Effective Items

MaxDiff items should be:

Concise. Respondents scan sets quickly. Items longer than 8-10 words slow down the task and increase cognitive load. "Real-time collaboration tools" works. "The ability to collaborate with team members in real time on shared documents" is too long.

At a similar level of abstraction. Mixing specific features ("dark mode") with broad capabilities ("improved performance") creates apples-to-oranges comparisons that produce unreliable results.

Independently meaningful. Each item should make sense without context from other items. Avoid items that are subsets of other items in the list.

Mutually exclusive. Items shouldn't overlap in meaning. If "fast delivery" and "same-day shipping" are both in the list, respondents don't know how to distinguish them.
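
The length guidelines above can be checked mechanically before fielding. The sketch below flags items over 10 words and flags the list when its longest item is more than three times the length of its shortest; both thresholds are assumptions you would tune, check_items is a hypothetical name, and overlap or mismatched abstraction still needs a human read.

```python
def check_items(items, max_words=10, max_ratio=3.0):
    """Flag items that are too long, or a list whose lengths vary widely."""
    lengths = {item: len(item.split()) for item in items}
    issues = [
        f"too long ({n} words): {item}"
        for item, n in lengths.items()
        if n > max_words
    ]
    shortest, longest = min(lengths.values()), max(lengths.values())
    if longest > max_ratio * shortest:
        issues.append(f"inconsistent lengths: {shortest} to {longest} words")
    return issues


print(check_items([
    "Real-time collaboration tools",
    "The ability to collaborate with team members in real time on shared documents",
]))
```

For the two example items from the "Concise" guideline, this flags the 13-word item as too long and reports a 3-to-13-word spread; a list like "Offline access", "Mobile app", "Custom reporting" passes cleanly.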

When to Use MaxDiff Questions

  • Feature prioritization to determine which capabilities matter most to users when building product roadmaps
  • Message testing to rank marketing claims, value propositions, or taglines by appeal
  • Brand attribute ranking to identify which associations are strongest and most differentiating
  • Benefits prioritization to understand which product or service benefits drive purchase consideration
  • Menu or assortment optimization to determine which items or options are most and least preferred

Common Mistakes to Avoid

  • Including too many similar items that respondents can't distinguish: if three items all describe variations of "ease of use," collapse them into one item or make the distinctions crystal clear
  • Using inconsistent item lengths: when some items are 3 words and others are 15, respondents tend to select shorter items as best because they're easier to process, biasing results
  • Framing the task ambiguously: "best" and "worst" need clear definitions in context; "most important to your purchase decision" is specific while "best" alone is open to interpretation

How Quali-Fi Supports MaxDiff Questions

Quali-Fi's Research and Intelligence plans include MaxDiff with automated balanced experimental design, Hierarchical Bayes estimation, and individual-level utility scoring. The platform generates optimized set rotations, handles the analysis automatically, and produces respondent-level scores that can be cross-tabulated with demographics for segment-specific prioritization.

Frequently Asked Questions

How many respondents do I need for MaxDiff?

For aggregate counting analysis, 100-150 respondents produce stable results. For Hierarchical Bayes individual-level estimation, 200+ respondents are recommended. If you're segmenting results by subgroup, aim for 150-200 per segment.

Can MaxDiff work on mobile?

Yes, and it works quite well. The two-tap task (select best, select worst) is simple on touchscreens. Keep items short so they display without wrapping, and ensure the "best" and "worst" selection areas have adequate touch target size. MaxDiff is actually one of the more mobile-friendly advanced question types.

What's the difference between MaxDiff and ranking?

Direct ranking asks respondents to order all items from first to last, which becomes unreliable beyond 5-7 items because people can't hold that many comparisons in mind simultaneously. MaxDiff breaks the ranking into manageable subsets (4-5 items at a time) and uses statistical modeling to reconstruct the full rank order, producing more reliable results with longer lists.


Need to know what matters most? Start a free trial of Quali-Fi Research and use MaxDiff to force trade-offs and get a clear rank order of features, messages, or benefits.

