What Is Factor Analysis?
Factor analysis is a statistical method that reduces a large number of observed variables into a smaller set of underlying dimensions called factors. It works on the premise that groups of correlated variables share a common latent construct that isn't directly measured. For example, if survey respondents who rate a brand highly on "reliable" also tend to rate it highly on "trustworthy" and "dependable," factor analysis identifies these three items as indicators of a single underlying factor you might label "Brand Trust." Instead of analyzing 25 individual survey questions, you end up working with 4-6 interpretable factors that explain most of the variation in responses.
Why Factor Analysis Matters
Survey instruments often include dozens of items, and analyzing them individually creates noise and redundancy. Factor analysis solves this by revealing the structure underneath your data: which questions are really measuring the same thing, and how many distinct constructs your survey actually captures. It's also essential for scale development and validation: if you're building a customer satisfaction instrument, factor analysis confirms whether your items group into the dimensions you intended.
How Factor Analysis Works
Exploratory Factor Analysis (EFA)
EFA is used when you don't have a strong hypothesis about the underlying structure. You let the data reveal how variables cluster together. The process involves:
- Calculate a correlation matrix: measure how every variable relates to every other variable
- Extract factors: identify the underlying dimensions that account for the correlations (common methods: principal axis factoring, maximum likelihood)
- Determine the number of factors: use eigenvalues > 1 (Kaiser criterion), the scree plot, or parallel analysis
- Rotate the solution: make factors easier to interpret (common rotations: varimax for uncorrelated factors, oblimin for correlated factors)
- Interpret and label factors: examine which variables load on each factor and assign meaningful names
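The first three steps above can be sketched with NumPy alone. This is a minimal illustration on synthetic data, not a full EFA: it builds the correlation matrix, takes its eigenvalues, and applies the Kaiser criterion. The data-generating setup (two latent factors, six items, the loadings, and the seed) is an assumption made for the demo. Extraction by principal axis factoring or maximum likelihood, and rotation, would normally be handled by a dedicated statistics package.

```python
import numpy as np

# Synthetic survey (assumed for the demo): 500 respondents, 6 items,
# where items 1-3 are driven by one latent factor and items 4-6 by another.
rng = np.random.default_rng(0)
n_respondents = 500
latent = rng.normal(size=(n_respondents, 2))
true_loadings = np.array([
    [0.8, 0.0],
    [0.8, 0.0],
    [0.8, 0.0],
    [0.0, 0.8],
    [0.0, 0.8],
    [0.0, 0.8],
])
noise = rng.normal(scale=0.6, size=(n_respondents, 6))
responses = latent @ true_loadings.T + noise

# Step 1: correlation matrix (items are columns, hence rowvar=False).
corr = np.corrcoef(responses, rowvar=False)

# Step 2 (simplified): eigenvalues of the correlation matrix, largest first.
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]

# Step 3: the Kaiser criterion keeps factors with an eigenvalue above 1.
n_factors_kaiser = int(np.sum(eigenvalues > 1.0))

print("eigenvalues:", np.round(eigenvalues, 2))
print("factors retained (Kaiser):", n_factors_kaiser)
```

Because the eigenvalues of a correlation matrix sum to the number of items, two large eigenvalues here leave little variance for the remaining four, which is the pattern a scree plot makes visible.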
Confirmatory Factor Analysis (CFA)
CFA tests whether a pre-specified factor structure fits the data. You hypothesize which items belong to which factors and then evaluate model fit. CFA is used when you're validating an existing instrument or confirming a structure found through EFA in a new sample.
Key fit indices for CFA:
- CFI (Comparative Fit Index): > 0.90 acceptable, > 0.95 good
- RMSEA (Root Mean Square Error of Approximation): < 0.08 acceptable, < 0.06 good
- SRMR (Standardized Root Mean Square Residual): < 0.08 good
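As a rough sketch, the cutoffs listed above can be turned into a small checker. The function name, the "outside cutoff" label, and the example values are illustrative assumptions; the fit indices themselves would come from your CFA software.

```python
def rate_cfa_fit(cfi: float, rmsea: float, srmr: float) -> dict:
    """Label each fit index using the rule-of-thumb cutoffs listed above."""
    if cfi > 0.95:
        cfi_label = "good"
    elif cfi > 0.90:
        cfi_label = "acceptable"
    else:
        cfi_label = "outside cutoff"  # label chosen for this sketch

    if rmsea < 0.06:
        rmsea_label = "good"
    elif rmsea < 0.08:
        rmsea_label = "acceptable"
    else:
        rmsea_label = "outside cutoff"

    srmr_label = "good" if srmr < 0.08 else "outside cutoff"

    return {"CFI": cfi_label, "RMSEA": rmsea_label, "SRMR": srmr_label}


# Hypothetical fit indices for illustration only.
example = rate_cfa_fit(cfi=0.93, rmsea=0.05, srmr=0.09)
print(example)
```

These thresholds are conventions rather than hard rules, so a checker like this is best used to prompt a closer look at the model, not to accept or reject it automatically.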
EFA vs. CFA
| | EFA | CFA |
|---|---|---|
| Purpose | Discover structure | Confirm structure |
| When to use | New instruments, exploratory research | Validating scales, testing theory |
| Hypothesis | None, data-driven | Pre-specified model |
| Software | SPSS, R, Python | AMOS, Mplus, R (lavaan) |
| Output | Factor loadings, eigenvalues | Fit indices, path coefficients |
Worked Example
You surveyed 500 consumers on 12 brand perception attributes for a fast-food chain. Running EFA reveals three factors:
Factor 1, "Quality" (eigenvalue = 4.2, 35% of variance)
- Fresh ingredients: loading = 0.82
- Taste quality: loading = 0.79
- Food presentation: loading = 0.71
- Menu variety: loading = 0.65
Factor 2, "Convenience" (eigenvalue = 2.1, 17.5% of variance)
- Speed of service: loading = 0.85
- Location accessibility: loading = 0.78
- Mobile ordering: loading = 0.72
Factor 3, "Value" (eigenvalue = 1.5, 12.5% of variance)
- Price fairness: loading = 0.81
- Portion size: loading = 0.74
- Deals and promotions: loading = 0.69
These three factors together explain 65% of the total variance. Instead of comparing brands on 12 individual attributes, you can now compare them on three meaningful dimensions. Items with loadings below 0.40 on all factors would be candidates for removal.
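The variance percentages in this example follow from simple arithmetic, assuming the analysis was run on a correlation matrix so that each of the 12 standardized attributes contributes one unit of variance. A small sketch of that calculation (variable names are illustrative):

```python
# Eigenvalues from the worked example; 12 attributes were surveyed.
n_variables = 12
eigenvalues = {"Quality": 4.2, "Convenience": 2.1, "Value": 1.5}

# Share of total variance = eigenvalue / number of standardized variables.
share = {name: value / n_variables for name, value in eigenvalues.items()}
total_share = sum(share.values())

for name, proportion in share.items():
    print(f"{name}: {proportion:.1%}")
print(f"Total: {total_share:.1%}")
```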
Key Decisions in Factor Analysis
How many factors to retain? The Kaiser criterion (eigenvalues > 1) often over-extracts. Parallel analysis is more reliable: it compares your eigenvalues to those from random data of the same size, and you retain only the factors whose eigenvalues exceed the random baseline.
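A minimal sketch of parallel analysis with NumPy is shown here. It uses eigenvalues of the full correlation matrix and the 95th percentile of the random eigenvalues as the baseline; both choices, along with the function name, the number of iterations, and the synthetic demo data, are assumptions for illustration rather than the only way to run the procedure.

```python
import numpy as np


def parallel_analysis(data, n_iterations=100, percentile=95, seed=0):
    """Return (observed eigenvalues, random thresholds, number of factors to retain)."""
    rng = np.random.default_rng(seed)
    n_obs, n_vars = data.shape

    # Eigenvalues of the observed correlation matrix, largest first.
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]

    # Eigenvalues from uncorrelated random data of the same shape.
    random_eigs = np.empty((n_iterations, n_vars))
    for i in range(n_iterations):
        random_data = rng.normal(size=(n_obs, n_vars))
        random_corr = np.corrcoef(random_data, rowvar=False)
        random_eigs[i] = np.sort(np.linalg.eigvalsh(random_corr))[::-1]
    thresholds = np.percentile(random_eigs, percentile, axis=0)

    # Count leading factors that beat the baseline, stopping at the first failure.
    exceeds = observed > thresholds
    n_retain = n_vars if exceeds.all() else int(np.argmin(exceeds))
    return observed, thresholds, n_retain


# Demo data (assumed): 500 respondents, 6 items built from 2 latent factors.
demo_rng = np.random.default_rng(1)
latent = demo_rng.normal(size=(500, 2))
loadings = np.array([[0.8, 0.0]] * 3 + [[0.0, 0.8]] * 3)
data = latent @ loadings.T + demo_rng.normal(scale=0.6, size=(500, 6))

observed, thresholds, n_retain = parallel_analysis(data)
print("observed:  ", np.round(observed, 2))
print("thresholds:", np.round(thresholds, 2))
print("factors to retain:", n_retain)
```

Note that the random baseline for the first factor sits above 1, which is why parallel analysis tends to retain fewer factors than the Kaiser criterion.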
Which rotation? Use varimax if you believe factors are independent. Use oblimin or promax if you expect factors to correlate with each other, which is common in real data: "quality" and "value" perceptions, for example, often correlate.
What's a good loading? Generally, 0.40 or higher (in absolute value) is the minimum threshold. Loadings above 0.70 are strong. Items that load above 0.40 on two or more factors (cross-loadings) are problematic and may need to be removed or rewritten.
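The loading rules above can be applied mechanically to a rotated loading table. The following is a small sketch; the item names and loading values are made up for illustration, and treating a loading of exactly 0.40 as salient is an assumption of this sketch.

```python
def flag_items(loadings, threshold=0.40):
    """Classify each item as 'ok', 'low loading', or 'cross-loading'.

    loadings maps an item name to its loadings on each factor.
    """
    flags = {}
    for item, row in loadings.items():
        # Count factors on which the item loads at or above the threshold
        # (absolute value, since loadings can be negative).
        n_salient = sum(abs(value) >= threshold for value in row)
        if n_salient == 0:
            flags[item] = "low loading"
        elif n_salient >= 2:
            flags[item] = "cross-loading"
        else:
            flags[item] = "ok"
    return flags


# Hypothetical rotated loadings on three factors.
example_loadings = {
    "item_a": [0.82, 0.10, 0.05],
    "item_b": [0.45, 0.52, 0.08],
    "item_c": [0.21, 0.30, 0.18],
    "item_d": [-0.66, 0.12, 0.09],
}
flags = flag_items(example_loadings)
print(flags)
```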
When to Use Factor Analysis
- Scale development to confirm that your survey items measure the constructs you intend
- Data reduction to collapse many variables into a manageable number of composite scores for further analysis
- Brand perception mapping to identify the key dimensions along which consumers evaluate brands
- Segmentation preprocessing: factor scores often serve as inputs for cluster analysis
- Questionnaire refinement to identify redundant or poorly performing items
Common Mistakes to Avoid
- Running factor analysis with too few observations: aim for at least 5-10 respondents per variable, with a minimum of 200 total; below this, factor solutions are unstable
- Labeling factors based on one or two items: a factor defined by fewer than three items is weak and may not replicate; strong factors have 3+ items with loadings above 0.50
- Using principal component analysis interchangeably with factor analysis: PCA extracts components that maximize total variance, while factor analysis models shared variance; they answer different questions and can produce different results
How Quali-Fi Supports Factor Analysis
Quali-Fi's Intelligence tier includes an exploratory factor analysis module that calculates eigenvalues, generates scree plots, and produces rotated factor loading tables directly from your survey data. The platform flags cross-loadings and low-loading items, making it easy to refine your instrument without switching to external statistical software.
Frequently Asked Questions
How many variables do I need for factor analysis?
You generally need at least 3 variables per expected factor, and a total of at least 10-12 variables. More importantly, you need an adequate sample size: a common guideline is a minimum of 200 observations, or 5-10 observations per variable, whichever is larger.
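That guideline reduces to taking the larger of two numbers. A quick sketch, with a hypothetical helper name and the per-variable ratio left as a parameter because the guideline gives a range of 5 to 10:

```python
def minimum_sample_size(n_variables, per_variable=10, floor=200):
    """Rule-of-thumb sample size: the larger of the floor and the per-variable ratio."""
    return max(floor, per_variable * n_variables)


print(minimum_sample_size(12))                  # 12 items at 10 per item
print(minimum_sample_size(25))                  # 25 items at 10 per item
print(minimum_sample_size(25, per_variable=5))  # 25 items at 5 per item
```

For short instruments the 200-observation floor dominates; the per-variable ratio only starts to matter once the item count passes 20 (at 10 per item) or 40 (at 5 per item).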
Can I run factor analysis on ordinal data (like Likert scales)?
Technically, factor analysis assumes continuous data. In practice, Likert scales with 5 or more points are routinely analyzed with factor analysis and results are generally reliable. For scales with fewer categories (binary or 3-point), consider polychoric correlations (tetrachoric for binary items) instead of Pearson correlations as inputs.
What's the difference between factor analysis and principal component analysis?
Factor analysis models the shared variance among variables to identify latent constructs. PCA models total variance to create composites that maximize variance explained. For the same number of dimensions, PCA explains at least as much total variance, but factor analysis is better for understanding underlying constructs. If your goal is theoretical (what's driving these responses?), use factor analysis. If your goal is data reduction (give me fewer variables), PCA may suffice.