Pearson Correlation: What It Is, Formula, and How to Interpret It

Learn what Pearson correlation is, how to calculate and interpret the coefficient, and what assumptions your data needs to meet.

What Is Pearson Correlation?

The Pearson correlation coefficient (r) measures the strength and direction of the linear relationship between two continuous variables. It tells you how well a straight line describes the association between two measures. The coefficient ranges from -1.0 (perfect negative linear relationship) to +1.0 (perfect positive linear relationship), with 0 indicating no linear association. For example, a Pearson r of 0.82 between advertising spend and website traffic indicates a strong positive linear relationship: as spend increases, traffic tends to increase along a roughly straight line. Pearson correlation is the most widely used measure of association in quantitative research.

Why Pearson Correlation Matters

Understanding relationships between variables is fundamental to research and business decisions. Pearson correlation quantifies these relationships with a single interpretable number, making it easy to compare the strength of different associations. It's the basis for regression analysis, structural equation modeling, and many other advanced techniques. In market research, it helps answer questions like "how strongly does price sensitivity relate to brand loyalty?" or "does customer effort score predict churn?"

How Pearson Correlation Works

The Formula

r = Σ(Xᵢ - X̄)(Yᵢ - Ȳ) / √[Σ(Xᵢ - X̄)² × Σ(Yᵢ - Ȳ)²]

Or equivalently:

r = Cov(X, Y) / (SD_X × SD_Y)

Where:

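  • Xᵢ and Yᵢ = the paired observations, and X̄ and Ȳ = their means
  • Cov(X, Y) = the covariance of X and Y
  • SD_X and SD_Y = the standard deviations of X and Y (computed with the same denominator as the covariance, either n or n - 1)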

The formula essentially standardizes the covariance by dividing it by the product of the two standard deviations, which bounds the result between -1 and +1.
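The two forms of the formula can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function names `pearson_r` and `pearson_r_from_cov` and the sample values are assumptions made for this example, and the covariance version uses n - 1 in every denominator.

```python
import math

def pearson_r(x, y):
    """Pearson r from the sum-of-deviations form of the formula."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return sxy / math.sqrt(sxx * syy)

def pearson_r_from_cov(x, y):
    """Same coefficient via Cov(X, Y) / (SD_X * SD_Y), using n - 1 throughout."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (sd_x * sd_y)

# Made-up illustrative values, not data from this article
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

print(round(pearson_r(x, y), 4))
print(round(pearson_r_from_cov(x, y), 4))
```

Both functions return the same value because the n - 1 terms cancel between the covariance and the two standard deviations.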

Worked Example

You collected data from 6 stores on monthly ad spend (thousands) and monthly revenue (thousands):

Store | Ad Spend (X) | Revenue (Y)
1 | 2 | 30
2 | 4 | 45
3 | 6 | 55
4 | 8 | 70
5 | 10 | 80
6 | 12 | 95

X̄ = 7, Ȳ = 62.5

Computing the numerator (sum of cross-products):

Σ(Xᵢ - X̄)(Yᵢ - Ȳ) = (-5)(-32.5) + (-3)(-17.5) + (-1)(-7.5) + (1)(7.5) + (3)(17.5) + (5)(32.5) = 162.5 + 52.5 + 7.5 + 7.5 + 52.5 + 162.5 = 445

Computing the denominator:

Σ(Xᵢ - X̄)² = 25 + 9 + 1 + 1 + 9 + 25 = 70

Σ(Yᵢ - Ȳ)² = 1,056.25 + 306.25 + 56.25 + 56.25 + 306.25 + 1,056.25 = 2,837.5

√[70 × 2,837.5] = √198,625 ≈ 445.67

Putting it together:

r = 445 / 445.67 ≈ 0.998

A Pearson r of 0.998 confirms what the data shows visually: an almost perfect linear relationship between ad spend and revenue in this dataset.
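The arithmetic for the six stores can be checked with a few lines of Python. This is only a sketch for verifying the sums by machine; the variable names are chosen for this example.

```python
import math

ad_spend = [2, 4, 6, 8, 10, 12]      # X, in thousands
revenue = [30, 45, 55, 70, 80, 95]   # Y, in thousands

mean_x = sum(ad_spend) / len(ad_spend)
mean_y = sum(revenue) / len(revenue)

# Sum of cross-products and the two sums of squared deviations
sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ad_spend, revenue))
sum_xx = sum((x - mean_x) ** 2 for x in ad_spend)
sum_yy = sum((y - mean_y) ** 2 for y in revenue)

r = sum_xy / math.sqrt(sum_xx * sum_yy)

print(mean_x, mean_y)          # 7.0 62.5
print(sum_xy, sum_xx, sum_yy)  # 445.0 70.0 2837.5
print(round(r, 3))             # 0.998
```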

Interpreting Pearson r

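Absolute value of r | Strength
0.00 - 0.19 | Negligible
0.20 - 0.39 | Weak
0.40 - 0.59 | Moderate
0.60 - 0.79 | Strong
0.80 - 1.00 | Very strong

The sign of r gives the direction of the relationship; the bands above apply to its magnitude, so r = -0.70 is as strong as r = +0.70.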

r² (the coefficient of determination) tells you the proportion of variance shared between the two variables. An r of 0.70 gives r² = 0.49, meaning 49% of the variation in one variable is associated with variation in the other. The remaining 51% reflects other factors and random variation.
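The strength bands and the r² calculation can be combined in a small helper. This is a sketch only: `describe_r` is a hypothetical name, and the cut-offs simply mirror the rule-of-thumb bands in the table above, applied to the absolute value of r.

```python
def describe_r(r):
    """Return (strength label, direction, r squared) for a Pearson r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # the strength bands ignore the sign
    if size < 0.20:
        label = "negligible"
    elif size < 0.40:
        label = "weak"
    elif size < 0.60:
        label = "moderate"
    elif size < 0.80:
        label = "strong"
    else:
        label = "very strong"
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return label, direction, r * r

label, direction, r_squared = describe_r(0.70)
print(label, direction, round(r_squared, 2))  # strong positive 0.49
```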

Assumptions

Pearson correlation requires four assumptions:

  1. Continuous data: both variables should be measured on interval or ratio scales
  2. Linear relationship: the association should be approximately linear (check with a scatterplot)
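  3. Bivariate normality: the joint distribution of the two variables should be approximately normal (this matters mainly for significance tests and confidence intervals, less for r as a descriptive statistic)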
  4. No extreme outliers: outliers can dramatically inflate or deflate r

When these assumptions are violated, Spearman correlation is typically a better choice.
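To see why outliers matter, the sketch below compares Pearson r with and without one extreme point, and computes Spearman correlation as Pearson r on average ranks. The data are made up for illustration, and the helper names (`average_ranks`, `spearman_rho`) are assumptions for this example rather than any library's API.

```python
import math

def pearson_r(x, y):
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Made-up values: no linear relationship until one extreme point is added
x = [1, 2, 3, 4, 5]
y = [2, 1, 3, 1, 2]
x_out = x + [20]
y_out = y + [15]

r_clean = pearson_r(x, y)
r_outlier = pearson_r(x_out, y_out)
rho_outlier = spearman_rho(x_out, y_out)

print(f"Pearson without outlier: {r_clean:.2f}")
print(f"Pearson with outlier:    {r_outlier:.2f}")
print(f"Spearman with outlier:   {rho_outlier:.2f}")
```

In this toy dataset a single point moves Pearson r from about 0 to about 0.97, while Spearman on the same six points stays near 0.44, because ranks limit how far one extreme value can pull the result.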

Correlation Is Not Causation

This is the most important caveat. A strong Pearson correlation between two variables doesn't mean one causes the other. The relationship could be driven by a third variable (a confound), could be coincidental, or could even have the causal direction reversed. Only experimental designs with random assignment can establish causation. In observational research (which includes most survey studies), correlation is the ceiling: you can identify associations but not causal relationships.

When to Use Pearson Correlation

  • Exploring relationships between continuous survey variables before running regression
  • Validating scales by checking that items correlate with the total score (item-total correlation)
  • Quick association checks in initial data exploration
  • Building correlation matrices as inputs for factor analysis or structural equation modeling
  • Comparing effect sizes across different variable pairs in a standardized way
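A correlation matrix is just Pearson r computed for every pair of variables. The sketch below shows the idea in plain Python; the variable names and values in `survey` are hypothetical and made up for illustration, and `correlation_matrix` is a helper defined here, not a function from any particular library.

```python
import math

def pearson_r(x, y):
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Map each pair of column names to its Pearson r (nested dicts)."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names}
            for a in names}

# Hypothetical survey variables with made-up values
survey = {
    "satisfaction": [7, 8, 6, 9, 5, 8],
    "effort_score": [3, 2, 5, 1, 4, 2],
    "spend": [120, 150, 100, 170, 90, 130],
}

matrix = correlation_matrix(survey)
for name, row in matrix.items():
    cells = "  ".join(f"{value:+.2f}" for value in row.values())
    print(f"{name:<14}{cells}")
```

The diagonal is always 1 (each variable correlates perfectly with itself) and the matrix is symmetric, so only the values on one side of the diagonal need to be read.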

Common Mistakes to Avoid

  • Ignoring non-linear relationships: if the relationship is curved (e.g., satisfaction increases with product features up to a point, then decreases due to complexity), Pearson r will underestimate or miss the association entirely; always plot your data first
  • Interpreting small but significant correlations as meaningful: with large samples (n > 1,000), even r = 0.06 can be statistically significant; focus on r² to evaluate practical significance
  • Assuming causation from correlation: a strong correlation on its own is not sufficient for causal claims; confounding variables are always a possibility in observational data
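The gap between statistical and practical significance can be illustrated with an approximate test. The sketch below uses Fisher's z transformation with a normal approximation, which is an assumption made to keep the example dependency-free; statistical packages typically report a t-based p-value that will differ slightly. The function name `approx_p_value` is hypothetical.

```python
import math
from statistics import NormalDist

def approx_p_value(r, n):
    """Approximate two-sided p-value for H0: no correlation (Fisher z)."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and +1")
    z = math.atanh(r) * math.sqrt(n - 3)  # approximately standard normal under H0
    return 2 * (1 - NormalDist().cdf(abs(z)))

r = 0.06
for n in (100, 2000, 5000):
    p = approx_p_value(r, n)
    print(f"n={n:>5}  p={p:.4f}  r squared={r * r:.4f}")
```

The same r = 0.06 moves from clearly non-significant at n = 100 to significant at the 0.05 level in the larger samples, yet r² stays at 0.0036, meaning well under 1% of shared variance in every case.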

How Quali-Fi Supports Correlation Analysis

Quali-Fi's cross-tabulation tools automatically calculate Pearson correlation for continuous measures and display correlation matrices with significance flags. The platform generates scatterplots for visual inspection and highlights relationships that exceed user-defined strength thresholds, making it easy to identify the strongest drivers in your data.

Frequently Asked Questions

When should I use Pearson vs. Spearman correlation?

Use Pearson when both variables are continuous, approximately normally distributed, and the relationship is linear. Use Spearman when data is ordinal (like Likert scales), when the relationship is monotonic but non-linear, or when outliers are a concern. If you're unsure, run both. If they give similar results, the choice doesn't matter much.

What does a Pearson correlation of 0 mean?

It means there's no linear relationship between the two variables. However, there could still be a non-linear relationship. Two variables with r = 0 might have a perfect U-shaped or circular relationship that Pearson can't detect. Always check scatterplots alongside correlation coefficients.
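A quick way to convince yourself is to compute r for a perfect U-shape. In the sketch below (made-up values, with y defined as exactly x squared), y is completely determined by x, yet Pearson r comes out as zero.

```python
import math

def pearson_r(x, y):
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# A perfect U-shape: y depends entirely on x, but not linearly
x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]

r = pearson_r(x, y)
print(f"r = {r:.2f}")
```

The positive and negative cross-products cancel exactly because the curve is symmetric around the mean of x, which is why a scatterplot reveals what the coefficient hides.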

Can I use Pearson correlation with Likert scale data?

This is debated. Technically, Likert scales are ordinal and don't meet Pearson's assumption of continuous data. In practice, researchers routinely calculate Pearson correlations with 5-point and 7-point Likert scales, and simulation studies show the results are generally reliable when scales have 5 or more points. For scales with fewer points, Spearman is safer.
