Linear Regression: What It Is and How to Use It in Research

Learn what linear regression is, how to interpret R-squared, and when to use simple linear models in market research and survey analysis.

What Is Linear Regression?

Linear regression is a statistical method that models the relationship between a continuous outcome variable and one or more predictor variables by fitting a straight line through the data. In its simplest form, called simple linear regression, there is one predictor and one outcome, and the model finds the line that minimizes the sum of squared vertical distances between each observed data point and the value the line predicts for it. It's one of the most widely used techniques in market research, applied to everything from predicting customer satisfaction scores based on service quality ratings to estimating sales volume from advertising spend. The output tells you both the direction and magnitude of the relationship: how much the outcome changes, on average, for each one-unit increase in the predictor.

Why Linear Regression Matters

Linear regression provides a clear, quantifiable answer to the question "how much does X influence Y?", which is the kind of question behind most business decisions. Beyond simple correlation, it gives you a predictive equation you can use to forecast outcomes under different scenarios. It's also the foundation for many more advanced statistical techniques used in research, such as multiple and logistic regression, so understanding it well makes those easier to learn.

How Linear Regression Works

The Model

The simple linear regression equation is:

Y = b₀ + b₁X + ε

Where Y is the outcome (dependent variable), X is the predictor (independent variable), b₀ is the intercept (the predicted value of Y when X = 0), b₁ is the slope (the change in Y for each one-unit increase in X), and ε represents the error term, the variation in Y not explained by X.

Worked Example

You want to know if there's a relationship between the number of survey reminders sent (X) and response rate percentage (Y). You run 8 surveys with varying reminder counts:

Reminders (X)	Response Rate % (Y)
0	12
1	18
1	21
2	25
2	28
3	30
3	34
4	38

Running the regression produces: Y = 12.92 + 6.42X (coefficients rounded to two decimals)

Interpretation: With zero reminders, the predicted response rate is about 12.9%. Each additional reminder is associated with a 6.42 percentage-point increase in response rate. If you sent 2 reminders, the predicted response rate is 12.92 + 6.42(2) ≈ 25.8%.
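
To make the fitting step concrete, here is a minimal sketch that recomputes the least-squares line from the eight rows in the table using only the Python standard library. The function and variable names are illustrative assumptions, not part of any particular tool.

```python
# Data from the worked example: reminders sent (X) and response rate % (Y).
reminders = [0, 1, 1, 2, 2, 3, 3, 4]
response_rate = [12, 18, 21, 25, 28, 30, 34, 38]


def fit_simple_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-variation of x and y divided by the variation in x.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


b0, b1 = fit_simple_regression(reminders, response_rate)
print(f"Y = {b0:.2f} + {b1:.2f}X")
print(f"Predicted response rate at 2 reminders: {b0 + b1 * 2:.2f}%")
```

Because 2 reminders happens to be the mean of X in this data set, the prediction at that point equals the mean response rate.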

R-Squared (R²)

R-squared tells you the proportion of variance in the outcome that's explained by the predictor(s). It ranges from 0 to 1.

R² = 1 - (SS_residual / SS_total)

Where SS_residual is the sum of squared differences between observed and predicted values, and SS_total is the sum of squared differences between observed values and the mean.

In the reminders example, R² = 0.96, meaning 96% of the variation in response rates is explained by the number of reminders. That's unusually high; in real-world market research, R² values between 0.20 and 0.50 are common and useful.
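
The R² formula can be checked against the reminders data with a short sketch like the one below. It refits the line, then computes both sums of squares directly; the variable names are assumptions chosen for readability.

```python
reminders = [0, 1, 1, 2, 2, 3, 3, 4]
response_rate = [12, 18, 21, 25, 28, 30, 34, 38]

n = len(reminders)
mean_x = sum(reminders) / n
mean_y = sum(response_rate) / n

# Least-squares slope and intercept for the data above.
s_xx = sum((x - mean_x) ** 2 for x in reminders)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(reminders, response_rate))
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# SS_residual: squared gaps between observed and predicted values.
predicted = [intercept + slope * x for x in reminders]
ss_residual = sum((y - p) ** 2 for y, p in zip(response_rate, predicted))

# SS_total: squared gaps between observed values and their mean.
ss_total = sum((y - mean_y) ** 2 for y in response_rate)

r_squared = 1 - ss_residual / ss_total
print(f"SS_residual = {ss_residual:.2f}")
print(f"SS_total = {ss_total:.2f}")
print(f"R-squared = {r_squared:.2f}")
```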

Assumptions

For the results to be trustworthy, linear regression requires:

Linearity: The relationship between X and Y is approximately straight. Check with a scatterplot.
Independence: Observations don't influence each other. This is violated when you have repeated measures from the same respondent.
Homoscedasticity: The spread of residuals is roughly constant across all values of X. Fan-shaped residual plots signal a violation.
Normality of residuals: The errors follow a roughly normal distribution. This matters most for small samples and for confidence intervals around predictions.
No influential outliers: A single extreme data point can dramatically shift the regression line.
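
Several of these checks start from the residuals. The sketch below, using the reminders data, lists each residual next to its X value so you can eyeball patterns in sign or spread, and flags points whose scaled residual is large. The cutoff of 2 is a rule-of-thumb assumption, and dividing by the residual standard error is a simplification of the studentized residuals that statistical packages report.

```python
import math

reminders = [0, 1, 1, 2, 2, 3, 3, 4]
response_rate = [12, 18, 21, 25, 28, 30, 34, 38]

n = len(reminders)
mean_x = sum(reminders) / n
mean_y = sum(response_rate) / n
s_xx = sum((x - mean_x) ** 2 for x in reminders)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(reminders, response_rate))
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Residual = observed value minus the value the fitted line predicts.
residuals = [y - (intercept + slope * x) for x, y in zip(reminders, response_rate)]

# Residual standard error with n - 2 degrees of freedom (two fitted coefficients).
residual_se = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Assumed rule of thumb: scaled residuals beyond +/-2 deserve a closer look.
THRESHOLD = 2.0
flagged = []
for x, y, e in zip(reminders, response_rate, residuals):
    scaled = e / residual_se
    print(f"X={x}  Y={y}  residual={e:+.2f}  scaled={scaled:+.2f}")
    if abs(scaled) > THRESHOLD:
        flagged.append((x, y))

print("Points to inspect:", flagged)
```

A printed table is only a stand-in for the scatterplot and residual plot mentioned above; with more than a handful of points, plotting residuals against X is far easier to read.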

Interpreting the Output

A typical regression output includes:

Coefficient (b₁): The slope, the predicted change in Y for a one-unit change in X
Standard error: How precisely the coefficient is estimated
t-statistic and p-value: Whether the coefficient is statistically different from zero
Confidence interval: The range of plausible values for the true coefficient
R²: Overall model explanatory power
F-statistic: Whether the model as a whole explains significant variance
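
As a rough illustration of where these numbers come from, the sketch below derives the standard error, t-statistic, 95% confidence interval, and F-statistic for the slope in the reminders example. It assumes the usual ordinary least squares formulas and hard-codes the critical t value for 6 degrees of freedom from a standard t-table, so it only applies to this eight-row data set. A p-value needs the t distribution's CDF, which the standard library does not provide, so it is left out.

```python
import math

reminders = [0, 1, 1, 2, 2, 3, 3, 4]
response_rate = [12, 18, 21, 25, 28, 30, 34, 38]

n = len(reminders)
mean_x = sum(reminders) / n
mean_y = sum(response_rate) / n
s_xx = sum((x - mean_x) ** 2 for x in reminders)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(reminders, response_rate))
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

ss_total = sum((y - mean_y) ** 2 for y in response_rate)
ss_residual = sum(
    (y - (intercept + slope * x)) ** 2 for x, y in zip(reminders, response_rate)
)
df_residual = n - 2  # two coefficients were estimated
mean_squared_error = ss_residual / df_residual

# Standard error of the slope and its t-statistic (tested against zero).
se_slope = math.sqrt(mean_squared_error / s_xx)
t_stat = slope / se_slope

# Assumption: two-sided 95% critical value of t with 6 df, from a t-table.
T_CRITICAL_95_DF6 = 2.447
ci_low = slope - T_CRITICAL_95_DF6 * se_slope
ci_high = slope + T_CRITICAL_95_DF6 * se_slope

# F-statistic: explained variation (1 df) over the mean squared error.
f_stat = (ss_total - ss_residual) / mean_squared_error

print(f"slope = {slope:.2f}, SE = {se_slope:.2f}, t = {t_stat:.2f}")
print(f"95% CI for slope: ({ci_low:.2f}, {ci_high:.2f})")
print(f"F = {f_stat:.2f}")
```

With a single predictor, the F-statistic equals the square of the slope's t-statistic, so the two tests agree.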

When to Use Linear Regression

Predicting continuous outcomes like satisfaction scores, revenue, or response rates from one or more measurable predictors
Quantifying the effect size of a single factor: how much does each additional dollar of ad spend contribute to awareness?
Baseline modeling before adding complexity with multiple regression, interaction terms, or nonlinear transformations
Trend analysis to estimate how a metric changes over time when the trend appears roughly linear

Common Mistakes to Avoid

Extrapolating beyond your data range: a model trained on 0-4 reminders can't reliably predict what happens at 10 reminders
Assuming causation from regression alone: the model shows association; experimental design is what establishes causation
Ignoring assumption violations: running the model on clearly nonlinear data or data with extreme outliers produces misleading coefficients
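
One way to guard against the extrapolation mistake is to have the prediction step report whether the requested X falls outside the observed range. The helper below is a hypothetical sketch, not a feature of any specific package, and uses the reminders data for illustration.

```python
reminders = [0, 1, 1, 2, 2, 3, 3, 4]
response_rate = [12, 18, 21, 25, 28, 30, 34, 38]

n = len(reminders)
mean_x = sum(reminders) / n
mean_y = sum(response_rate) / n
s_xx = sum((x - mean_x) ** 2 for x in reminders)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(reminders, response_rate))
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x


def predict_with_range_check(x_new, x_observed, intercept, slope):
    """Return (prediction, is_extrapolation) for a single new X value."""
    prediction = intercept + slope * x_new
    # Anything outside the observed minimum and maximum is an extrapolation.
    is_extrapolation = x_new < min(x_observed) or x_new > max(x_observed)
    return prediction, is_extrapolation


for x_new in (2, 10):
    prediction, is_extrapolation = predict_with_range_check(
        x_new, reminders, intercept, slope
    )
    note = " (outside observed range, treat with caution)" if is_extrapolation else ""
    print(f"{x_new} reminders -> predicted {prediction:.1f}%{note}")
```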

How Quali-Fi Supports Linear Regression

Quali-Fi's Research plan ($1,061/month) includes regression analysis tools that generate coefficient tables, residual diagnostics, and R-squared summaries directly from survey data. The platform flags assumption violations automatically, so you know when a linear model fits well and when you need a different approach.

Try Quali-Fi's regression analysis tools

Frequently Asked Questions

What's a "good" R-squared value?

It depends entirely on your field. In physics, R² below 0.90 might be concerning. In market research and social science, R² between 0.20 and 0.50 is typical and can be highly actionable. A model explaining 30% of variation in purchase intent still provides valuable insight into which levers move the needle.

Can I use linear regression with categorical predictors?

Yes. Categorical predictors are included as dummy variables (0/1 coding). If you have a variable like "region" with four categories, you'd create three dummy variables. The coefficients represent the difference in the outcome relative to a reference category.
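
Dummy coding for a four-category variable might look like the sketch below. The region names, the choice of "North" as the reference category, and the `dummy_code` helper are all hypothetical, chosen only for illustration.

```python
def dummy_code(values, categories, reference):
    """Return one dict of 0/1 indicators per value, omitting the reference."""
    dummy_levels = [c for c in categories if c != reference]
    return [
        {f"region_{level}": int(value == level) for level in dummy_levels}
        for value in values
    ]


categories = ["North", "South", "East", "West"]  # hypothetical region labels
respondent_regions = ["North", "South", "West", "East", "South"]

rows = dummy_code(respondent_regions, categories, reference="North")
for region, row in zip(respondent_regions, rows):
    print(region, row)
```

A respondent in the reference category gets 0 on all three dummies, which is why each coefficient reads as a difference from that category.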

What's the difference between linear regression and correlation?

Correlation (r) measures the strength and direction of a linear relationship between two variables. Linear regression goes further: it provides a predictive equation, quantifies how much Y changes per unit change in X, and can be extended to include multiple predictors. Correlation is symmetric (r of X with Y equals r of Y with X); regression is directional, so the slope from regressing Y on X differs from the slope from regressing X on Y.
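
The symmetry point can be illustrated with the reminders data from the worked example. In this sketch (helper names are assumptions), swapping the two variables leaves r unchanged but gives a different regression slope.

```python
import math

reminders = [0, 1, 1, 2, 2, 3, 3, 4]
response_rate = [12, 18, 21, 25, 28, 30, 34, 38]


def centered_sums(a, b):
    """Return (s_aa, s_bb, s_ab): sums of squares and cross-products about the means."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    s_aa = sum((ai - mean_a) ** 2 for ai in a)
    s_bb = sum((bi - mean_b) ** 2 for bi in b)
    s_ab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    return s_aa, s_bb, s_ab


def pearson_r(a, b):
    s_aa, s_bb, s_ab = centered_sums(a, b)
    return s_ab / math.sqrt(s_aa * s_bb)


def regression_slope(predictor, outcome):
    s_pp, _, s_po = centered_sums(predictor, outcome)
    return s_po / s_pp


r_xy = pearson_r(reminders, response_rate)
r_yx = pearson_r(response_rate, reminders)
slope_y_on_x = regression_slope(reminders, response_rate)
slope_x_on_y = regression_slope(response_rate, reminders)

print(f"r(X, Y) = {r_xy:.3f}, r(Y, X) = {r_yx:.3f}")
print(f"slope of Y on X = {slope_y_on_x:.3f}")
print(f"slope of X on Y = {slope_x_on_y:.3f}")
```

In simple linear regression the two slopes multiply to r², which is also the model's R².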

Related Guides

Multiple Regression: What It Is and How to Add Predictors

Logistic Regression: What It Is and How to Interpret It

Multicollinearity: What It Is and How to Detect It

Hierarchical Regression: What It Is and How to Compare Models

ANCOVA: What It Is and How Covariate Adjustment Works

Ready to apply this in your research?
