Logistic Regression: What It Is and How to Interpret It

Learn what logistic regression is, how odds ratios work, and when to use binary outcome models in market research and survey analysis.

What Is Logistic Regression?

Logistic regression is a statistical method for modeling binary outcomes: situations where the dependent variable has exactly two categories, such as purchased/didn't purchase, churned/retained, or clicked/didn't click. Unlike linear regression, which predicts continuous values, logistic regression estimates the probability that an observation falls into one of the two categories. It does this by applying a logistic (sigmoid) function to a linear combination of predictors, transforming the output into a value between 0 and 1. In market research, logistic regression is the go-to technique for understanding which survey responses, demographics, or behavioral variables predict whether a customer will take a specific action. It's also the backbone of many lead-scoring and churn-prediction models.

Why Logistic Regression Matters

Binary decisions drive most business outcomes: buy or don't buy, subscribe or cancel, recommend or stay silent. Logistic regression gives you a principled way to quantify which factors increase or decrease the probability of that binary outcome. It also produces odds ratios, which are among the most intuitive effect-size measures for communicating findings to non-technical stakeholders: "customers who reported high satisfaction had 3.2 times the odds of repurchasing."

How Logistic Regression Works

The Logistic Function

The model takes the form:

P(Y = 1) = 1 / (1 + e^(-(b₀ + b₁X₁ + b₂X₂ + ... + bₖXₖ)))

Where P(Y = 1) is the probability of the outcome occurring, b₀ is the intercept, and b₁ through bₖ are the coefficients for each predictor. Because e raised to any power is positive, the denominator is always greater than 1, so the predicted probability always falls between 0 and 1, regardless of the predictor values.
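The formula above can be sketched in a few lines of Python. This is a minimal illustration rather than a fitting routine: the function names logistic and predict_probability are made up for this example, and the coefficients are assumed to have been estimated elsewhere.

```python
import math
from typing import Sequence


def logistic(z: float) -> float:
    """Map a linear predictor z (the log-odds) to a probability."""
    # Two algebraically equivalent forms are used so that math.exp()
    # is only ever called with a non-positive argument (no overflow).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_probability(intercept: float,
                        coefs: Sequence[float],
                        xs: Sequence[float]) -> float:
    """P(Y = 1) for one observation, given b0, [b1..bk] and [X1..Xk]."""
    z = intercept + sum(b * x for b, x in zip(coefs, xs))
    return logistic(z)


# However extreme the linear predictor, the output stays between 0 and 1.
for z in (-10, -2, 0, 2, 10):
    print(f"z = {z:>3}: P = {logistic(z):.4f}")
```

A linear predictor of 0 corresponds to a probability of exactly 0.5; large negative values approach 0 and large positive values approach 1.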

Log-Odds and Odds Ratios

The raw coefficients are in log-odds (logit) units, which aren't directly intuitive. To make them interpretable, you exponentiate them:

Odds Ratio = e^(b)

An odds ratio of 1.0 means no effect. Greater than 1.0 means higher odds of the outcome. Less than 1.0 means lower odds.
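As a small sketch of that conversion, the snippet below exponentiates a few coefficients. The coefficient values and the helper names odds_ratio and percent_change_in_odds are hypothetical and chosen only for illustration.

```python
import math


def odds_ratio(coefficient: float, units: float = 1.0) -> float:
    """Odds ratio for a change of `units` in the predictor: e^(b * units)."""
    return math.exp(coefficient * units)


def percent_change_in_odds(coefficient: float) -> float:
    """Percentage change in the odds for a one-unit increase."""
    return (odds_ratio(coefficient) - 1.0) * 100.0


# Hypothetical coefficients: negative, zero, and positive.
for b in (-0.5, 0.0, 0.5):
    print(f"b = {b:+.2f}  odds ratio = {odds_ratio(b):.2f}  "
          f"change in odds = {percent_change_in_odds(b):+.0f}%")
```

Note that odds ratios multiply rather than add: a two-unit increase corresponds to the one-unit odds ratio squared, which is why the helper takes a units argument.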

Worked Example

You survey 500 customers and model repurchase (yes/no) using satisfaction score (1-10) and loyalty program membership (yes/no).

Results:

Predictor	Coefficient (b)	Odds Ratio (e^b)	p-value
Intercept	-4.20	n/a	<0.001
Satisfaction	0.45	1.57	<0.001
Loyalty member	0.82	2.27	0.003

Interpretation: Each one-point increase in satisfaction multiplies the odds of repurchase by 1.57 (a 57% increase in odds). Loyalty program members have 2.27 times the odds of repurchasing compared to non-members, holding satisfaction constant.

To predict a specific probability, plug the predictor values into the model. For a loyalty member with a satisfaction score of 7:

logit = -4.20 + (0.45 × 7) + (0.82 × 1) = -4.20 + 3.15 + 0.82 = -0.23

P(repurchase) = 1 / (1 + e^0.23) = 1 / (1 + 1.26) = 1 / 2.26 = 0.44, or 44%
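The same calculation can be checked in code. The sketch below hard-codes the three coefficients from the results table; the function names are illustrative only, and the non-member comparison is an extra case added here to show where the 2.27 odds ratio comes from.

```python
import math

# Coefficients copied from the worked example's results table.
INTERCEPT = -4.20
B_SATISFACTION = 0.45
B_LOYALTY = 0.82


def repurchase_probability(satisfaction: float, loyalty_member: bool) -> float:
    """Predicted P(repurchase) for one customer under the example model."""
    logit = (INTERCEPT
             + B_SATISFACTION * satisfaction
             + B_LOYALTY * (1 if loyalty_member else 0))
    return 1.0 / (1.0 + math.exp(-logit))


def odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1.0 - p)


member = repurchase_probability(7, loyalty_member=True)
non_member = repurchase_probability(7, loyalty_member=False)

print(f"Member, satisfaction 7:     P = {member:.2f}")
print(f"Non-member, satisfaction 7: P = {non_member:.2f}")
# The ratio of the two odds equals e^0.82, the loyalty odds ratio.
print(f"Ratio of odds: {odds(member) / odds(non_member):.2f}")
```

The odds ratio stays the same at every satisfaction level, but the gap in predicted probability between members and non-members does not, which is why it helps to report both.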

Model Fit

Logistic regression doesn't use R-squared in the traditional sense. Instead, you'll see:

Nagelkerke R² or McFadden's R²: Pseudo-R-squared measures that are loosely analogous to explained variance, though they are not directly comparable to the R² from linear regression. Values above 0.20 are generally considered acceptable in social science research.
Classification accuracy: The percentage of cases correctly classified. Compare this against the base rate: if 70% of customers repurchase, your model needs to beat 70% to be useful.
AUC (Area Under the ROC Curve): Ranges from 0.5 (chance) to 1.0 (perfect). Values above 0.70 indicate acceptable discrimination; above 0.80 is good.
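To make these three measures concrete, here is a rough sketch that computes each one by hand. The ten outcomes and predicted probabilities are invented for illustration and do not come from a fitted model; in practice a statistics package would report these figures, and the hand-rolled functions below are not any library's API.

```python
import math


def log_likelihood(y, p):
    """Sum of log-probabilities the model assigned to the observed outcomes."""
    return sum(math.log(pi if yi == 1 else 1.0 - pi) for yi, pi in zip(y, p))


def mcfadden_r2(y, p):
    """1 - LL(model) / LL(null), where the null model predicts the base rate."""
    base_rate = sum(y) / len(y)
    return 1.0 - log_likelihood(y, p) / log_likelihood(y, [base_rate] * len(y))


def accuracy(y, p, threshold=0.5):
    """Share of cases classified correctly at the given probability cutoff."""
    return sum((pi >= threshold) == (yi == 1) for yi, pi in zip(y, p)) / len(y)


def auc(y, p):
    """Probability that a random positive case outscores a random negative one."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: 7 of 10 customers repurchased (base rate 70%).
y = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
p = [0.90, 0.80, 0.85, 0.70, 0.60, 0.75, 0.40, 0.30, 0.50, 0.20]

print(f"Base rate:     {sum(y) / len(y):.2f}")
print(f"Accuracy:      {accuracy(y, p):.2f}")
print(f"AUC:           {auc(y, p):.3f}")
print(f"McFadden's R2: {mcfadden_r2(y, p):.3f}")
```

The pairwise AUC calculation is quadratic in the sample size, which is fine for a demonstration but too slow for large datasets.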

Assumptions

Logistic regression requires:

The outcome is binary (or can be meaningfully collapsed into two categories)
Observations are independent
No severe multicollinearity among predictors
A linear relationship between continuous predictors and the log-odds of the outcome
A large enough sample: the common guideline is at least 10 events per predictor variable

When to Use Logistic Regression

Churn modeling to identify which customer attributes predict cancellation
Purchase prediction from survey responses like intent, satisfaction, and NPS
Segmentation validation to test whether segment membership predicts a specific behavior
A/B test analysis when the outcome is binary (converted/didn't convert) and you need to control for covariates
Lead scoring to rank prospects by their probability of conversion

Common Mistakes to Avoid

Interpreting coefficients as probabilities: the raw coefficients are in log-odds units; you need to exponentiate them for odds ratios or apply the full logistic function for predicted probabilities
Ignoring base rates: a model that predicts "no churn" for everyone achieves 95% accuracy if only 5% of customers churn, but it's completely useless
Treating odds ratios as relative risk: an odds ratio of 2.0 doesn't mean "twice as likely" unless the outcome is rare (below ~10% prevalence)
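The last point is easy to verify numerically. The sketch below takes a baseline probability and an odds ratio and works out the relative risk they imply; the function name and the baseline values are assumptions made for this illustration.

```python
def relative_risk_from_odds_ratio(baseline_p: float, odds_ratio: float) -> float:
    """Relative risk implied by an odds ratio at a given baseline probability."""
    baseline_odds = baseline_p / (1.0 - baseline_p)
    exposed_odds = baseline_odds * odds_ratio
    exposed_p = exposed_odds / (1.0 + exposed_odds)
    return exposed_p / baseline_p


# The same odds ratio of 2.0 at increasingly common outcomes.
for baseline in (0.01, 0.05, 0.10, 0.30, 0.50):
    rr = relative_risk_from_odds_ratio(baseline, 2.0)
    print(f"baseline = {baseline:.0%}: OR = 2.00, relative risk = {rr:.2f}")
```

With a 50% baseline, an odds ratio of 2.0 moves the probability from 50% to about 67%, which is 1.33 times as likely rather than twice as likely.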

How Quali-Fi Supports Logistic Regression

Quali-Fi's Research plan ($1,061/month) includes binary logistic regression modeling with automatic odds ratio calculation and classification tables built into the analytics dashboard. For multi-category outcomes or complex predictive models, the Intelligence tier provides custom modeling with expert interpretation and actionable recommendations.

Frequently Asked Questions

What's the difference between logistic regression and linear regression?

Linear regression predicts a continuous outcome (revenue, satisfaction score, time spent). Logistic regression predicts the probability of a binary outcome (yes/no, buy/don't buy). Using linear regression for binary outcomes can produce predicted values below 0 or above 1, which don't make sense as probabilities.
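A quick way to see the out-of-range problem is to fit an ordinary least-squares line to a binary outcome. The ten data points below are invented for illustration, and fit_simple_ols is a hand-written one-predictor fit rather than a library call.

```python
def fit_simple_ols(xs, ys):
    """Closed-form least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: satisfaction 1-10, repurchase coded 0/1.
satisfaction = list(range(1, 11))
repurchased = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

intercept, slope = fit_simple_ols(satisfaction, repurchased)

# Predictions at the ends of the scale fall outside the 0-1 range.
for score in (1, 5, 10):
    print(f"satisfaction = {score:>2}: predicted value = {intercept + slope * score:.2f}")
```

On this data the straight line dips below 0 at the low end of the scale and rises above 1 at the high end, which is the behavior the logistic function is designed to prevent.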

Can logistic regression handle more than two outcome categories?

Yes, but you'd use multinomial logistic regression (for unordered categories like brand preference) or ordinal logistic regression (for ordered categories like low/medium/high satisfaction). Standard binary logistic regression is limited to two-category outcomes.

How many observations do I need for logistic regression?

The most cited guideline is at least 10 events (occurrences of the less frequent outcome) per predictor variable. If you have 5 predictors and only 30% of your sample experienced the event, you'd need at least 50 events, which means a minimum sample of about 167. More conservative recommendations suggest 20 events per predictor.
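The arithmetic behind that guideline can be wrapped in a small helper. This is a sketch of the events-per-predictor rule of thumb only, not a formal power analysis, and the function name min_sample_size is hypothetical.

```python
import math


def min_sample_size(n_predictors: int,
                    event_rate: float,
                    events_per_predictor: int = 10) -> int:
    """Smallest sample expected to yield enough events for the rule of thumb."""
    # "Events" means the less frequent of the two outcomes.
    rarer_rate = min(event_rate, 1.0 - event_rate)
    events_needed = n_predictors * events_per_predictor
    return math.ceil(events_needed / rarer_rate)


# The example from the text: 5 predictors, 30% event rate.
print(min_sample_size(5, 0.30))                           # 10 events per predictor
print(min_sample_size(5, 0.30, events_per_predictor=20))  # more conservative rule
```

Because the rule counts the rarer outcome, a 70% event rate requires the same sample as a 30% event rate.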

Related Guides

Linear Regression: What It Is and How to Use It in Research

Odds Ratio: Calculation, Interpretation, and Research Applications

Multicollinearity: What It Is and How to Detect It

Multiple Regression: What It Is and How to Add Predictors

Relative Risk: Calculation, Interpretation, and vs. Odds Ratio
