What Is Logistic Regression?
Logistic regression is a statistical method for modeling binary outcomes: situations where the dependent variable has exactly two categories, such as purchased/didn't purchase, churned/retained, or clicked/didn't click. Unlike linear regression, which predicts continuous values, logistic regression estimates the probability that an observation falls into one of the two categories. It does this by applying a logistic (sigmoid) function to a linear combination of predictors, transforming the output into a value between 0 and 1. In market research, logistic regression is the go-to technique for understanding which survey responses, demographics, or behavioral variables predict whether a customer will take a specific action. It's also the backbone of many lead-scoring and churn-prediction models.
Why Logistic Regression Matters
Binary decisions drive most business outcomes: buy or don't buy, subscribe or cancel, recommend or stay silent. Logistic regression gives you a principled way to quantify which factors increase or decrease the probability of that binary outcome. It also produces odds ratios, which are among the most intuitive effect-size measures for communicating findings to non-technical stakeholders: "customers who reported high satisfaction had 3.2 times the odds of repurchasing."
How Logistic Regression Works
The Logistic Function
The model takes the form:
P(Y = 1) = 1 / (1 + e^(-(b₀ + b₁X₁ + b₂X₂ + ... + bₖXₖ)))
Where P(Y = 1) is the probability of the outcome occurring, b₀ is the intercept, and b₁ through bₖ are the coefficients for each predictor. Because e raised to any power is positive, the denominator is always greater than 1, so the predicted probability always falls between 0 and 1, regardless of the predictor values.
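To see the formula in action, here is a minimal Python sketch. The function name `predicted_probability` and the coefficient values are illustrative assumptions, not output from a fitted model; the point is only that the result stays between 0 and 1 however extreme the predictor gets.

```python
import math

def predicted_probability(intercept, coefficients, values):
    """Return P(Y = 1) for one observation, given model coefficients."""
    # Linear predictor: b0 + b1*x1 + ... + bk*xk (this is the log-odds).
    logit = intercept + sum(b * x for b, x in zip(coefficients, values))
    # The logistic (sigmoid) transform maps any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical one-predictor model (b0 = 0, b1 = 1), for illustration only.
for x in (-10, 0, 10):
    print(x, predicted_probability(0.0, [1.0], [x]))
```

A log-odds of exactly 0 maps to a probability of 0.5, and large negative or positive values approach, but never reach, 0 and 1.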
Log-Odds and Odds Ratios
The raw coefficients are in log-odds (logit) units, which aren't directly intuitive. To make them interpretable, you exponentiate them:
Odds Ratio = e^(b)
An odds ratio of 1.0 means no effect. Greater than 1.0 means higher odds of the outcome. Less than 1.0 means lower odds.
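The conversion is a single exponentiation, as this short Python sketch shows. The three coefficients are made-up values chosen to cover the negative, zero, and positive cases; the helper names are assumptions for illustration.

```python
import math

def odds_ratio(coefficient):
    """Convert a log-odds coefficient into an odds ratio."""
    return math.exp(coefficient)

def percent_change_in_odds(coefficient):
    """Express the odds ratio as a percentage change in the odds."""
    return (math.exp(coefficient) - 1.0) * 100.0

# Hypothetical coefficients: one negative, one zero, one positive.
for b in (-0.5, 0.0, 0.5):
    print(f"b = {b:+.1f}  odds ratio = {odds_ratio(b):.2f}  "
          f"change in odds = {percent_change_in_odds(b):+.0f}%")
```

A negative coefficient yields an odds ratio below 1.0, a coefficient of zero yields exactly 1.0, and a positive coefficient yields an odds ratio above 1.0.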
Worked Example
You survey 500 customers and model repurchase (yes/no) using satisfaction score (1-10) and loyalty program membership (yes/no).
Results:
| Predictor | Coefficient (b) | Odds Ratio (e^b) | p-value |
|---|---|---|---|
| Intercept | -4.20 | n/a | <0.001 |
| Satisfaction | 0.45 | 1.57 | <0.001 |
| Loyalty member | 0.82 | 2.27 | 0.003 |
Interpretation: Each one-point increase in satisfaction multiplies the odds of repurchase by 1.57 (a 57% increase in odds). Loyalty program members have 2.27 times the odds of repurchasing compared to non-members, holding satisfaction constant.
To predict a specific probability, take a loyalty member with a satisfaction score of 7:
logit = -4.20 + (0.45 × 7) + (0.82 × 1) = -4.20 + 3.15 + 0.82 = -0.23
P(repurchase) = 1 / (1 + e^0.23) = 1 / (1 + 1.26) = 1 / 2.26 ≈ 0.44, or 44%
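If you want to check the arithmetic or score other customers, the same calculation can be sketched in Python using the coefficients from the table. The function name and the comparison against a non-member are additions for illustration.

```python
import math

# Coefficients taken from the worked example table.
INTERCEPT = -4.20
B_SATISFACTION = 0.45
B_LOYALTY = 0.82

def repurchase_probability(satisfaction, loyalty_member):
    """Predicted probability of repurchase for one customer."""
    logit = (INTERCEPT
             + B_SATISFACTION * satisfaction
             + B_LOYALTY * int(loyalty_member))
    return 1.0 / (1.0 + math.exp(-logit))

member = repurchase_probability(7, True)
non_member = repurchase_probability(7, False)
print(f"member: {member:.2f}, non-member: {non_member:.2f}")

# The ratio of the two customers' odds recovers the loyalty odds ratio.
odds = lambda p: p / (1.0 - p)
print(f"odds ratio: {odds(member) / odds(non_member):.2f}")
```

Note that the gap in probability between the two customers depends on the satisfaction score you plug in, while the odds ratio between them is the same at every score.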
Model Fit
Logistic regression doesn't use R-squared in the traditional sense. Instead, you'll see:
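- Nagelkerke R² or McFadden's R²: Pseudo-R-squared measures that play a role loosely analogous to explained variance, although they are not directly comparable to the R² from linear regression or to each other. Values above 0.20 are often treated as acceptable in social science research.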
- Classification accuracy: The percentage of cases correctly classified. Compare this against the base rate: if 70% of customers repurchase, your model needs to beat 70% to be useful.
- AUC (Area Under the ROC Curve): Ranges from 0.5 (chance) to 1.0 (perfect). Values above 0.70 indicate acceptable discrimination; above 0.80 is good.
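Most statistical packages report these measures for you, but computing them by hand makes the definitions concrete. The sketch below uses only the Python standard library, and the ten outcomes and predicted probabilities are made-up toy data (with a 70% base rate), not results from the worked example. McFadden's R² is computed here as 1 minus the ratio of the model's log-likelihood to that of an intercept-only model; Nagelkerke's R² is not shown.

```python
import math

def log_likelihood(y_true, p_pred):
    """Sum of log-probabilities the model assigned to the observed outcomes."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p)
               for y, p in zip(y_true, p_pred))

def mcfadden_r2(y_true, p_pred):
    """1 - LL(model) / LL(null), where the null model predicts the base rate."""
    base_rate = sum(y_true) / len(y_true)
    ll_null = log_likelihood(y_true, [base_rate] * len(y_true))
    return 1.0 - log_likelihood(y_true, p_pred) / ll_null

def accuracy(y_true, p_pred, threshold=0.5):
    """Share of cases classified correctly at the given cutoff."""
    hits = sum((p >= threshold) == (y == 1) for y, p in zip(y_true, p_pred))
    return hits / len(y_true)

def auc(y_true, p_pred):
    """Probability that a random event outranks a random non-event (ties count half)."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Toy data, invented for illustration: 7 of 10 customers repurchased.
y = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
p = [0.9, 0.8, 0.85, 0.7, 0.75, 0.55, 0.4, 0.6, 0.3, 0.2]

base_rate = sum(y) / len(y)
print(f"accuracy = {accuracy(y, p):.2f} vs base rate = {base_rate:.2f}")
print(f"AUC = {auc(y, p):.3f}")
print(f"McFadden R2 = {mcfadden_r2(y, p):.3f}")
```

The pairwise AUC loop is fine for small samples; for large datasets a rank-based implementation from a statistics library would be the more practical choice.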
Assumptions
Logistic regression requires:
- The outcome is binary (or can be meaningfully collapsed into two categories)
- Observations are independent
- No severe multicollinearity among predictors
- A linear relationship between continuous predictors and the log-odds of the outcome
- A large enough sample: the common guideline is at least 10 events per predictor variable
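The sample-size guideline is the easiest of these to check before you fit anything. Here is a minimal Python sketch; the function name is an assumption, and the 150/350 split is a hypothetical count rather than data from the survey described earlier. It counts events as occurrences of the less frequent outcome.

```python
def events_per_predictor(outcomes, n_predictors):
    """Events per predictor, counting the less frequent of the two outcomes."""
    events = sum(outcomes)                # cases coded 1
    non_events = len(outcomes) - events   # cases coded 0
    return min(events, non_events) / n_predictors

# Hypothetical sample: 150 repurchasers and 350 non-repurchasers, 2 predictors.
outcomes = [1] * 150 + [0] * 350
epv = events_per_predictor(outcomes, n_predictors=2)
print(f"events per predictor: {epv:.0f}")
print("meets the 10-per-predictor guideline:", epv >= 10)
```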
When to Use Logistic Regression
- Churn modeling to identify which customer attributes predict cancellation
- Purchase prediction from survey responses like intent, satisfaction, and NPS
- Segmentation validation to test whether segment membership predicts a specific behavior
- A/B test analysis when the outcome is binary (converted/didn't convert) and you need to control for covariates
- Lead scoring to rank prospects by their probability of conversion
Common Mistakes to Avoid
- Interpreting coefficients as probabilities: the raw coefficients are in log-odds units; you need to exponentiate them for odds ratios or apply the full logistic function for predicted probabilities
- Ignoring base rates: a model that predicts "no churn" for everyone achieves 95% accuracy if only 5% of customers churn, but it's completely useless
- Treating odds ratios as relative risk: an odds ratio of 2.0 doesn't mean "twice as likely" unless the outcome is rare (below ~10% prevalence)
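The last point is easier to see with numbers. The Python sketch below converts an odds ratio into a relative risk for a given baseline probability; the function name and the two baseline rates (5% and 50%) are illustrative assumptions.

```python
def relative_risk(odds_ratio, baseline_probability):
    """Relative risk implied by an odds ratio at a given baseline probability."""
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    exposed_odds = odds_ratio * baseline_odds
    # Convert the exposed group's odds back into a probability.
    exposed_probability = exposed_odds / (1.0 + exposed_odds)
    return exposed_probability / baseline_probability

# Same odds ratio of 2.0, one rare and one common outcome.
for p0 in (0.05, 0.50):
    print(f"baseline {p0:.0%}: relative risk = {relative_risk(2.0, p0):.2f}")
```

At a 5% baseline, an odds ratio of 2.0 is close to "twice as likely"; at a 50% baseline, the same odds ratio corresponds to a much smaller increase in likelihood.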
How Quali-Fi Supports Logistic Regression
Quali-Fi's Research plan ($1,061/month) includes binary logistic regression modeling with automatic odds ratio calculation and classification tables built into the analytics dashboard. For multi-category outcomes or complex predictive models, the Intelligence tier provides custom modeling with expert interpretation and actionable recommendations.
See Quali-Fi's advanced analytics capabilities
Frequently Asked Questions
What's the difference between logistic regression and linear regression?
Linear regression predicts a continuous outcome (revenue, satisfaction score, time spent). Logistic regression predicts the probability of a binary outcome (yes/no, buy/don't buy). Using linear regression for binary outcomes can produce predicted values below 0 or above 1, which don't make sense as probabilities.
Can logistic regression handle more than two outcome categories?
Yes, but you'd use multinomial logistic regression (for unordered categories like brand preference) or ordinal logistic regression (for ordered categories like low/medium/high satisfaction). Standard binary logistic regression is limited to two-category outcomes.
How many observations do I need for logistic regression?
The most cited guideline is at least 10 events (occurrences of the less frequent outcome) per predictor variable. If you have 5 predictors and only 30% of your sample experienced the event, you'd need at least 50 events, which means a minimum sample of about 167. More conservative recommendations suggest 20 events per predictor.
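That calculation can be sketched in a few lines of Python if you want to try other scenarios. The function name and parameter names are assumptions; `event_rate` should be the expected share of the less frequent outcome.

```python
import math

def minimum_sample_size(n_predictors, event_rate, events_per_predictor=10):
    """Smallest sample expected to contain enough events for the guideline."""
    required_events = n_predictors * events_per_predictor
    # Round up: you can't survey a fraction of a respondent.
    return math.ceil(required_events / event_rate)

print(minimum_sample_size(5, 0.30))                           # 10 events per predictor
print(minimum_sample_size(5, 0.30, events_per_predictor=20))  # conservative guideline
```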