Correlation vs causation
“Correlation is not causation” is the most-quoted line in statistics — and the most-ignored. A correlation means two things move together; causation means one makes the other change. The gap between them is where bad conclusions live, because a correlation is consistent with several explanations and can’t tell you which one is true.
Why a correlation can’t prove cause
If A and B are correlated, at least four things could be going on:
- A causes B — what you hoped.
- B causes A — reverse causation (you have the arrow backwards).
- A third variable C causes both — a confounder.
- Coincidence — pure chance, especially with small samples or many comparisons.
A correlation coefficient can’t distinguish these. That’s the whole problem.
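The coincidence case is easy to underestimate, so here is a minimal sketch of it. It assumes 40 made-up variables of pure random noise with only 10 observations each; the names `pearson`, `data`, and `strongest` are illustrative and come from no particular study or library. Nothing in the data is related to anything else, yet with 780 possible pairs some of them correlate strongly by chance alone.

```python
import random
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(1)  # fixed seed so the run is repeatable

# Assumption: 40 unrelated variables, each just 10 draws of random noise.
n_obs, n_vars = 10, 40
data = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]

# Every distinct pair of variables: 40 * 39 / 2 = 780 comparisons.
pairs = [(i, j) for i in range(n_vars) for j in range(i + 1, n_vars)]

# The strongest correlation found among variables that are unrelated by design.
strongest = max(abs(pearson(data[i], data[j])) for i, j in pairs)
print(f"pairs tested: {len(pairs)}, strongest |r|: {strongest:.2f}")
```

The strongest pair looks impressive in isolation, which is why a single striking correlation pulled from many comparisons deserves suspicion before any causal story is attached to it.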
The two traps to name out loud
- Confounding — a third variable explains the link. Always ask: what else could be driving both?
- Reverse causation — does happiness cause exercise, or exercise cause happiness? Without a time order, you can’t say.
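Both traps can be seen in a small simulation. The sketch below uses invented data in which a hypothetical `temperature` variable drives both `ice_cream` and `drownings`, and neither of those affects the other. The variable names, effect sizes, and the `residuals` helper are assumptions made for illustration, not a recommended analysis pipeline.

```python
import random
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def residuals(y, x):
    """What is left of y after removing a least-squares straight-line fit on x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Assumption: simulated, standardised values; temperature is the confounder.
temperature = [rng.gauss(0, 1) for _ in range(n)]
ice_cream = [t + rng.gauss(0, 1) for t in temperature]  # depends only on temperature
drownings = [t + rng.gauss(0, 1) for t in temperature]  # depends only on temperature

# Confounding: the two outcomes correlate although neither causes the other.
r_raw = pearson(ice_cream, drownings)

# Reverse causation: the coefficient is symmetric, so it carries no direction.
r_swapped = pearson(drownings, ice_cream)

# Removing the confounder's contribution from both makes the link vanish.
r_adjusted = pearson(residuals(ice_cream, temperature),
                     residuals(drownings, temperature))

print(f"raw r: {r_raw:.2f}, swapped r: {r_swapped:.2f}, adjusted r: {r_adjusted:.2f}")
```

The raw and swapped coefficients are identical, and the adjusted one sits near zero. In real data you can only adjust for confounders you thought to measure, so a vanishing adjusted correlation is informative, but a surviving one is still not proof of cause.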
What it actually takes to claim causation
The gold standard is a randomised controlled experiment: random assignment breaks the link between the treatment and any confounders, measured or not, so the groups differ only by chance and by the treatment. A difference in outcomes larger than chance would produce can then be attributed to the treatment. When you can’t randomise, you build a causal case more carefully: establishing time order with longitudinal data, statistically controlling for measured confounders, and applying weight-of-evidence frameworks (e.g. the Bradford Hill considerations). The bar is far higher than “they correlate.”
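To see why random assignment matters, consider a simulated treatment that does nothing at all. The sketch below is built on invented numbers: `TRUE_EFFECT`, the `confounder` variable, and the self-selection rule are assumptions chosen to make the contrast visible, not estimates from any real study.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 10_000
TRUE_EFFECT = 0.0  # assumption: the treatment has no effect whatsoever

def outcome(confounder_value, treated):
    """Outcome driven by the confounder plus noise; treatment adds TRUE_EFFECT."""
    return 2.0 * confounder_value + TRUE_EFFECT * treated + rng.gauss(0, 1)

def diff_in_means(treated_flags, outcomes):
    """Mean outcome of the treated group minus mean outcome of the untreated group."""
    treated = [y for flag, y in zip(treated_flags, outcomes) if flag]
    untreated = [y for flag, y in zip(treated_flags, outcomes) if not flag]
    return mean(treated) - mean(untreated)

# Hypothetical confounder, e.g. baseline motivation, in standardised units.
confounder = [rng.gauss(0, 1) for _ in range(n)]

# Observational data: people high on the confounder tend to choose the treatment.
self_selected = [c + rng.gauss(0, 1) > 0 for c in confounder]
obs_outcomes = [outcome(c, t) for c, t in zip(confounder, self_selected)]
obs_diff = diff_in_means(self_selected, obs_outcomes)

# Randomised experiment: a coin flip decides, independent of the confounder.
randomised = [rng.random() < 0.5 for _ in range(n)]
rct_outcomes = [outcome(c, t) for c, t in zip(confounder, randomised)]
rct_diff = diff_in_means(randomised, rct_outcomes)

print(f"observational difference: {obs_diff:.2f}")
print(f"randomised difference:    {rct_diff:.2f}")
```

The self-selected comparison shows a large gap even though the treatment is inert, because the treated group started out higher on the confounder. Under the coin flip the confounder is balanced across groups, so the estimated difference falls back to roughly zero, which is the true effect in this setup.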
Write it honestly
If your study is correlational, your discussion should report an association, name the plausible confounders, and resist the slide into causal language (“X leads to Y”). Reviewers watch for exactly this overreach — and writing it precisely is a mark of a careful researcher.
Frequently asked questions
Why doesn't correlation imply causation?
Because A↔B could be A→B, B→A, a shared cause (confounder), or coincidence — a correlation can’t tell which.
What is a confounding variable?
A third variable that drives both of the variables you are studying, creating a correlation between them with no direct causal link, like hot weather behind both ice cream sales and drownings.
What does it take to claim causation?
Ideally a randomised experiment; otherwise timing, controlling for confounders, and weight-of-evidence frameworks.
Can a correlational study suggest cause?
It can support a causal hypothesis, but on its own it justifies reporting an association, not a cause.