Effect Size, Power, and Sample Size in Psychology: How They Fit Together
Students often meet effect size, power, and sample size as three separate annoyances, usually while a deadline is already looking unpleasant. They make more sense when treated as parts of the same problem. Change one, and the others move with it.
Why these three keep showing up together
Effect size, power, and sample size are tied together because they all shape how likely your study is to detect the pattern you care about.
If you expect a large effect, you usually need fewer participants to detect it. If you expect a small effect, you usually need more. If you want higher power, you usually need more participants again. If you use a stricter alpha level, the sample often has to grow as well.
So when students ask, “How many participants do I need?”, the annoying but honest answer is usually, “That depends on what effect you expect, how much certainty you want, and how strict you plan to be.”
What effect size actually means
Effect size is about magnitude. It tells you how large a difference or relationship is, not just whether something might be statistically significant.
For t-tests, the effect size is often expressed as Cohen’s d. Rough conventions are commonly described like this:
- d = 0.20 as small
- d = 0.50 as medium
- d = 0.80 as large
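If you prefer to see the arithmetic behind d, here is a minimal Python sketch for two independent groups, using the pooled standard deviation. The scores are invented purely for illustration.

```python
import numpy as np

def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled SD."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    n_a, n_b = len(a), len(b)
    # Pooled variance weights each group's sample variance by its df
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Invented scores for two conditions
control = [12, 15, 11, 14, 13, 16, 12, 15]
treatment = [14, 16, 12, 15, 14, 17, 13, 16]
print(round(cohens_d(treatment, control), 2))  # ~0.65, medium-to-large
```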
These are only rough guides. They are not sacred numbers handed down from the statistical mountain. In some areas of psychology, a “small” effect may still matter a great deal. In others, a supposedly “medium” effect may be wildly optimistic.
That optimism matters, because overestimating the effect size is one of the easiest ways to end up with an underpowered study and a deeply irritating result.
What statistical power means
Power is the probability that your study will detect an effect of the size you assumed, if that effect is really there.
A common target is .80, which means an 80% chance of detecting the effect under the assumptions you specified. Some researchers aim for .90, especially when missing a real effect would be costly. Higher power is attractive, but it costs sample size.
That is the trade-off in plain form. More certainty usually means more people, more time, and more admin. Statistics has never been especially sentimental about this.
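To put a number on that trade-off, here is a short Python sketch using statsmodels. (The package choice is mine; any power calculator will give the same answer.)

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power of a two-sided independent-samples t-test with d = 0.50,
# alpha = .05, and 50 participants per group
power = analysis.power(effect_size=0.50, nobs1=50, alpha=0.05)
print(f"Power with 50 per group: {power:.2f}")  # roughly .70
```

Seventy percent sounds respectable until you notice it leaves roughly a one-in-three chance of missing a real effect.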
What sample size is doing in all this
Sample size is the practical lever. It is the part you can usually change most easily, at least in principle.
If you hold alpha and power constant, the required sample size gets bigger as the expected effect gets smaller. This is why small effects are expensive. They are harder to detect cleanly, so the study needs more data to separate signal from noise.
That is also why many student projects quietly lean toward unrealistically large expected effects. Smaller samples look friendlier on paper. Reality tends to be less cooperative.
How they fit together
The relationship is easier to grasp if you think of it like this:
- A small expected effect means you need more evidence to detect it.
- More evidence usually means a larger sample.
- If you also want high power, the sample grows again.
- If you tighten alpha, the sample grows again.
So the broad pattern is not mysterious:
- smaller effect sizes push sample size up
- higher power pushes sample size up
- stricter alpha pushes sample size up
This is why one fixed “correct sample size” almost never exists in the abstract. It depends on the assumptions you are making.
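A sketch makes the pattern concrete. The baseline below assumes an independent-samples t-test, two-sided, with equal groups; each later line changes exactly one assumption.

```python
import math
from statsmodels.stats.power import TTestIndPower

solve = TTestIndPower().solve_power

def n_per_group(d, alpha, power):
    # solve_power returns a fractional n; round up when planning
    return math.ceil(solve(effect_size=d, alpha=alpha, power=power))

print(n_per_group(0.50, 0.05, 0.80))  # baseline:        ~64 per group
print(n_per_group(0.20, 0.05, 0.80))  # smaller effect: ~394 per group
print(n_per_group(0.50, 0.05, 0.90))  # higher power:    ~86 per group
print(n_per_group(0.50, 0.01, 0.80))  # stricter alpha:  ~96 per group
```

Each lever moves the required sample in exactly the direction the list above describes.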
A simple example
Imagine you are planning an independent-samples t-test.
If you expect a medium effect, say d = 0.50, with power set at .80 and alpha at .05, you need roughly 64 participants per group. If you expect a small effect instead, say d = 0.20 with those same settings, the requirement balloons to roughly 394 per group.
Nothing magical happened there. The expected effect just got harder to detect, so the study had to become larger to compensate.
What students often get wrong
One common mistake is treating effect size conventions as if they automatically fit every topic. They do not.
Another is entering whatever effect size produces a manageable sample, then calling that “planning.” That is less planning and more statistical wishful thinking in a lab coat.
A third is forgetting that paired designs and independent designs are not interchangeable. Paired designs can sometimes need fewer observations because they use related measurements, but only when the design genuinely supports that structure.
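To see why a paired design can be cheaper, here is a sketch comparing the two. The correlation of r = .50 between the paired measurements is purely an assumption for illustration; with a weaker correlation, the saving shrinks or vanishes.

```python
import math
from statsmodels.stats.power import TTestIndPower, TTestPower

# Independent design: d = 0.50, alpha = .05, power = .80
n_ind = TTestIndPower().solve_power(effect_size=0.50, alpha=0.05, power=0.80)

# Paired design: with correlation r between the two measurements,
# the within-person effect size is d_z = d / sqrt(2 * (1 - r)).
# r = .50 here is an assumption made purely for illustration.
r = 0.50
d_z = 0.50 / math.sqrt(2 * (1 - r))
n_pair = TTestPower().solve_power(effect_size=d_z, alpha=0.05, power=0.80)

print(f"Independent: {math.ceil(n_ind)} per group, {2 * math.ceil(n_ind)} in total")
print(f"Paired:      {math.ceil(n_pair)} participants in total")
```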
It is also worth saying that post hoc power talk is often less helpful than students hope. Power is most useful during planning, before the data are collected, not as a decorative flourish added after the fact.
So what should you do in practice?
Start with the design you are actually using. Do not choose the design that gives the nicest sample size unless you have also discovered a way to choose reality.
Then use the best effect size estimate you can justify. That might come from prior studies, a meta-analysis, a pilot, or a reasoned convention if nothing better exists.
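If a prior study reports only a t-statistic and group sizes, an approximate d can be recovered from them. This small sketch uses invented numbers:

```python
import math

def d_from_t(t, n1, n2):
    """Approximate Cohen's d from an independent-samples t-statistic."""
    return t * math.sqrt(1 / n1 + 1 / n2)

# Hypothetical prior study: t(58) = 2.10 with 30 participants per group
print(round(d_from_t(2.10, 30, 30), 2))  # ~0.54
```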
After that, choose your target power and alpha deliberately rather than just accepting whatever number wanders into the box.
Then look at the required sample and ask the awkward but useful question: is this study realistically doable?
If the answer is no, that is still useful information. It may mean you need a narrower question, a stronger design, a different outcome measure, or a more honest sense of what the project can actually support.
Use the visualizer below
The helper below lets you see how required sample size changes when effect size, power, and alpha shift. It is built for t-test planning and simple intuition-building, not for every design ever invented.
Free Helper
Effect Size, Power, and Sample Size Visualizer
Use this helper to see how required sample size changes when effect size, power, and alpha shift. It is built for straightforward t-test intuition and planning rather than every possible design.
Set your assumptions
Choose a t-test design, set your alpha and power, then pick the effect size you want to highlight.
This marks the current scenario on the chart and in the summary cards below.
Your visual summary
The cards show the highlighted scenario. The chart shows how the required sample changes across a range rather than pretending your chosen number arrived by divine revelation.
| Effect size | Required sample |
|---|---|
| Small (d = 0.20) | — |
| Medium (d = 0.50) | — |
| Large (d = 0.80) | — |
This free helper covers t-test intuition and basic planning. Broader designs and fuller guidance belong in Premium.
What Premium does better
- Free: visual intuition for t-test planning
- Premium: broader support for more designs and more detailed guidance
- Premium: better once your study stops looking like the clean little example from the methods textbook
Simply Put
Effect size, power, and sample size are not three separate boxes to fill in because the software asked nicely. They are part of the same planning logic.
Small effects require bigger studies. Higher power usually requires bigger studies. Stricter thresholds usually require bigger studies. Once that clicks, sample size planning stops looking like random bureaucracy and starts looking like arithmetic with consequences.