If you’re staring at a problem with no idea which test it wants — t-test, chi-square, ANOVA, something else entirely — you are not bad at statistics. You’re missing one step almost nobody teaches directly: how to read the question and let it tell you which test to use.
I taught this subject every semester for forty years, and I’ll tell you what I told every class: the formulas were almost never where students got stuck. The sticking point was this exact decision — knowing which tool to reach for. So let’s fix that. Not by memorizing a chart you’ll forget by Friday, but by learning the three questions you can ask about any problem.
Question 1 — What kind of data do you have?
Everything starts here, and getting it wrong sends you down the wrong path immediately.
- Quantitative data is numbers you measure: heights, test scores, reaction times, dollars. The kind of thing you can average.
- Categorical data is groups or labels you count: pass/fail, blood type, which brand someone chose, yes/no. You can’t average “blue”; you can only count how many landed in each bucket.
If you find yourself wanting to take a mean, you have quantitative data. If you find yourself counting how many fall into each category, it’s categorical. That single distinction rules out half the tests before you’ve done anything else.
Question 2 — How many groups are you comparing?
- One group, compared against some known or claimed value.
- Two groups, compared against each other.
- Three or more groups, compared all at once.
Question 3 — What are you actually asking?
Usually it’s one of three things: a question about an average, a question about a proportion or a count, or a question about a relationship between two things.
That’s it. Now let’s put the three answers together.
The decision guide
If your data is quantitative (you’re asking about averages):
- One group, comparing its mean to a known value — one-sample t-test (e.g. “Is the average fill of these bottles really 500 ml?”)
- Two independent groups, comparing their means — two-sample (independent) t-test (e.g. “Did Class A and Class B score differently?”)
- Two measurements on the same subjects — paired t-test (e.g. “Did each student improve from the pretest to the posttest?”)
- Three or more groups, comparing their means — ANOVA (e.g. “Do these four diets lead to different average weight loss?”)
- A relationship between two quantitative variables — correlation / linear regression (e.g. “As study hours go up, do exam scores go up?”)
If your data is categorical (you’re asking about counts or proportions):
- One categorical variable, comparing observed counts to what you’d expect — chi-square goodness-of-fit test (e.g. “Are these dice rolls evenly distributed?”)
- Two categorical variables, testing whether they’re related — chi-square test of independence (e.g. “Is voting preference associated with age group?”)
- One proportion, compared to a claimed value — one-proportion z-test
- Two proportions, compared to each other — two-proportion z-test
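You’ll notice means lean on the t-tests and proportions lean on the z-tests. That’s not arbitrary: with a mean you have to estimate the spread from the sample, which is what the t-distribution accounts for, while a proportion’s spread is determined by the proportion itself. For picking the test, though, the pattern is enough to get you to the right place.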
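If it helps to see the guide as logic rather than a list, here is a minimal Python sketch of it. The function name `choose_test` and its parameter names are my own labels for illustration, not anything standard, and it covers only the tests listed in the guide.

```python
def choose_test(data_type, groups, asking, paired=False):
    """Suggest a test from the answers to the three questions.

    data_type: "quantitative" or "categorical"            (Question 1)
    groups:    how many groups are being compared          (Question 2)
    asking:    "mean", "proportion", "counts", or
               "relationship"                              (Question 3)
    paired:    True when both measurements come from the same subjects
    """
    if data_type == "quantitative":
        if asking == "relationship":
            return "correlation / linear regression"
        if asking == "mean":
            if groups == 1:
                return "one-sample t-test"
            if groups == 2:
                return "paired t-test" if paired else "two-sample (independent) t-test"
            if groups >= 3:
                return "ANOVA"
    elif data_type == "categorical":
        if asking == "relationship":
            return "chi-square test of independence"
        if asking == "counts" and groups == 1:
            return "chi-square goodness-of-fit test"
        if asking == "proportion":
            if groups == 1:
                return "one-proportion z-test"
            if groups == 2:
                return "two-proportion z-test"
    # Anything else falls outside this short guide.
    raise ValueError("combination not covered by this guide")


# Two classes, exam scores, different students in each class.
print(choose_test("quantitative", 2, "mean"))
```

The point of the sketch is the order of the checks: data type first, then the question being asked, then the number of groups. A combination the guide does not list raises an error instead of guessing.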
One question always comes up: with three or more groups, why not just compare them two at a time? Because every test you run carries its own small chance of a false alarm, and those chances pile up. Four groups already means six pairwise comparisons, and running all six at the usual 5% gives you roughly a 26% chance of being fooled somewhere (treating the six tests as independent). That’s the whole reason ANOVA exists: it asks “are any of these groups different?” in one test, instead of letting the errors stack up.
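The 26% figure is easy to check yourself. This short sketch assumes the pairwise tests are independent, which is a simplification (tests that share a group are not fully independent), and the function name is just a label I picked.

```python
from math import comb


def familywise_error_rate(num_groups, alpha=0.05):
    """Chance of at least one false alarm across all pairwise tests.

    Assumes the tests are independent, which is only an approximation.
    """
    comparisons = comb(num_groups, 2)          # number of distinct pairs
    no_false_alarms = (1 - alpha) ** comparisons
    return comparisons, 1 - no_false_alarms


comparisons, rate = familywise_error_rate(4)
print(comparisons, round(rate, 3))  # prints: 6 0.265
```

Try it with five or six groups and the chance of a false alarm climbs quickly, because the number of pairs grows much faster than the number of groups.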
And what about the p-value?
Once you’ve picked the test and run it, you get a p-value, and it trips up just as many students as the choice itself. In plain English, the p-value answers one question: if there were really no effect at all, how surprising would my data be? A small p-value means “this result would be strange if nothing were going on,” so you have evidence something is going on. A large p-value means “this could easily happen by chance,” so you don’t.
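One way to make “how surprising would my data be?” concrete is to simulate the no-effect world directly. The sketch below is a shuffle (permutation) simulation, not one of the tests in the guide: if the group labels made no difference, shuffling them should produce gaps as large as the real one fairly often. The scores are made up for illustration, and the function name is my own.

```python
import random
from statistics import fmean


def permutation_p_value(group_a, group_b, num_shuffles=10_000, seed=0):
    """Share of label shuffles whose gap in means is at least the observed gap."""
    rng = random.Random(seed)                  # fixed seed so runs are repeatable
    observed = abs(fmean(group_a) - fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(num_shuffles):
        rng.shuffle(pooled)                    # pretend the labels mean nothing
        gap = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        if gap >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / num_shuffles


# Hypothetical exam scores, invented for this example.
new_method = [78, 85, 90, 72, 88, 81]
old_method = [70, 75, 83, 68, 79, 74]
print(permutation_p_value(new_method, old_method))
```

The number it returns is read exactly like any other p-value: the smaller it is, the harder the observed gap is to explain as a fluke of who happened to land in which group.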
The usual cutoff is 0.05, but the number itself isn’t magic, and a p-value never tells you the probability that your hypothesis is true — only how well the data fits the assumption of no effect. And keep the players straight while you read it: the real effect lives in the whole population (a parameter), while your sample only ever hands you an estimate of it (a statistic). The mnemonic sticks: samples have statistics, populations have parameters.
If you want the intuition behind why we’re so cautious with that 0.05 cutoff, borrow it from a courtroom. A jury starts from “not guilty” and only convicts when the evidence is strong — and notice they return “not guilty,” never “innocent.” Hypothesis testing works the same way: we assume no effect until the data is surprising enough to force our hand, and when it isn’t, we say we “failed to reject” that assumption, not that we proved it. We’d rather miss a real effect than announce a false one — the same instinct that would rather let a guilty person walk than convict an innocent one. Picking the right test and reading its p-value with that humility are two halves of the same skill.
Let me show you how I’d think through one
Suppose a problem says: a teacher tries a new method with one class and the old method with another, then compares the two classes’ final exam scores.
I don’t reach for a formula. I ask the three questions.
What kind of data? Exam scores: numbers I can average. Quantitative. So I’m somewhere in the t-test / ANOVA family.
How many groups? Two: the new-method class and the old-method class.
Independent or paired? They’re different students, not the same students measured twice. Independent.
Two independent groups, comparing averages: two-sample t-test. I knew the answer before I wrote a single symbol, because the question told me. That’s the whole move.
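For anyone who wants to see the arithmetic once the choice is made, here is a rough sketch of a two-sample t statistic using only Python’s standard library. It uses Welch’s version, which does not assume the two classes have equal variances; that choice is my assumption, since the problem doesn’t say. The scores are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Each group's sample variance divided by its size.
    v_a = variance(sample_a) / n_a
    v_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(v_a + v_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v_a + v_b) ** 2 / (v_a ** 2 / (n_a - 1) + v_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical final exam scores for the two classes.
new_method_class = [82, 75, 91, 68, 88, 79, 85]
old_method_class = [74, 70, 83, 65, 80, 72, 77]
t, df = welch_t(new_method_class, old_method_class)
print(round(t, 3), round(df, 1))
```

Turning the statistic into a p-value needs the t-distribution, which the standard library does not provide. In practice a statistics package does that step for you; SciPy’s `scipy.stats.ttest_ind` with `equal_var=False` is one option.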
The mistake I see most often
Students reach for the formula first and the question second. They’ve memorized that a t-test has a certain formula, so the moment they see two numbers they start plugging in — without ever asking whether the data was categorical, or whether those two groups were actually the same people measured twice.
The paired-versus-independent slip is the classic one. “Before and after” on the same students is paired. Two separate groups is independent. Same numbers on the page, completely different test — and the only way to tell them apart is to read the situation, not the formula.
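To see how much the choice matters, here is a small sketch that runs both calculations on the same made-up numbers. The scores and function names are hypothetical, and the independent version uses Welch’s form of the statistic.

```python
from math import sqrt
from statistics import mean, stdev, variance


def paired_t(before, after):
    """t statistic on each subject's own change (after minus before)."""
    diffs = [a - b for b, a in zip(before, after)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def independent_t(group_a, group_b):
    """Welch t statistic treating the two lists as unrelated groups."""
    se_squared = variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    return (mean(group_b) - mean(group_a)) / sqrt(se_squared)


# Hypothetical pretest and posttest scores for the same four students.
pretest = [60, 70, 80, 90]
posttest = [63, 72, 83, 94]

print(round(paired_t(pretest, posttest), 2))       # about 7.35
print(round(independent_t(pretest, posttest), 2))  # about 0.32
```

Every student improved by a few points, and the paired calculation sees that clearly. Treat the same numbers as two unrelated groups and the wide spread between students swamps the small average gain, so the statistic shrinks to almost nothing.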
This is why I always say: math is not a spectator sport, and statistics is the part where memorizing fails fastest. Once you’re asking the three questions instead of pattern-matching to a formula, choosing the test stops being the scary part of the course.
Still feels like a wall?
That’s normal, and it’s usually a sign of a small gap from a few weeks back — something quiet that’s making everything since feel impossible. Finding that gap is the first thing I do with every student, and I’ll never make you feel foolish for having one. Mistakes are how this subject is learned.