Mean vs Median, and Reading Data Without Being Fooled

Descriptive statistics are the numbers that summarize a set of data: a measure of its center (mean, median, mode) and a measure of its spread (range, standard deviation, and others). They’re the first thing you learn in a stats course, and they look easy — averages, after all. But the real skill isn’t computing them. It’s knowing which summary tells the truth about a particular set of data, because the wrong one will quietly mislead you.

So rather than march through definitions you can find anywhere, let me focus on the part that actually matters: choosing the honest summary, and not getting fooled by the dishonest one.

Measures of center: where’s the middle?

There are three, and they answer “what’s typical?” in slightly different ways:

Mean — the ordinary average: add everything up, divide by how many. It uses every value, which is its strength and its weakness.
Median — the middle value when you line the data up in order. Half the data is below it, half above.
Mode — the most common value. Most useful for categorical data (“the most popular choice”). A handy way to remember it: in pie à la mode, “mode” means fashionable, which is to say popular.
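To make the three definitions concrete, here is a minimal sketch using Python's standard statistics module. The numbers and the lunch choices are made up purely for illustration.

```python
import statistics

# Hypothetical data: books read last year by six people (invented for illustration).
books = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(books)      # add everything up, divide by how many
median_value = statistics.median(books)  # middle of the sorted data (averages the two middle values here)
mode_value = statistics.mode(books)      # the most common value

print("mean:", mean_value)
print("median:", median_value)
print("mode:", mode_value)

# The mode also works on categorical data, where a mean would be meaningless.
choices = ["pizza", "tacos", "pizza", "salad", "pizza"]
favorite = statistics.mode(choices)
print("most popular choice:", favorite)
```

Notice that the three measures give three different answers on the same six numbers, which is the whole reason the choice between them matters.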

When to use the mean vs the median

This is the question students ask most, and there’s a clean answer: it depends on whether your data is skewed or has outliers.

The mean uses every value, so a few extreme values drag it toward them. The median doesn’t care how far away the extremes are — only how many values sit on each side — so it holds steady.

Roughly symmetric data, no wild outliers — the mean is a fine, informative summary.
Skewed data, or a few extreme values — the median is the honest choice.

Here’s the example I always use. You’re applying for a job at a small company. The boss makes $120K a year, and each of the four employees makes $20K. The boss tells you the average pay is $40K and you’re thrilled, until you take the job and find you’re making half that. The mean got hauled upward by one big salary. The median pay is $20K, and that’s the number that would actually have told you what to expect. A statistic that barely budges when you toss in an extreme value like that is called a resistant statistic, and the median is far more resistant than the mean. That’s exactly why you see median, not mean, home prices reported: a few mansions would make the mean frighteningly large. This is how a true number tells a false story: the mean isn’t wrong, it’s just the wrong tool for skewed data.
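If you want to check the salary arithmetic yourself, here is a small Python sketch. The first list is the company from the example above; the "raise" to $1,000K is my own hypothetical addition, there only to show what resistance looks like.

```python
import statistics

# Salaries from the example, in thousands of dollars:
# four employees at 20 and one boss at 120.
salaries = [20, 20, 20, 20, 120]

mean_pay = statistics.mean(salaries)      # (4 * 20 + 120) / 5 = 40
median_pay = statistics.median(salaries)  # middle of the sorted list = 20

# Resistance check (hypothetical): raise the boss to 1,000
# and see which summary moves.
with_raise = [20, 20, 20, 20, 1000]
mean_after = statistics.mean(with_raise)      # (80 + 1000) / 5 = 216
median_after = statistics.median(with_raise)  # still 20

print(f"mean:   {mean_pay} -> {mean_after}")
print(f"median: {median_pay} -> {median_after}")
```

The mean jumps from 40 to 216 while the median stays at 20. The median only cares that one value sits above the middle, not how far above.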

Measures of spread: how spread out is it?

Center alone never tells the whole story — two data sets can share the same average and look nothing alike. That’s what spread captures:

Range — biggest value minus smallest. Simple, but one outlier can blow it up.
Standard deviation — roughly, the typical distance of a value from the mean. Small standard deviation means the data is bunched tight; large means it’s scattered.
Interquartile range (IQR) — the spread of the middle 50% of the data. Like the median, it shrugs off outliers, which makes it the spread measure that pairs naturally with the median.
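Here is a rough sketch of the three spread measures in Python, applied to two made-up data sets that share the same mean. The helper names value_range and iqr are my own, not standard functions, and the quartiles come from statistics.quantiles with its default method, which is only one of several conventions, so other tools may report slightly different quartiles.

```python
import statistics

# Two hypothetical data sets with the same mean (50) but very different spread.
tight = [48, 49, 50, 51, 52]
scattered = [10, 30, 50, 70, 90]

def value_range(data):
    """Biggest value minus smallest."""
    return max(data) - min(data)

def iqr(data):
    """Spread of the middle 50%: third quartile minus first quartile."""
    q1, _q2, q3 = statistics.quantiles(data, n=4)  # three quartile cut points
    return q3 - q1

for name, data in [("tight", tight), ("scattered", scattered)]:
    print(
        name,
        "mean:", statistics.mean(data),
        "range:", value_range(data),
        "stdev:", round(statistics.stdev(data), 2),  # sample standard deviation
        "IQR:", iqr(data),
    )
```

Both lists report a mean of 50, but every spread measure is far larger for the second one. The center alone would have made them look identical.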

A good habit: report the mean with the standard deviation, or the median with the IQR. Center and spread together; pick the pair that matches your data.

The mistake I see most often

Reaching for the mean by reflex on data that’s clearly skewed. Someone reports an “average” home price, an “average” wait time, an “average” salary, and because a handful of huge values are pulling it, the average describes almost nobody in the data. The fix is a habit, not a formula: before you trust an average, ask whether the data is lopsided or has extreme values. If so, the median is telling you the truer story. Being able to spot that is most of what “reading data without being fooled” actually means, and it’s a skill that long outlasts the course.
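One quick way to practice that habit is to put the mean and the median side by side and count how many values actually fall below the "average". The sketch below does that in Python; the wait times are invented for illustration.

```python
import statistics

# Hypothetical wait times in minutes; one very long wait skews the data.
wait_times = [3, 4, 4, 5, 5, 6, 7, 45]

mean_wait = statistics.mean(wait_times)
median_wait = statistics.median(wait_times)

# How many people actually waited less than the "average"?
below_mean = sum(1 for t in wait_times if t < mean_wait)

print(f"mean = {mean_wait}, median = {median_wait}")
print(f"{below_mean} of {len(wait_times)} waits are shorter than the mean")
```

Seven of the eight waits are shorter than the mean of about 9.9 minutes, so the average describes almost nobody here, while the median of 5 sits right where most of the data is.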

My favorite case of the mean overstepping is the GPA. Grades are really at the ordinal level: an A beats a B and a B beats a C, but is the gap the same size each time? Nobody can really say. Averaging them to the nearest thousandth makes the grading process look far more precise than it is. That’s the mean handing you an illusion of precision, and that illusion is worth being suspicious of wherever you meet it.
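To see why the size of the gaps matters, here is a small Python sketch. Everything in it is hypothetical: both point scales, both transcripts, and the gpa helper are mine, chosen only to show that two scales can agree on the order A > B > C and still disagree about who has the higher average.

```python
import statistics

# Two hypothetical point scales. Both keep the order A > B > C;
# they only disagree about how big the gaps are.
even_gaps = {"A": 4.0, "B": 3.0, "C": 2.0}
uneven_gaps = {"A": 4.0, "B": 3.5, "C": 1.0}

# Two made-up transcripts.
student_x = ["A", "A", "C"]
student_y = ["B", "B", "B"]

def gpa(grades, scale):
    """Mean of the grades after converting letters to points."""
    return statistics.mean(scale[g] for g in grades)

for name, scale in [("even gaps", even_gaps), ("uneven gaps", uneven_gaps)]:
    print(name, "X:", round(gpa(student_x, scale), 3), "Y:", round(gpa(student_y, scale), 3))
```

On the even scale student X comes out ahead; on the uneven scale student Y does. The grades never changed, only the assumption about the gaps, which is exactly the assumption nobody can verify.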

Want a clearer footing?

Descriptive statistics is where the whole course begins, so a shaky start here quietly makes everything afterward harder. If the summaries aren’t sitting right, that’s worth fixing early rather than late — so the foundation is solid before the harder material lands on top of it.
