Written by Darrell Huff and illustrated by Irving Geis. Originally published in 1954.

In a sentence: a quick, worthwhile general knowledge read that explains how people misuse statistics through familiar and useful examples (that could use an update).

I’ll face up to the serious purpose that I like to think lurks just beneath the surface of this book: explaining how to look a phony statistic in the eye and face it down; and no less important, how to recognize sound and usable data in that wilderness of fraud…

How to Lie with Statistics, p. 124, Darrell Huff.

Most of the examples revolve around a) consumer misinterpretation, b) deliberate misrepresentation, c) failures in random selection, d) confusing correlation with causation, or e) data collection and reporting issues. Each chapter is largely a long list of examples from society illustrating a common theme of how a statistic is misused, along with a summary. The book is highly readable and contains very little math. It tries to teach intuition: at first sight of a statistic, what questions should you ask to see if it is trustworthy?

Illustrations are well-used. These are either custom designed or reproduced data visualizations that illustrate common errors of perception. The examples are usually drawn from newspapers or magazines of the time; if not, they are just humorous. At their core, the tricks used to create misleading visualizations are still relied on today: adjusting axes, exploiting how we misjudge the size of shapes, using different sampling periods, etc. While visualizations may be more complex or visually appealing nowadays, I don’t think we’ve developed many new ways to purposefully confuse people. The old ways still work. I would say we’ve just gotten better at categorizing and identifying why and how people get confused when they see certain types of visualizations.

Obviously, some of the examples are outdated, but the issues that they demonstrate are lasting. For example:

Some malaria figures mean as little. Where before 1940 there were hundreds of thousands of cases a year in the American South there are now only a handful, a salubrious and apparently important change that took place in just a few years. But all that has happened in actuality is that cases are now recorded only when proved to be malaria, where formerly the word was used in much of the South as a colloquialism for a cold or chill.

How to Lie with Statistics, pp. 84-85, Darrell Huff.

Reading this book, two general questions come to mind.

First, the book, obviously, is not teaching you how to lie with statistics. It is showing you how to be an “informed skeptic” of statistics. From this, I start to think about what a general framework of skepticism would look like, one that would apply to any new information rather than just statistics. My underlying thought is that a sort of blind, consistent skepticism, always equally doubting new information regardless of its attributes, content or context, is unrealistic and unhelpful. Instead, we can be skeptical of new information based on an informed understanding of how that content could feasibly be manipulated.

From the book, a proper skeptical approach to deciding whether a statistic is valid or not revolves around a) how data was gathered, b) what statistic was chosen to represent that data, c) how that statistic was calculated, d) in what context the statistic was reported, e) how the statistic is visualized, and f) how the statistic is used (e.g., what argument is the statistic being used to support?).

Huff himself, in the last chapter, summarizes these questions more generally under 1) Who says so? 2) How do they know? 3) Did somebody change the subject? 4) Does it make sense?

I’ll walk through an adapted example that Huff uses to open Chapter 2: when you move to a new neighborhood, a neighbor eagerly reports that the average income is $190k - you’re in a rich neighborhood. A month later, at a community meeting that is discussing some sort of local tax increase (you arrived late), the same neighbor angrily points out that the average income is $80k - why are we raising taxes again? We’re all going to have to pay more, they exclaim! They have brought along a fancy chart they made at home as well:

(The illustrations in the book are much better)

Applying the framework, we can ask the following questions:

A) What families are being included in the income estimate? When did the neighbor get this number - have some families recently moved into or out of the neighborhood in ways that change this number? Did the neighbor talk to every household, or just survey themselves and their neighbor? See Chapter 1, “The Sample with the Built-In Bias”, or Chapter 3, “The Little Figures that are not there”.

B) Most likely, the first average is a mean and the second is a median: a few very high incomes pull the mean up while leaving the median where it is. But we should still clarify which average is being used. See Chapter 2, “The Well-Chosen Average”.

C) But even in calculating the average - did the neighbor remove any outliers? Can we trust they did the math correctly and understood what statistic to use when? Why use a percent rather than just the value of the increase itself? See Chapter 9, “How to Statisticulate”.

D) Both of the numbers in the example are being used in a persuasive context: the first to convince you that you have moved to a desirable neighborhood, the second to argue against a tax increase. There is an implicit argument here at the community meeting: passing this tax increase means you will pay more, but you might want to question if this is the case, or how much you would pay, or what the community benefits might be. Perhaps the neighbor points out that the neighborhood next to you increased taxes, and crime went up. We need to be wary of misleading correlations used as evidence of causation. See Chapter 8, “Post Hoc Rides Again”.

E) Looking at the visualization - no axis labels? How was the percent increase calculated? But yikes, that taxes line is going up quickly and sharply - unless the taxes are measured in pennies? See Chapter 5, “The Gee-Whiz Graph”, or Chapter 4, “Much Ado about Practically Nothing”, or Chapter 6, “The One-Dimensional Picture”.

F) At the community meeting, the argument that the statistic is being used to support is that the neighborhood’s average income is too low for another tax increase. Yet perhaps this isn’t the most relevant statistic to use in this situation - maybe it is a property tax that only affects properties larger than X size. See Chapter 7, “The Semi-Attached Figure”.
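To make questions B and E a little more concrete, here is a rough Python sketch. Everything in it is an assumption made for illustration: the ten household incomes are invented (they are not from the book) and were picked only so that the mean and median land on the two figures in the example, and apparent_growth is a hypothetical helper, not a standard function. It shows how one set of incomes can honestly produce both "averages", and how starting a chart's axis above zero makes a small rise look large.

```python
from statistics import mean, median

# Hypothetical household incomes, invented for illustration only.
# Nine modest incomes plus one very large one.
incomes = [
    60_000, 65_000, 70_000, 75_000, 80_000,
    80_000, 85_000, 90_000, 95_000, 1_200_000,
]

# Both numbers are legitimately "the average income" of the same ten households.
mean_income = mean(incomes)      # pulled up by the single large income
median_income = median(incomes)  # the middle of the sorted list


def apparent_growth(first, last, axis_min=0.0):
    """Ratio of the drawn height of `last` to `first` when the
    vertical axis starts at `axis_min` instead of zero."""
    return (last - axis_min) / (first - axis_min)


# Hypothetical tax figures: a rise from 20 to 22 units, i.e. 10%.
honest = apparent_growth(20.0, 22.0)                   # axis starts at 0
chopped = apparent_growth(20.0, 22.0, axis_min=18.0)   # axis starts at 18

print(f"mean income:   {mean_income:,.0f}")
print(f"median income: {median_income:,.0f}")
print(f"drawn growth, axis from 0:  {honest:.2f}x")
print(f"drawn growth, axis from 18: {chopped:.2f}x")
```

With these made-up numbers, the neighbor could quote either figure without technically lying, and the same 10% rise is drawn as a doubling once the bottom of the chart is cut off. Neither number tells you much until you know which average it is and where the axis starts.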

My second question is how anyone would update this book for today. Are the ways that people misrepresent statistics the same as when the book was written? For the most part, I think yes. Perhaps the methods are more sophisticated, and it takes more background knowledge to be able to pick a statistic apart than asking what type of average was used. But most of the statistics reported in the media are still generally framed in ways that most people will understand: averages, percent changes, or correlations.

I’d also argue that a better understanding of these basic errors has also led to the sort of “barrage of statistics” arguments that are common today. No reasonably-informed person will give just one statistic anymore - they’ll give 4 or 5, each linked to a very long and technical academic paper, or no sources at all, just a vague “read it somewhere.” Huff points out the overall error here in the book - how does the statistic relate to the argument - but disproving the argument might require an in-depth discussion on each statistic.

This is the biggest limitation of the book: now you are skeptical, but unsure how to demonstrate anything yourself. A general sense that something may not be what it claims to be will not win an argument. It is unlikely that you will read the academic paper, that you will review the aggregated data, that you can replicate the study… where does the skeptic go from here?

The book is at its best when exposing misleading statistics that should have a reliable, straightforward, and meaningful interpretation. But with statistics increasingly being deployed through more complex modeling, what is the path forward?