Chapter 6 The Modal American

The NPR podcast Planet Money aired an episode in the summer of 2019 called The Modal American. If you haven’t listened to it, you should stop what you’re doing and listen to it right now. (Seriously, this will make a lot more sense).

The podcast hosts, Kenny Malone and Jacob Goldstein, set the stage as follows. People sometimes talk about the “average” American, but that doesn’t really make sense, even as an idea. If you average the traits of all Americans, you don’t end up with a real human being.6 Malone suggests that what people are really thinking about is the modal American, i.e., the most common type of American. The kind person you’re more likely to bump into on the street than any other. The hosts then consult with economics and data reporter Ben Casselman to help identify who is the modal American.

For the purposes of their investigation, the kinds of American the Planet Money hosts are interested in are determined by groupings of demographic variables like those collected by the US Census. The census does not, to my dismay, include questions about peanut butter preference, toilet paper orientation, digitidiness, or poopiness. Rather, they stick with things like sex, race, age, income, marital-status, etc. It turns out that different people mean different things when the ask, “how many kinds of Americans are there?” In any case, this journey to put Americans into buckets leads down some paths that will be very familiar to you by now.

Step one: categories

First of all, some of the variables, like sex, race, and marital status, are categorical to begin with. We can ask people to answer these survey questions, and then we can combine their answers into kinds like married-Hispanic-male or single-White-female types of people.

Some of the variables included, like neighborhood-type, are categorical in the way we think about them (read: rural, suburban, or urban), but are actually derived from numerical variables. In this case, neighborhood type is derived from the population density in the census block. Other variables, like income and, arguably, age, are just plain numerical. But to identify the most common kind of American, our podcast hosts need to put people into categories (like rich and poor–to make things dichotomous–or low-, middle-, and high-income). So they discretize all of the numerical variables into a small number of categories. They “bin” the incomes as well as the ages, which they combine into generation labels like “Generation X” and “Baby Boomers.”

Step two: contingency table

Having done all this, the next step is to combine the data about all Americans along each of these variables into one big contingency table. It’s not a two-way table, because there are more than two variables, but it’s still a contingency table. (It is not very common to use terms like four-way or five-way table.)

The choice of variables, and specifically the choice of the number of categories to keep for each one, determines how many kinds of Americans there can be. Having made the choices, it is a simple matter of looking to see which is the most common kind.

Along the way, the idea of dependence, or association, also comes up, but it is not necessarily mentioned by name. Supppose you know the most common age category is Children. And suppose the most common category of marital-status is Married. It does not necessarily follow that married-children is the most common category in a two-way table made from those two variables. Of course, the opposite is true. And this is because age and marital status are associated variables. The older you are, the more likely you are to be (now or previously) married, as opposed to never married. In general, you cannot assume that the most common category in two variables will combine to produce the most common two-variable-category. But this would be true if the variables were independent.

For independent variables, it must be the case that the most common joint category will emerge from the simple combination of the most common individual categories. You could actually use that as a definition of independence.

Take sex and race, for example. In the Planet Money podcast, both of these are actually treated as dichotomous variables (all race categories other than white are combined into one, so it is basically white or not; perhaps you are hearing “there are two kinds of people in this world…”). Since female is the most common sex and white is the most common race category, it has to be the case that white-female comes out as the most common combination. Sex and race are independent.

Perhaps it is reassuring to you that the ideas we have been developing are not just limited to semi-jokey two-kinds of people questions, or Buzzfeed personality tests. But that they apply exactly as before to serious data.

Exercises

  1. At around minute 5:15, the hosts describe the what they see looking at the (histogram of) age distribution. What kind of distribution are the hosts describing?
  2. Why such (over)simplification? Hint: there is a discussion around 9 minutes in.
  3. The methodology of the Planet Money analysis is described on their website. From this, determine how many variables there are, how many categories there are for each one, and thus how many kinds of Americans (i.e., how many buckets) there are in total.

  1. We encountered this fact in the form of a joke in Chapter 1, specifically here↩︎