Scott Alexander is a prolific blogger with a huge following, especially among tech and libertarian spaces. I’ve been pointed to many of his articles over the years, and I usually enjoy them and learn from them. Alexander is an engaging writer, and many of his posts involve a lot of background research and thought. He is, however, a libertarian intellectual (or, as he puts it, a member of the “grey tribe”), and that also leads to his making arguments that often elicit frustration and/or an eye roll for their obvious bias in favor of that group. I’ve thought about writing up responses to these biases before, and potentially going through a series of his more famous posts to point them out. Today, seeing his latest post trending on Hacker News, I finally decided to do it. This is the first edition of my critiques of Scott Alexander posts, which I’m dubbing Astral Codex NULL.
Scott Alexander’s latest Astral Codex Ten (ACX) post starts with a set of parables about people who could be replaced with rocks. More specifically, a series of professionals–from a security guard, to a doctor, to a team of scientists–all employ the same decision-making rule to avoid making errors: ignore the possibility of rare events and always assume problems are harmless. He further specifies that these people all have to make decisions in a world where this assumption is right 99.9% of the time. These “expert professionals” are completely insensitive to what they observe and always give the same advice (“don’t worry about it”), so you might as well replace them with a rock with that advice written on it. (This feels like a weird intellectual framing that could only have come from the libertarian / “rationalist” / skeptic community: if you can’t make good decisions, you are literally no better than a rock.)
Weighing Outcomes
For anyone who’s had to analyze data as part of their job, making decisions (or rather, creating an algorithm to make decisions) in this type of situation comes up all the time. It’s called having unbalanced (or imbalanced) data, and there are several tricks for dealing with it. The most common and obvious one is to apply weights to your decisions. Say you are the security guard in Alexander’s first parable. You work in a building that “basically never gets robbed,” and occasionally you’ll hear a sound. That sound could be robbers or it could just be the wind–but given the place you work, it’s almost always the wind. As Alexander specifies in his scenario, the decision to ignore the sound is almost always correct. If I were analyzing data from this decision-making scenario, however, I’d split out my results according to the two kinds of cases I could face. My accuracy for picking out cases where there’s nothing going on is 100%--I always get it right, because I always pick that option. But my accuracy for cases where there is something going on is 0%--I never get it right, because I never pick that option.
As a competent data analyst, I’d realize this is a very bad way to analyze data–and a very bad way to evaluate an algorithm. Instead, I might want to take the average of my accuracy across the two types of cases (50% average accuracy; although, in practice, I’d be more likely to use a better combined metric like the F1 score). Then I’d come to a better conclusion: this isn’t a very good algorithm. This is what’s missing from the analysis of Alexander’s scenario: the 99.9% figure is a bad metric for assessing the skill of this “dumb as a rock” security guard. Under a more appropriate accuracy metric, we’d see that ignoring everything you notice is a pretty mediocre-to-bad strategy.
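To make that concrete, here’s a minimal sketch with made-up numbers (using scikit-learn’s standard metric functions) of how the rock’s strategy scores under plain accuracy versus balanced accuracy and the F1 score:

```python
# A minimal sketch with made-up numbers: how the "always say it's the wind"
# rule looks under plain accuracy vs. balanced accuracy and F1.
# scikit-learn and its metric functions are assumed to be available.
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score

rng = np.random.default_rng(0)

# 10,000 noises; robbers are the rare positive class (~0.1% of cases).
y_true = rng.random(10_000) < 0.001      # True = robbers, False = just the wind
y_pred = np.zeros_like(y_true)           # the rock: always predict "wind"

print(accuracy_score(y_true, y_pred))             # ~0.999 -- looks fantastic
print(balanced_accuracy_score(y_true, y_pred))    # 0.5    -- average of 100% and 0%
print(f1_score(y_true, y_pred, zero_division=0))  # 0.0    -- never catches a robber
```

Plain accuracy rewards always guessing the common class; the other two metrics immediately flag that this “algorithm” never catches a single robber.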
Instead, I might want to make a few more guesses that there are robbers, and go check a bit more often. Instead of setting my threshold for investigating at “never, no matter what the sound,” I’d lower my bar slightly to “rarely, unless there’s a big sound.” There’s actually a rich literature on this problem called Signal Detection Theory. It’s been around since the 1940s and has been commonly applied in psychology for decades.
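To illustrate what that threshold choice looks like, here’s a toy sketch (the distributions and numbers are invented, not anything from Alexander’s post) of how lowering the “go check” criterion trades hits against false alarms:

```python
# A toy signal detection sketch (distributions and numbers are invented):
# the guard's real decision is where to set the "go check it out" threshold
# on how loud a sound is.
import numpy as np

rng = np.random.default_rng(1)
wind    = rng.normal(loc=0.0, scale=1.0, size=99_900)  # noise-only nights
robbers = rng.normal(loc=2.0, scale=1.0, size=100)     # rare signal nights

# "Never check" vs. two real criteria: lowering the bar buys hits at the
# price of false alarms -- the classic signal detection tradeoff.
for threshold in (np.inf, 3.0, 1.5):
    hit_rate         = np.mean(robbers > threshold)  # robbers actually investigated
    false_alarm_rate = np.mean(wind > threshold)     # wasted trips for the wind
    print(f"threshold={threshold}: hits={hit_rate:.2f}, false alarms={false_alarm_rate:.3f}")
```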
Another key feature of Alexander’s parables is that the uncommon signal all his professionals miss has terrible consequences. If you miss this one rare event, your store gets robbed, or your patient dies, or your entire island kingdom gets destroyed by a volcanic eruption. Well, you can take that into account, too. You can change your weighting of decision outcomes. For the security guard, for example, you might say that every time you guess that a noise was robbers but it turns out to be just the wind, you lose one point. But every time you guess that it was the wind and it turns out to be robbers, you lose 1,000 points. Now you can compute a new score that takes into account the relative importance of these outcomes. Not having to get up from your crossword is good, but actually catching someone trying to rob the store is about 1,000 times better. Using this method of scoring, you might try to be much more sensitive to noises than you were before, because the relative weighting of outcomes encourages it. That’s certainly the way I’d approach the problem if I were developing a “check noise” algorithm.
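Here’s a rough sketch of that one-point-versus-1,000-points scoring, again with invented numbers, showing why the weighted score punishes the rock’s rule and rewards a more sensitive guard:

```python
# A rough sketch of the 1-point / 1,000-point weighting described above, using
# the same kind of invented wind/robber samples. Only the relative costs matter.
import numpy as np

rng = np.random.default_rng(2)
wind    = rng.normal(0.0, 1.0, size=99_900)   # harmless sounds
robbers = rng.normal(2.0, 1.0, size=100)      # the rare, costly events

COST_FALSE_ALARM = 1      # got up from the crossword for nothing
COST_MISS        = 1_000  # ignored an actual robbery

def total_cost(threshold):
    false_alarms = np.sum(wind > threshold)      # wind you went to check anyway
    misses       = np.sum(robbers <= threshold)  # robbers you wrote off as wind
    return COST_FALSE_ALARM * false_alarms + COST_MISS * misses

print(total_cost(np.inf))  # the rock's rule: every robbery missed, worst score
print(total_cost(1.5))     # a more sensitive guard: many cheap false alarms, far fewer misses
```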
Another relevant trick I’d use as a data analyst is proxy variables. I was once working with a team of researchers who had followed a relatively large group of children through their adolescence and young adulthood (let’s say it was about 250 people). The researchers were interested in finding out whether anything could help predict whether someone would attempt suicide. Thankfully, suicide attempts are relatively rare. Let’s say that in this sample there was only one person who attempted suicide, and she survived. You can’t really do meaningful statistics when there’s just one case to look at, so you couldn’t develop a decision-making algorithm just by using the weighting trick above. So instead the researchers looked for reasonable proxies.
For example, what if we try to predict not suicide attempts themselves, but whether the individual ever reports suicidal thoughts in periodic psychiatric check-ups? Not every suicidal thought indicates that there’s going to be an attempt, but such thoughts are a pretty good indicator that there could be a risk of suicide. It turns out there were several cases where people had these thoughts. It still wasn’t common, but there were enough cases that you could do some meaningful statistics. There were some indicators that predicted whether someone would have suicidal thoughts, and so you could start to use those as your early warning detectors.
In the cases Alexander describes, you could find similar proxy measures. For example, don’t tell the security guard to detect robbers specifically. Instead, you want him to detect any person or animal passing by the building. The volcanologists could look not just for an eruption, but for any kind of activity–small flares, increases in temperature, etc. Now you’re not in a situation where the “rock rule” (nothing is ever a problem) is as useful. There are a lot more cases you need to detect, and you can notice much sooner when someone’s not paying attention to what they see. Again, by being aware of the setup of the problem, you can find a way to assess decision-making that goes beyond the choice between “being totally insensitive to environmental inputs but being right often” and “listening to long-shot, outsider viewpoints but being right less often.” You can optimize for accuracy on a proxy, or for weighted accuracy, and thereby make a rational decision about how much overall accuracy it’s worth sacrificing in favor of catching rare but important events.
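Here’s a small sketch of the proxy idea, with entirely invented data and an assumed logistic regression model: the rare outcome has too few positive cases to model at all, while the proxy gives you enough cases to estimate which risk factors matter:

```python
# A small sketch of the proxy-variable idea, with entirely invented data.
# The predictors and the LogisticRegression choice are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 250
risk_factors = rng.normal(size=(n, 3))   # hypothetical predictors

attempts = np.zeros(n, dtype=int)
attempts[0] = 1                          # one case: too rare to model

# Proxy: reporting suicidal thoughts at a check-up, loosely tied to factor 0.
ideation = (risk_factors[:, 0] + rng.normal(size=n) > 2.0).astype(int)

print(attempts.sum(), ideation.sum())    # 1 positive case vs. a usable handful

# With the proxy there are enough positives to estimate which factors matter.
model = LogisticRegression().fit(risk_factors, ideation)
print(model.coef_)
```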
Practical Issues in the Scenarios
In Alexander’s original post, he reduces people’s jobs to just making a decision: is there a rare but terrible problem, or is everything fine? Thinking just in terms of decision-making algorithms, then, seems to be a fair analogy for the situation. It’s basically a way of asking whether you can, in principle, be a good decision maker in this type of situation while still respecting the unbalanced nature of the outcomes. However, I’ll admit that there are practical difficulties with implementing these algorithms in a person’s behavior. The security guard may really be intellectually lazy and not care much whether the store he guards gets robbed. More to the point, it might be hard to implement the kind of decision weighting scheme that an algorithm uses. How do we give him a whole heap-ton of points for catching that one robber in a decade?
Practically, the problem might be better solved by just having periodic drills or inspections. You tell the guard that you will occasionally send fake robbers to the building, and it’s his job to go out there and find them when they come. If you do that every month, he gets in the habit of checking more often. For the doctor, you may have a continuing education module where an actor comes in as a patient, and the doctor is graded on whether they were able to identify the illness (or no illness!) that the person has. It is difficult to grade decision-making algorithms on rare events, much less implement those algorithms in human minds. But non-trivial, non-rock decision-making rules for these types of situations can, in principle, be found.
Ideology Over Substance
The way that Alexander’s post is set up, he is leading you towards the conclusion that we shouldn’t just automatically assume people who are making unusual predictions that go against consensus and typical experience are wrong. If we don’t take into account the “maverick intellectuals,” the IDW folks, the “sensemaking” community (all synonyms for the sort of right-libertarian commentary sphere that most readily embraces him), then we might make a huge mistake. We could miss out on the robbers, the cancer diagnosis, or the volcano eruption that only these people saw. Furthermore, our cynical, knee-jerk rejection of these people is an intellectual error. We don’t even listen. We’re quite literally as dumb as rocks.
Alexander is reasonably explicit in his metaphor. When he is talking about a knee-jerk skeptic being dumb as a rock, he writes:
“She’s always right! When the hydroxychloroquine people came along, she was the first person to condemn them, while everyone else was busy researching stuff. When the ivermectin people came along, she was the first person to condemn them too! A flawless record. (shame about the time she condemned fluvoxamine equally viciously, though)”
People who criticized hydroxychloroquine and ivermectin (or at least criticized them too quickly) were just being dumb old rocks. Yes, those two treatments were hyped up by right-wing pundits, and neither turned out to be an effective treatment for Covid. But if, after seeing those two whiffs, you started to just reject any advice from the right-wing outsider spaces, you would have missed out on the use of the antidepressant fluvoxamine to treat Covid! (Right now the official NIH guidelines say there is insufficient evidence to recommend fluvoxamine for Covid, but that there are ongoing studies of whether it can help by reducing inflammation.)
Further down, he also lists some real-world examples of where he sees the “dumb as a rock” rejection of ideas playing out: “This conspiracy theory is dumb. This outsider hasn’t disproven the experts. This new drug won’t work. This dark horse candidate won’t win the election.” Rejecting a conspiracy theory (deep state censorship, for example) is just being a dumb rock. Trusting an expert over an outsider (Fauci over Bret Weinstein, for example) is being a dumb rock. Doubting that a new drug will work (ivermectin for Covid) is being a dumb rock. Predicting that the dark horse candidate (Trump) won’t win is being a dumb rock. These aren’t just random examples. They connect very specifically with issues important to the IDW crowd, and they can easily be interpreted as flattering to their pet causes. Again, his examples skew in favor of making the IDW crowd look like misunderstood seers.
I don’t usually think of Alexander’s writing as throwing red meat to his fan base, but this clearly seems to be a way of flattering his audience’s preconceptions: “People who doubt our community’s contrarian takes are mindless and wrong.” Yet as anyone with statistical literacy can tell, his examples rely on misleading ways of assessing decision rules, and they don’t accurately represent the tradeoffs inherent in these scenarios. It is hard to make decisions about rare outcomes. And it is human nature to get complacent without feedback–or when negative feedback is obscured or not brought to your attention. But there are also perfectly rational ways to make decisions in these scenarios, and illustrating them yields very different inferences.
Rewriting the Scenario
Given what we know now, we can rewrite Alexander’s scenarios to reflect a better understanding of decision making.
The Doctor
There are a few primary care doctors in this town. They take different approaches. One of them inspects her patients, but then, no matter what they say, always replies, “It’s nothing, take two aspirin and call me in a week if it doesn’t improve.” It almost always improves, and almost no one ever calls her. However, over the course of her career, two or three of her patients die of serious illnesses that she missed.
Another primary care doctor is a bit more cautious. She inspects her patients, and often asks them to get follow-up tests. These can be expensive, and are often uncomfortable and inconvenient. Most of the time, they turn up nothing. However, she has caught two or three deadly diseases early enough to save someone’s life.
The third primary care doctor is an intellectual maverick, someone who thinks outside the box. She inspects her patients, and then, no matter what, recommends that they get a new, expensive test and follow-up treatment. Every one of her patients has to pay a lot, and is forever completing blood tests, getting biopsies, having hormone levels checked, and having brain scans done. They all realize that she has a financial incentive to order these tests, which means she is probably ordering them more than is necessary. Further, she has cultivated an image as “the cutting-edge doctor,” so her personal image–and her brand–depend on her continually ordering new tests and procedures. She has saved one or two patients from deadly illnesses over the course of her career, but she has also misdiagnosed one patient who could have been saved had she relied on conventional tests and thinking. And she has cost all her patients a lot of time, energy, and money.
The fourth primary care doctor is a rock. The rock is painted with the words “IT’S NOTHING, TAKE TWO ASPIRIN AND WAIT FOR IT TO GO AWAY.” Many people without health insurance swear by the rock. “That advice is always right!” they say. Two or three of the people who use the rock have died of preventable illnesses, but for the most part people who use it have been fine.
A woman in town, Jennifer, is looking for a new primary care doctor, and is considering these four options. All of them have been around for about 20 years. The first doctor’s reputation is that she’s nice, but she’s not very careful. People noticed that some of her patients died. The second doctor’s reputation is that she might occasionally order a test you don’t need, but she will do a good job catching serious illnesses. The third doctor has a reputation as a very expensive weirdo. People don’t know whether to trust her, but some people want to be seen as ahead of the curve or smarter than most other people, so they’re always willing to brag loudly about going to see her. People chuckle about the rock, but they appreciate never having to waste time in a waiting room or pay a co-pay.
Jennifer ranks her options. By reputation, the second doctor is the best. But Jennifer is also a very rational person with a firm grounding in statistical reasoning, and she sees that all the doctors have provided information on how their patients have fared. So she collects all the data and implements a decision-making algorithm. She realizes that dying is a lot worse than having to pay for extra medical costs. On the other hand, if she pays too much in medical costs, she literally won’t have enough left for rent and groceries. So she comes up with a relative weighting. By this ranking, the best option is the second doctor, who has the right balance of caution and accuracy. Second best is the rock, which has zero costs and high accuracy. Third best is the “do nothing” doctor, who gives the rock’s advice but charges a co-pay. Fourth is the intellectual maverick, who makes you pay a ton for very little benefit. This doctor is significantly worse than a rock.
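Here’s a hedged sketch of what Jennifer’s weighting exercise might look like; the risk and cost numbers are invented, but the ranking logic mirrors the scenario above:

```python
# A hedged sketch of Jennifer's weighting exercise. The death risks and
# money/time burdens are invented for illustration; only the ranking logic
# reflects the scenario above.
DEATH_WEIGHT = 1_000_000   # dying is weighted vastly more than any bill

# (probability of dying from a missed illness, money/time burden in arbitrary units)
doctors = {
    "do-nothing doctor": (0.003,  50),     # misses illnesses, still charges a co-pay
    "cautious doctor":   (0.0005, 200),    # some unneeded tests, but catches illnesses
    "maverick doctor":   (0.002,  3_000),  # endless expensive tests, still misses one
    "rock":              (0.003,  0),      # free, but misses everything
}

def expected_cost(death_prob, burden):
    return DEATH_WEIGHT * death_prob + burden

for name, stats in sorted(doctors.items(), key=lambda kv: expected_cost(*kv[1])):
    print(name, expected_cost(*stats))
# -> cautious doctor, rock, do-nothing doctor, maverick doctor (worse than the rock)
```

The particular weights don’t matter much; what matters is that once death and expense are weighted explicitly, the maverick’s constant costs and occasional misses land her below the rock.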
Don’t Be Worse Than a Rock
Alexander’s original essay uses abstracted parables to argue that outsider intellectuals shouldn’t be automatically dismissed; he compares rejection of their ideas to being literally as dumb as a rock. My rewriting of one of his parables highlights that individuals who need to make decisions actually have a number of options to choose among. These options include the institutional insiders and the intellectual outsiders, as well as a knee-jerk decision strategy. In my abstracted parable, I show how the outsider intellectual can often be worse than a rock. The very need to show how different they are can lead such a person to over-emphasize having a counter-intuitive, “hipster” take on current events. Most of those counter-intuitive takes will be worse than just going along with the crowd, precisely because they go against intuition–they reject what broad-strokes common sense recommends. The best strategy is to follow the advice of someone who only occasionally makes counter-intuitive proclamations, and who is mostly willing to rely on broad-strokes, common-sense knowledge to get things right. Only when they’ve got very good evidence do they actually go against the rock (obvious public opinion). In fact, that’s the only way to beat the rock.
I think it’s useful that Alexander structured his parables around individual decision makers. We tend to trust people more than ideas, such that each political tribe and sub-community tends to have its own particular set of experts. These experts can come to set the agenda for that group, and the group will tend to weigh their opinions more favorably. Scott Alexander is this kind of figure. So is Joe Rogan. So is Jordan Peterson. So are Bret and Eric Weinstein, Bari Weiss, and all the other IDW figures. Most of us are never going to be agenda setters like these people. Instead, we play the role of Jennifer in my story: we are deciding among epistemic communities. And the IDW / “sensemaking” type of intellectual community is the one that often values being counterintuitive just to be trendy. It also has a vested interest in attracting attention this way–most of its members are outsiders to traditional institutions who make their living off of clicks through YouTube, Patreon, Substack, etc. The common-sense position will often be expressed first and best by the traditional media pundits, so the outsiders need to be counter-intuitive to draw eyeballs, and no one is out there with a scorecard keeping track of whether that click-bait actually makes them worse decision-guides than a rock.
The key part about this structure is that we don’t judge each new proclamation independently. We judge based on the track record of the individual or the group. If they’ve repeatedly made counter-intuitive statements without doing their due diligence–and we find that out by seeing that there is no evidence for ivermectin, for example–then we learn their reputation. And so knee-jerk skepticism of this community isn’t “rock thinking”--it’s rational, learned weighting of their input. It’s knowing that the advice of this community may very well be dumber than a rock.