How trust undermines science

“According to science” is a cursed phrase. It signals a high probability that the content you’re about to consume is based on small, sketchy studies that probably don’t replicate. But it wasn’t always so. People used to trust science, myself included. Then the replication crisis happened.

In 2012, Daniel Kahneman sent an open letter to researchers in the field of psychological priming, calling on them to get their act together in the face of what he saw as a looming “train wreck.”

Priming was a hot topic in psychology, claiming that subtle cues could influence people’s behaviour in a big way. One example was a curious 1996 finding of priming researcher John Bargh.

Bargh claimed to find that when people completed a word scramble including words like “worried,” “sentimental,” “rigid,” and “Florida,” it activated an elderly stereotype. People who were primed with the elderly words subsequently walked much more slowly than people exposed to neutral words. In fact, according to the study, the students exposed to “elderly” words walked about as slowly as actual septuagenarians, whereas students in the neutral condition walked normally for their age.

Years later, other researchers tried to corroborate the finding. Stephane Doyen et al. (2012), for example, conducted a study large enough to detect much smaller effects than the one originally claimed. But it found no effect at all. First, measuring walking time objectively with lasers, they found no difference in walking speed between elderly-primed and unprimed subjects. Second, and more interestingly, they ran a second experiment in which the experimenters became the subjects. These subject-experimenters were given stopwatches and persuaded to believe that primed subjects would walk either faster or slower. Using their stopwatches, they indeed found a difference in the expected direction: faster or slower, depending on experimenter expectation (and contrary to the original Bargh study, in which elderly primes only ever made people walk more slowly). But even when walking time was measured with the lasers, there was still a difference.

The experimenters were seemingly able to pull the subjects themselves into their as-if game. The authors reported that subjects who walked more slowly were apparently aware of it. They concluded that the experimenters who expected slower walking speeds somehow communicated this expectation to the subjects through their behavior.

For those not used to some of the more outlandish claims from experimental psychology, it might be no surprise that such a remarkable claim would fail to replicate. The more interesting question is why anyone ever believed it. Many defended the claim at the time, and many continue to defend the essential reality of priming studies (as well as a modern incarnation of priming, the “nudge”). Even Kahneman, in his “train wreck” letter, identified himself as a general believer in subtle priming effects.

The most important defense of the Bargh et al. walking speed study came from Ramscar et al. The authors explain many possible reasons why the Bargh slow-walking effect wouldn’t replicate: the study that aimed to replicate it decades later would necessarily confront a different population age structure, a difference in word frequencies, and a difference in the associations of words such as “Florida.”

I think they are exactly correct that priming effects are not the kind of invariant phenomenon that might be encountered in physics or chemistry and that might be expected to meaningfully replicate. In other words, priming is attempting to be a study of context. Context matters so much that there isn’t much left when a priming manipulation is taken into a new context, a new population, or a new time. The context surrounding scientific studies constantly changes. But the crux of the Ramscar et al. argument is that priming studies should be trusted, considered as true at least in their time and context, even if they do not replicate. Moreover, studies that are unlikely to replicate should still take place and are still meaningful, they argue. Since psychological phenomena are not invariant across time and context, the fact that they do not replicate should be no epistemic mark against them. To suggest so would be physics envy.

However, the admittedly transient, ever-varying phenomena of the priming laboratory are in practice given as evidence for physics-style universal laws. Bargh uses them to argue that social behavior is largely automatic, for instance. If priming effects are real but transient and can never be replicated, how can they be distinguished from, say, ESP or precognition studies? (Incidentally, a precognition study published in a peer-reviewed social psychology journal was another precipitating factor of the replication crisis addressed in the “train wreck” letter.) Replication is one tool the audience of a study can use to trust, but verify; without replication, we are asked to simply trust. Revelations of scientific fraud, especially inept and obvious fraud that went long undiscovered in major papers, make unverifiable trust seem like a lot to ask. And failed replications simply make the original claims look goofy—as they perhaps should have in the first place. While superficially attempting to be a study of context, priming now seems to be little more than a collection of cheap magic tricks.

To me, the phenomenon of widespread trust in goofy claims is much more interesting than the fact of goofy claims not being real. What does it mean to trust science? The philosopher C. Thi Nguyen gives an account of one form of trust, which he describes as an absence of deliberative questioning:

“We can take the unquestioning attitude towards a wide variety of objects and artifacts. To trust, in this sense, is to have stepped away from the deliberative process. It is a way of settling one’s mind about something. To trust is to lower the barrier of monitoring, challenging, checking, and questioning—to let something inside, and permit it to play an immediate role in one’s cognition and activity. It is, in a sense, to give an external resource a direct line into one’s reasoning and agency.”

Some forms of information, he argues, are simply taken as true, without deliberation or consideration of their veracity. Within experimental psychology, researchers often seem to be living in an “as-if” game of imagination. Or, to put it another way, they are constantly using the improvisational comedy trick of “yes, and.” They are believers in any result that has come before. Rather than calling previous results into question, they simply build on them.

The Doyen et al. study that failed to replicate Bargh’s finding of elderly priming seemed to be a surprise to Bargh et al. It was a hostile and perhaps illegal move in the as-if game, as unexpected as someone saying “nope” in improv. Bargh responded saying, “people say we should just throw out all the work before 2010, the work of people my age and older, and I don’t see how that’s justified”—as if the main concern was the legacy of the game-players rather than the reliability of the findings.

Outsiders aren’t players in the as-if game in experimental psychology, but we are often exposed to science—in the form of news articles summarizing research with quotations from the researchers, or screenshots of abstracts of papers on Twitter, or paragraphs listing claims with citations to other papers, or figures (especially charts), or—worst of all—popular science books. Even among outsiders, there often seems to be a magical credulity field surrounding scientific claims in certain formats.

The unquestioning attitude toward science, if it were universal, would have prevented the Doyen et al. study from ever happening. But in the nine years since the study and the “train wreck” letter, despite all the Many Labs projects and failed replications in between, the unquestioning attitude of many toward such findings does not seem to have waned. In his recent popular science book Noise, cowritten with nudge enthusiast Cass Sunstein, Kahneman even cites the notorious “hungry judges study,” which purported to find that judges gave out much harsher sentences just before their lunch break (it was called into question almost immediately upon publication). The unquestioning attitude is alive and well and can even rehabilitate past findings known to be false.

Projects of questioning science can sometimes be successful; failed replications at least have the potential to undo some of the harm of the original false claims. But mostly, attempts to disrupt the unquestioning attitude at every level are ignored.

No one knows how common research fraud is, but Richard Smith, former editor of the BMJ, recently put the matter quite strongly in the context of medical research, saying that:

“We have now reached a point where those doing systematic reviews must start by assuming that a study is fraudulent until they can have some evidence to the contrary.”

The most basic level of trust in science is trust that the reported experiment actually happened. Joe Hilgard described his efforts to report scientific misconduct at the level of fabricated trials, and found that the journals mostly weren’t interested. Moreover, the fraudulent researchers would simply change their tactics once he pointed out their fraud. For example, after Hilgard pointed out that a trial with over 3,000 subjects was unlikely to have occurred, the researcher in question now sets his sample sizes at a believable couple hundred.

Another level of trust is the idea that a citation to a source accurately reports the information there. In my experience, it is the norm, rather than the exception, for cited claims in popular science books and review papers to misstate the claims of their sources. The popular science book Why We Sleep by Matthew Walker, for example, was eviscerated for its misleading citations by Alexey Guzey in a review, but this did not result in any institutional action toward Walker or public acknowledgement of the flaws in the book. Walker was defended as promoting an important message, even if he got a few things wrong in shady ways. (One reviewer, aware of the Guzey criticisms, said that Walker can be forgiven for his errors, because his exuberance comes from the right place.)

The term “pious fraud” is usually used to refer to religious people who knowingly promote hoaxes while believing in the underlying religious message of the hoax; it was a term commonly used in the skeptical movement of the 1990s to refer to people like stigmatics, faith healers, and the creator of the Shroud of Turin. Similar to pious frauds, researchers who believe in the truth of the message of “sleep is good for you” or “social behavior is automatic” or the like may produce or promote silly findings they know to be false or meaningless, because these findings support an important message.

There are many more layers of trust involved for a scientific result: trust that the data produced were not falsified or altered, trust that the data are meaningful in the first place (e.g., survey data), trust that the methods used actually support the general claims made (see e.g., “The Generalizability Crisis” by Tal Yarkoni), trust that the effect is large enough to be meaningful or even detectable in context, and trust that the mathematical analysis performed is appropriate to discovering truth from the data. This is not an exhaustive list.

There are a couple of reasons why science might deserve to be epistemically privileged by an attitude of non-questioning. First, if scientists themselves were questioning one another’s ideas behind the scenes, this might reduce the need for outsider distrust. However, in the field of priming and likely in many other fields, researchers seem to have been engaged in an improvisational game of accepting results as true without question. A field in which it is rude or unusual to challenge the findings of others, especially facially goofy findings, has little claim to have its results trusted by outsiders.

Second, if there are obvious real-world effects of a scientific claim, like airplane travel and high-speed internet, an unquestioning attitude is probably appropriate. This does not seem to be the case for the inexact sciences. There is no technology that implements the results of these priming studies as a load-bearing component. We can’t look out into the world and see undeniable proof of the truth of priming effects. Priming exists mostly as words and figures in papers, with at best metaphorical application to the everyday world.

It seems doubtful that fields like psychology will effectively reform themselves. The institutions that benefit from the unquestioning attitude are unlikely to take measures to dig out the rot, because they realize that they are mostly made of rot. Even when insiders are motivated to change, it is not clear that traditionally scientific methods are even possible in the social and inexact sciences. So what can be done? Rather than attempting to reform scientific fields from outside (especially the science of social and psychological abstract nouns), individuals can change what is within their own control: the immediate response of trust.

In my experience as a bad-science enthusiast, the best way to identify sketchy scientific claims is by their level of abstraction. If a new paper claims that giant sloths were more common in present-day Arizona than in present-day Nevada before the last ice age, I would not identify that as a sketchy claim; it is limited in time and place, and not particularly abstract. However, if a new paper claims that attractive people are more (or less) generous than ugly people, I would be highly suspicious. “Generosity” is a very abstract and nebulous quality, and I would want to know how it was measured—and I know I would not be satisfied with the answer (it will be a survey or an economic simulation game). The profound abstractions of everyday life—happiness, relationships, sleep, nutrition, motivation, pain, belonging—are the concepts most vulnerable to scientific abuse.

Another way to skillfully exit the unquestioning mode is to imagine what else would be true if the claim were true. For example, if the Bargh walking speed priming effect were real, our walking speeds would be constantly changing in response to subtle cues (as opposed to being mostly determined by physiology and walking surface). Seeing a billboard for denture cream could make you late for work. You could use it in military applications to slow down the enemy. This is a variation on the as-if game: Jump out of the unquestioning attitude by trusting the claim way too far and thinking through the implications. Once you start asking how the abstractions are cashed out and what the implications of the claim would be, you may begin to notice questions like: How big is the effect? Would I be able to notice it in real life? Is there other research that says the opposite? Is there other research that says that the whole abstraction is a huge mess? That is the beginning of the end of the unquestioning attitude. Questions like these are part of science, and asking them is more respectful to true science than unquestioning trust.
