Issue 03
Spotlight Article
The great reinforcer

8th February 2021

Alongside all the successes of science in the Covid era, the pandemic has also sparked an outbreak of viral misinformation and sloppy research, revealing the glaring flaws in our scientific system.

Everything you need to know about how we do science can be learned from COVID-19.

That may be an overstatement – but only a slight one. Our science-heavy year, with its procession of confusing results, depressing predictions, worrisome discoveries, and frustrating debates, has shown us the very best—but also the very worst—of science.

To be sure, out of the gloom of the pandemic came some incredible advances – the stunning progress made on vaccines chief among them. But these bright spots were something of an exception. For those of us with an interest in where science can go wrong, the pandemic has been the Great Reinforcer: it has underlined, in the brightest possible ink, all the problems we knew we had with the way we practice science.

These problems are not due to some new, coronavirus-induced malaise that’s suddenly affected our research. They didn’t emerge fully-formed in the year 2020. These are failings in our academic and scientific system that have been recognised for years – even decades. They’re problems that urgently need to be solved for our scientific endeavours to reach their full potential. And if, for some reason, you didn’t think they were important before COVID-19, you certainly can’t ignore them now.


When the pandemic struck, scientists first looked to the past. Did our existing scientific literature contain the clues we needed to hold back the waves of infection? In some cases, it absolutely did. Immunologists, with truly astounding speed, were able to turn the prior literature on messenger RNA into novel vaccines that have spearheaded our fightback against the virus. The same went for more “traditional” types of vaccine, which were developed just as rapidly but which, at the time of writing, still face regulatory tests.

Other attempts to glean insights from the literature were less edifying. In my own subject, psychology, the early weeks and months of the pandemic brought us a now deeply regrettable series of high-profile op-eds downplaying the threat of the virus, pointing to studies on how people irrationally exaggerate risks – just as the COVID death counter ticked into the thousands.

In the literature itself, an international group of researchers took one week to write a review of how psychology could be put to use: for instance, helping people understand threats, aiding in emotional coping, and fighting fake news. They promised they’d “describe the quality of the evidence to facilitate careful, critical engagement by readers”. Unfortunately, searching the paper reveals little to none of this quality assessment; following many of their references reveals a plethora of small-scale, unreplicated studies with borderline results, along with theoretical papers of at best dubious relevance to the pandemic.

The reviewers had inadvertently illustrated two of the major problems affecting many scientific fields: much of our research is of low quality (and thus unlikely to be replicable), and much of our research is done in a very specific context (and thus unlikely to be generalisable). Whether it’s research on “growth mindset”, “grit”, or “implicit bias”, some widely-publicised findings from psychology have been torn out of the lab setting and thrown into the real world far too quickly. When it came to the pandemic, it was all the more crucial that researchers were ultra-cautious about which of the papers they cited were really ready for prime time. But such papers seemed surprisingly hard to find.

When the test came, much of our pre-existing science flunked it. But perhaps asking for pre-existing answers was asking too much – after all, we had a new virus and a completely unprecedented crisis on our hands. What about the research produced during the pandemic itself?


The unprecedented crisis has been accompanied by an unprecedented volume of new research. At first, this took the form of a flood of preprints. These unvetted research reports, posted on servers such as arXiv, bioRxiv, and medRxiv, allow scientists to share results without having to wait months (or even years) for stuffy, old-school peer-reviewed journals to check them (most, or at least many, do eventually end up published in a journal). As the pandemic kicked in, preprints dramatically sped up the accumulation of knowledge on the virus compared to previous outbreaks.

Of course, scientists still value that old-school journal process, and some are anxious about preprints, since they allow non-peer-reviewed claims out into the world while looking, to the untrained eye, little different from a published paper. In the case of the coronavirus, these fears crystallised in late January around a preprint on bioRxiv which heavily implied—without stating it explicitly—that the novel coronavirus contained genetic material from HIV. This “uncanny” similarity was “highly unlikely to be fortuitous”, and suggested that the virus might have been man-made. A result like this could have sparked off a lot of conspiracy theorising – or even an international incident. After serious flaws were noted in the analysis, the preprint was swiftly withdrawn.

The preprint servers responded to this and other incidents by strengthening their vetting process. Contrary to what many believe, humans do check each preprint before it goes public: they don’t provide a full peer-review, but they do screen for plagiarism, the obvious work of cranks, and other red flags that might indicate low-quality research. The servers also added a clearer notice at the top of each webpage stating that these reports are not yet peer-reviewed.

Debate around preprints long predated the pandemic. For now, the advantages of speedy science far outweigh the downsides of bad papers leaking out into the world with an unearned sheen of authority. Perhaps one positive aspect of the crisis, with many more people focused on scientific studies and thinking about how to report them, has been that journalists and other readers are now at least somewhat more aware of the different stages of a study, and that preprints—while still serious contributions, shared for scientific discussion—aren’t yet imbued with the full credibility of a peer-reviewed paper.

But the credibility of peer-review itself has taken a few body blows this year. Indeed, concerns about false claims in preprints should evaporate just as soon as one sees what has passed through the peer-review filter and into scientific journals in the past year.


The sheer number of angles taken by scientific papers on the coronavirus boggles the mind. Like the clever-clogs student in the front row of the lecture theatre, scientists strained to make their own contribution, no matter how tenuous. Did we really need, for example, a study on the relation between national COVID-19 fatality rates and a country’s average 2D:4D ratio – the ratio between the lengths of a person’s index and ring fingers, supposedly a measure of prenatal testosterone? Regardless, we got one, as early as April. As was pointed out in a critical letter, plugging in different case fatality ratios from later in the pandemic makes the correlation from the original study vanish, rendering extremely questionable the authors’ theory about low testosterone being a risk factor for the disease.

Psychologists, for their part, began to churn out new questionnaires and scales for measuring people’s reactions to COVID – most of which were minor variations on the same theme. These included the COVID-19 Perceived Risk Scale, the Fear of COVID-19 Scale, the COVID-19 Anxiety Syndrome Scale, the Coronavirus Anxiety Scale, the COVID-19 Anxiety Questionnaire, the COVID-19 Burnout Scale, the COVID-19 Pandemic Mental Health Questionnaire, the COVID-19 Student Stress Questionnaire, and—perhaps ironically—the Obsession with COVID-19 Scale.

As scientists rushed to publish on the pandemic, journals did their best to help them. Some editors began to offer fast-track reviews for COVID-related papers. One analysis in June found that the median time from receipt to acceptance of a COVID-related journal article (across all scientific fields) was just six days, compared to a median in 2019 of ninety-three days. How this left time for rigorous peer-review is anybody’s guess – and one suspects it didn’t. A systematic review of hundreds of COVID papers found that their methodological quality—in terms of bias-reducing factors like blinding and randomisation—was substantially lower than papers on similar medical or biological topics from the previous year.

As so often in academia, a well-meaning idea—the fast-tracking of reviews to improve the pace of scientific progress in a pandemic—worked poorly in practice. That’s because it collided with the toxic academic publish-or-perish culture, where longer CVs mean greater prizes – in the form of tenure, promotion, and other career rewards. Academics had a lot to gain by taking advantage of the rapid reviews, even if their actual scientific contribution was questionable. Combine that with a genuine desire to help out—as well as the fact that many scientists whose labs were closed by lockdowns were otherwise twiddling their thumbs, and so tried their hand, often ineptly, at COVID research—and you have a recipe for an epidemic of tossed-off, low-quality papers.

By June, a group of medical researchers had published an editorial pleading with other scientists to apply the brakes. This was no time, they wrote, for:

“…rushed science, attempting to publish ‘anything’ on COVID-19, providing loose suggestions on treatment, battling to be the first to report new data or competing over citation indexes.”

They suggested that only those who could contribute “high-quality and knowledgeable work” should publish on the pandemic. It was a noble sentiment—scientists should certainly ask themselves if their latest pandemic publication is strictly necessary—but the problem is that most scientists do regard their own work as high-quality and knowledgeable, regardless of the reality of the matter. And there’s no way of enforcing this idea without restricting academic freedom.

Critics who had long lamented the emphasis in science on mere publication over rigorous research could easily have predicted the onslaught of useless, irrelevant papers that would follow in the wake of the pandemic. But as we’ll see, a great deal of the research that appeared to be directly relevant—even crucial—to our understanding of COVID-19 was similarly flimsy.


Facemasks have had something of a rollercoaster ride in 2020. The dramatic U-turn by the authorities in early summer towards a pro-mask position, after months of vociferously advocating the opposite, will forever stand as an indictment of expert overconfidence and groupthink. Meanwhile, the actual research on masks demonstrated even more of the fragilities in our scientific enterprise.

Take the paper published in mid-June in the journal Proceedings of the National Academy of Sciences. Its overall conclusion was strongly in favour of mask-wearing: in fact, the authors wrote that whether a population wore masks was “the determinant” of COVID spread in various countries. They presented the most simple-minded analysis of infection rates possible, modelling the rate as a straight line that, had no policy changes occurred, would steadily increase forever. This flew in the face of proper epidemiological models, where infections wax and wane in far more complex ways. The authors also blithely mixed up correlation and causation by assuming that, since masks were recommended in New York and Italy on particular days, any subsequent declines in the infection rate in those places must have been due to masks and masks alone. There were soon calls for the study to be retracted (it hasn’t been, but there’s now a correction and some published criticism).
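To see the problem concretely, here’s a minimal sketch of an SIR model – the simplest of the standard epidemiological models – with purely illustrative parameter values of my own choosing, not anything drawn from the PNAS paper. Even this toy version waxes and wanes, which is exactly what a straight-line extrapolation cannot do.

```python
# A toy SIR (Susceptible-Infected-Recovered) model with made-up parameters,
# purely to illustrate the shape of an epidemic curve; not a calibrated model.

def sir_curve(beta=0.3, gamma=0.1, population=1_000_000, initial_infected=100, days=200):
    """Simulate daily infection counts; beta = transmission rate, gamma = recovery rate."""
    s, i, r = population - initial_infected, initial_infected, 0
    curve = []
    for _ in range(days):
        new_infections = beta * s * i / population
        recoveries = gamma * i
        s -= new_infections
        i += new_infections - recoveries
        r += recoveries
        curve.append(i)
    return curve

curve = sir_curve()
peak_day = curve.index(max(curve))
print(f"Toy epidemic peaks around day {peak_day}, then declines on its own")
# A straight line fitted to the early upswing would predict indefinite growth,
# inviting the mistake of crediting whatever policy changed next for the decline.
```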

How did such substandard work get published in such an influential journal? One reason might be that it had been submitted through a special mechanism—PNAS calls it a “Contributed Submission”—where members of the National Academy of Sciences, a small cadre of elite US researchers, get to bypass the normal process by choosing their own peer-reviewers. Here’s a clear example of the academic journal system failing science: waving through low-quality research simply because it was produced by members of a special club.

A flawed correlational study wasn’t going to answer the crucial questions about the effectiveness of facemasks. What we needed was a randomised controlled trial – the gold standard of evidence. Unfortunately, unlike the gigantic surge of interest in running trials of drugs to treat COVID-19 (of which more below), so-called “non-pharmaceutical interventions” were largely neglected by researchers – even though these kinds of interventions would become part of far more people’s lives.

The one randomised trial of facemasks for COVID-19 published so far, run by researchers at the University of Copenhagen in Denmark, turned into a debacle even before it appeared. After it was registered, prior to data collection, other scientists noted their doubts that it could provide a clear answer to the question of the effectiveness of masks. For instance, they worried about compliance: to what extent would the study participants really stick to either wearing or avoiding masks? Then, while the data were being collected, everything went quiet. Too quiet, some argued. By October, mask sceptics were outraged: the study had apparently been rejected by multiple journals. Had it been suppressed by some sinister medical conspiracy, owing to results that questioned the use of masks? Probably not – any working scientist will tell you about the common slog through several journals to find one that will publish a paper (which, incidentally, is a good example of where preprints come in handy).

When the study finally appeared—and the authors declared they had found no effect of masks on COVID risk—it confirmed another of the pre-existing doubts about the research. In a nutshell, the study was far too small to be useful. The number of participants gave the researchers enough statistical power (the chance of finding an effect if one really exists) to detect a 50% reduction in COVID risk, but nothing less than that. This was not, in other words, a particularly sensitive study. Underpowered analyses—which are sadly rife in biomedical research—are like looking for a distant exoplanet using a cheap pair of binoculars: even if it’s there, you simply won’t see it with the tools you’re using.
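To make that concrete, here’s a rough power calculation – a sketch only, using an assumed 2% baseline infection risk and conventional thresholds (80% power, α = 0.05), none of which are taken from the trial’s actual design documents.

```python
# A rough sample-size sketch for a two-arm trial, assuming a hypothetical 2%
# infection risk in the control group; the figures are illustrative, not the
# Danish trial's actual design parameters.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

p_control = 0.02  # assumed infection risk without a mask
solver = NormalIndPower()

for reduction in (0.5, 0.3, 0.2):
    p_masked = p_control * (1 - reduction)
    effect = abs(proportion_effectsize(p_masked, p_control))  # Cohen's h
    n_per_arm = solver.solve_power(effect_size=effect, alpha=0.05, power=0.8,
                                   alternative='two-sided')
    print(f"Detecting a {reduction:.0%} risk reduction needs ~{n_per_arm:,.0f} per arm")
```

Under these assumptions, halving the effect size you hope to detect roughly quadruples the participants required – which is why a trial sized only for a 50% reduction says very little about smaller, more plausible effects.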

If facemasks made a smaller-than-50% reduction in infection rates, it wouldn’t be picked up in the study. And since few would have predicted that facemasks would slash your COVID risk in half—a smaller effect would be more plausible, and would still be helpful in the fight against the disease—it raises questions about the point of the entire exercise. And that’s before you get to the fact that the study tested only whether the wearer was protected – not whether the mask they were wearing helped protect others from germs they might spread.

When a new, convincing piece of evidence rolls in, people should update their beliefs – but nobody did so on the basis of the Danish facemask study. Pro-maskers rightly argued that the study was uninformative. Mask-sceptics wrongly—but superficially convincingly—argued that, since it showed no effect, the study was the final nail in the coffin of facemasks for COVID. These misinterpretations will likely have convinced a lot of people to stop wearing masks – and if masks do have a small-but-appreciable effect on disease spread (and other evidence does point in this direction), this will have endangered lives. Since we didn’t learn much else from the study, those lives will have been endangered for little benefit. Sometimes an inconclusive study is worse than no study at all.


In case the studies on masks weren’t bad enough, studies of drugs for COVID-19 plumbed even further depths of embarrassment. The controversy began in late March when, uncharacteristically, Donald Trump referenced a scientific journal article. In a tweet, he wrote that the antimalarial drug hydroxychloroquine, used together with the antibiotic azithromycin, had “a real chance to be one of the biggest game changers in the history of medicine”. He pointed to the International Journal of Antimicrobial Agents, where French academics had just published a paper that apparently showed that patients treated with the drugs had been largely purged of the virus. There was, the trial’s senior author reported, a “100 per cent cure rate”.

It wasn’t long before the study was debunked. The researchers had committed a litany of scientific sins in designing and analysing the study – many of them noted in a critique that appeared in the same journal in July. The study, which included a paltry thirty-six participants, didn’t actually measure symptoms. Some of its “control” participants were from a different hospital with different protocols. The way the viral levels were measured varied from participant to participant. A substantial number of participants didn’t return for follow-up appointments. The statistical analysis was jejune, and when redone by a more competent statistician, showed very little effect of the drug. The journal’s editor-in-chief was also one of the co-authors of the paper – and although he handed the editorial duties to an independent editor, it’s still easy to imagine the subtle pressure to give the article an easier ride. Perhaps acknowledging how weak their initial research was, the same researchers then ran a second study – but this time failed to even include a control group.

Nevertheless, the initial French studies—and their publicity from the US President—were followed by an avalanche of trials of hydroxychloroquine across the world. Even as early as April, over 200 trials of the drug had been registered. This highlighted another dismaying aspect of modern science: waste. Although many fields are feeling the benefits of “team science”, where researchers across many universities and countries collaborate to do better research than they could do alone, the incentives still point more strongly towards getting your own paper out faster than your “rivals”. We thus ended up with a messy patchwork of trials of hydroxychloroquine and other candidate drugs, many of which are still ongoing, and which vary wildly in quality. The resources expended on running so many often redundant trials could have been better spent elsewhere; they’re also a time-bomb for future misunderstanding and misinterpretation (by the way, where researchers did collaborate on strong, definitive trials, hydroxychloroquine showed no benefit for COVID survival).

But the frustrating story of hydroxychloroquine doesn’t end there. In late May, The Lancet, one of the world’s most prestigious medical journals, published a major paper with some seriously scary results: hydroxychloroquine was actually killing COVID patients. A small-scale medical company called Surgisphere had provided a database of information on 96,000 patients to academics at the Harvard-affiliated Brigham and Women’s Hospital. Surgisphere did the analysis, and the Harvard academics wrote the article up for its eventual publication. The results showed that hydroxychloroquine, far from being linked to faster COVID recovery, was actually associated with an increased probability of death in hospital – and the effects looked substantial. Surgisphere’s owner, Sapan Desai, said in an interview that because it was so big, his study was better than any randomised controlled trial.

But upon reading the papers, the scientific community raised a collective eyebrow. The effect sizes—the negative impacts of hydroxychloroquine on survival and on heart arrhythmias—were far larger than we’d expect for a drug that, after all, is used routinely for malaria. And anyway, how had Surgisphere actually obtained all these data? A Guardian investigation found that the company had merely six listed employees, one of whom, bizarrely, was not a researcher or analyst but “an adult model and events hostess”.

Some of the data just couldn’t be right: for example, there were six more COVID deaths in Surgisphere’s database for Australia than had actually been reported for that country at the time. When the Harvard researchers asked Surgisphere for the raw data—after the study had been published—none were forthcoming. They had no choice but to retract the paper. A second paper on the Surgisphere data from the same authors was also retracted from the New England Journal of Medicine – widely considered the top medical journal in the world.
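Anomalies like this are exactly what a basic consistency check can surface. Here’s a minimal sketch – with hypothetical records and made-up figures, not Surgisphere’s actual data – of comparing a database’s per-country death counts against official national totals.

```python
# A minimal data sanity check: compare a database's aggregated death counts
# against official national totals. All names and figures here are illustrative.
import pandas as pd

# Hypothetical per-hospital records from the database under scrutiny
records = pd.DataFrame({
    "country": ["Australia", "Australia", "Denmark"],
    "deaths":  [41, 38, 20],
})
database_totals = records.groupby("country")["deaths"].sum()

# Official totals reported by each country on the same date (made-up figures)
official_totals = pd.Series({"Australia": 73, "Denmark": 52})

excess = database_totals - official_totals
for country, diff in excess.items():
    if diff > 0:
        print(f"{country}: database claims {diff} more deaths than officially reported")
```

A check this simple, run before publication rather than after, would have flagged the impossible country-level counts immediately.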

Were the Surgisphere data mistaken in some way? Adjusted? Altered? Made up from scratch? Months later, it’s still unclear (though unexplained irregularities in an older paper by Desai don’t inspire confidence). Whatever their provenance, they weren’t true – but they certainly had an immediate impact. Not only did the World Health Organisation suspend its trials of hydroxychloroquine, but the retraction handed a full clip of ammunition to the advocates of the drug. Could these researchers, they asked, have fiddled the numbers—or at least, been oblivious to the obvious defects in the data—because of their strong bias towards showing Donald Trump in a negative light? Sadly, it’s impossible to definitively answer “no”.

Here, on a question that instantly and directly affected the lives of COVID patients, our scientific system failed us once again. Even with its history of notorious retractions, The Lancet was taken in yet again by an eye-catching result that crumbled like a fistful of sand when examined in any depth (and frustratingly, the damage continues: over a hundred subsequent papers have cited the retracted studies as if they were credible).

An editorial published in The Lancet after the Surgisphere scandal promised that lessons would be learned. In future, the editors wrote, more than one member of the authorship team of a paper will be required to declare they’ve seen the data. And in cases where a study involves a large-scale dataset, “reviews from an expert in data science [will be] obtained”. If it seems astonishing that one of the world’s best medical journals only decided in September 2020 to ensure that its peer-reviewers had expertise in the type of study they’re reviewing, it’s because it is.


So what are we to make of the last year in pandemic science? To understand it requires a kind of doublethink: vaccine research, big-data analysis, structural biology, and many other areas made impressive advances, while much of the rest of science tripped over the same old problems that have troubled it for decades.

Having found that the previous literature wasn’t as helpful as it should’ve been, many scientists succumbed to their usual habit of publishing as much new research as possible, mostly regardless of quality – and the journals’ barriers for entry were even lower than usual. Low-calibre, bias-ridden research papers became weapons in political fights over our policies on the virus. Bad actors took advantage of scientists’ trusting natures to push dangerously misleading claims. In so many cases, our scientific systems and institutions failed in their cardinal role: sorting good research from bad.

As I said right at the beginning: none of these problems are new. We’ve always needed that doublethink – to appreciate amazing new discoveries but attempt to improve the faulty system in which they’re made. So how do we improve it?

First, it might be of interest to note the contrast between the successful vaccines, developed in a largely industrial, profit-driven, highly-regulated environment, and the reams of low-quality research papers emanating from universities. Perhaps the much-maligned Big Pharma could teach a thing or two to academics who, focused on getting that next journal publication, have lost sight of the real-world effects of their research – or the lack of them. There’s something to this argument, but it’s not quite that simple: several of the vaccines come from academic-industrial partnerships, and in any case, some vaccine research (most notably the original, fumbled AstraZeneca trial) has still been polluted by questionable analyses and data.

Both academic and industrial research could benefit from the solutions set out in a paper from October, written from the perspective of Open Science. It suggested we should require scientists to publicly pre-register their analyses for all studies, not just clinical trials – and, where possible, to have those pre-registrations reviewed. We should make it the norm (indeed, mandatory) to share data and materials, and have journals make the whole publication process, including peer-reviews, fully transparent. Open, easy-to-use databases for all COVID-19 research plans and data would help scientists collaborate, dissuading them from running extraneous studies and encouraging more discussion of the pros and cons of any individual analysis.

At the broader, cultural level, we should do all we can to break the obsession with prestige and publication, and re-orient our academic system towards rewarding what really matters: producing research with the rigour and intellectual humility that science demands. That means changing policy at the level of journals, research funders—who want to fund research that matters, and could easily demand higher standards before parting with their money—and universities, who could institute changes to job selection, promotion, and tenure systems that reward collaboration, openness, robust research, and good scientific citizenship rather than CVs bursting with “high-impact” papers.

Changes can happen from the bottom up as well as the top down. Individual scientists are catching on to the need for better research, even from a selfish standpoint – nobody wants to be associated with the kinds of research screwups I described above. Unfortunately, COVID-19 seems to have given many of them an excuse to put off improving their research – after all, isn’t it important to just get the results out there? On the contrary, we should take a stand against what’s been dubbed “pandemic research exceptionalism” – the idea that there are circumstances so dire that we’re justified in dialling down our scepticism. The opposite is true: higher standards and keener reviews would at the very least have averted disasters like the Surgisphere scandal, and are even more necessary when our research could directly affect the progress of a pandemic.

All these potential fixes are far easier said than done. But the useless, misleading, and nonsensical science produced during the COVID-19 pandemic shows us the high stakes of getting it wrong. The truth is a fragile thing: it’s easily lost or distorted when we get distracted by other goals – and in a pandemic, the cost of being further from the truth is measured in human lives. There are few more urgent tasks than correcting the problems that bedevil our scientific system – both for this pandemic, and for the next one.
