The high-tech war on science fraud
The Long Read: The problem of fake data may go far deeper than scientists admit. Now a team of researchers has a controversial plan to root out the perpetrators
One morning last summer, a German psychologist named Mathias Kauff woke up to find that he had been reprimanded by a robot. In an email, a computer program named Statcheck informed him that a 2013 paper he had published on multiculturalism and prejudice appeared to contain a number of incorrect calculations, which the program had catalogued and then posted on the internet for anyone to see. The problems turned out to be minor, just a few rounding errors, but the experience left Kauff feeling rattled. "At first I was a bit frightened," he said. "I felt a bit exposed."
Kauff wasn't alone. Statcheck had read some 50,000 published psychology papers and checked the maths behind every statistical result it encountered. In the space of 24 hours, virtually every academic active in the field in the past two decades had received an email from the program, informing them that their work had been reviewed. Nothing like this had ever been seen before: a massive, open, retroactive evaluation of scientific literature, conducted entirely by computer.
Statcheck's method was relatively simple, more like the mathematical equivalent of a spellchecker than a thoughtful review, but some scientists saw it as a new form of scrutiny and suspicion, portending a future in which the objective authority of peer review would be undermined by unaccountable and uncredentialed critics.
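To make the spellchecker comparison concrete, here is a minimal Python sketch of that kind of consistency check. It is not Statcheck's actual code: the regular expression, the function names and the restriction to z-tests are all assumptions made for illustration, and a real checker would need to handle many more kinds of statistical test. The idea is simply to recompute a p-value from the reported test statistic and see whether it agrees with the reported one once rounding is taken into account.

```python
import math
import re

# Hypothetical pattern for results written like "z = 2.10, p = .036".
# Real papers report many other tests and formats; this covers one case only.
RESULT_PATTERN = re.compile(r"z\s*=\s*(-?\d*\.?\d+),\s*p\s*=\s*(0?\.\d+)")


def two_sided_p_from_z(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def check_reported_results(text):
    """Return (matched text, recomputed p, consistent?) for each result found."""
    findings = []
    for match in RESULT_PATTERN.finditer(text):
        z = float(match.group(1))
        reported_p_text = match.group(2)
        reported_p = float(reported_p_text)
        # Compare at the precision the authors reported, so that ordinary
        # rounding is not flagged as an error.
        decimals = len(reported_p_text.split(".")[1])
        recomputed_p = two_sided_p_from_z(z)
        consistent = round(recomputed_p, decimals) == round(reported_p, decimals)
        findings.append((match.group(0), recomputed_p, consistent))
    return findings


if __name__ == "__main__":
    sample = "Group A differed (z = 2.10, p = .036); so did group B (z = 2.10, p = .010)."
    for reported, recomputed, consistent in check_reported_results(sample):
        status = "consistent" if consistent else "INCONSISTENT"
        print(f"{reported} -> recomputed p = {recomputed:.4f} ({status})")
```

In this made-up sample the first result passes and the second is flagged, because a z of 2.10 cannot produce a two-sided p of .010. Note that a flag of this kind says only that the numbers disagree, not why.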
Susan Fiske, the former head of the Association for Psychological Science, wrote an op-ed accusing "self-appointed data police" of pioneering a "new form of harassment". The German Psychological Society issued a statement condemning the unauthorised use of Statcheck. The intensity of the reaction suggested that many were afraid that the program was not just attributing mere statistical errors, but some impropriety, to the scientists.
The man behind all this controversy was a 25-year-old Dutch scientist named Chris Hartgerink, based at Tilburg University's Meta-Research Center, which studies bias and error in science. Statcheck was the brainchild of Hartgerink's colleague Michèle Nuijten, who had used the program to conduct a 2015 study that demonstrated that about half of all papers in psychology journals contained a statistical error. Nuijten's study was written up in Nature as a valuable contribution to the growing literature acknowledging bias and error in science, but she had not published an inventory of the specific errors it had detected, or the authors who had committed them. The real flashpoint came months later, when Hartgerink modified Statcheck with some code of his own devising, which catalogued the individual errors and posted them online, sparking uproar across the scientific community.
Hartgerink is one of only a handful of researchers in the world who work full-time on the problem of scientific fraud, and he is perfectly happy to upset his peers. "The scientific system as we know it is pretty screwed up," he told me last autumn. Sitting in the offices of the Meta-Research Center, which look out on to Tilburg's grey, mid-century campus, he added: "I've known for years that I want to help improve it." Hartgerink approaches his work with a professorial seriousness (his office is bare, except for a pile of statistics textbooks and an equation-filled whiteboard) and he is appealingly earnest about his aims. His conversations tend to rapidly ascend to great heights, as if they were balloons released from his hands: the simplest things soon become grand questions of ethics, or privacy, or the future of science.
"Statcheck is a good example of what is now possible," he said. The top priority, for Hartgerink, is something much more grave than correcting simple statistical miscalculations. He is now proposing to deploy a similar program that will uncover fake or manipulated results, which he believes are far more prevalent than most scientists would like to admit.
When it comes to fraud (or, in the more neutral terms he prefers, "scientific misconduct"), Hartgerink is aware that he is venturing into sensitive territory. "It is not something people enjoy talking about," he told me, with a weary grin. Despite its professed commitment to self-correction, science is a discipline that relies mainly on a culture of mutual trust and good faith to stay clean. Talking about its faults can feel like a kind of heresy. In 1981, when a young Al Gore led a congressional inquiry into a spate of recent cases of scientific fraud in biomedicine, the historian Daniel Kevles observed that "for Gore and for many others, fraud in the biomedical sciences was akin to pederasty among priests".
The comparison is apt. The exposure of fraud directly threatens the special claim science has on truth, which relies on the belief that its methods are purely rational and objective. As the congressmen warned scientists during the hearings, "each and every case of fraud serves to undermine the public's trust in the research enterprise of our nation".
But three decades later, scientists still have only the most crude estimates of how much fraud actually exists. The current accepted standard is a 2009 study by the Stanford researcher Daniele Fanelli that collated the results of 21 previous surveys given to scientists in various fields about research misconduct. The studies, which depended entirely on scientists honestly reporting their own misconduct, concluded that about 2% of scientists had falsified data at some point in their career.
If Fanelli's estimate is correct, it seems likely that thousands of scientists are getting away with misconduct each year. Fraud (including outright fabrication, plagiarism and self-plagiarism) accounts for the majority of retracted scientific articles. But, according to RetractionWatch, which catalogues papers that have been withdrawn from the scientific literature, only 684 were retracted in 2015, while more than 800,000 new papers were published. If even just a few of the suggested 2% of scientific fraudsters (which, relying on self-reporting, is itself probably a conservative estimate) are active in any given year, the vast majority are going totally undetected. "Reviewers and editors, other gatekeepers, they're not looking for potential problems," Hartgerink said.
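The size of that gap can be put in rough numbers. The short Python calculation below uses only the figures quoted above (684 retractions, 800,000 new papers, a 2% self-reported falsification rate); the variable names are illustrative. Because the 2% figure counts scientists over a whole career while the retraction figure counts papers in a single year, this is an order-of-magnitude sketch under that stated mismatch, not a like-for-like estimate of undetected fraud.

```python
# Figures quoted in the text for 2015.
retracted_2015 = 684
published_2015 = 800_000  # the text says "more than", so the true rate is a little lower

# Fanelli's self-reported share of scientists who have ever falsified data.
self_reported_falsifiers = 0.02

# Share of one year's papers that ended up retracted, for any reason.
retraction_rate = retracted_2015 / published_2015

# How many papers were published for each one retracted.
papers_per_retraction = published_2015 / retracted_2015

# How many times larger the self-reported share is than the retraction rate.
# Caution: scientists-per-career versus papers-per-year, so this is only a
# rough indication of scale.
scale_gap = self_reported_falsifiers / retraction_rate

print(f"Retraction rate: {retraction_rate:.4%}")
print(f"Papers published per retraction: {papers_per_retraction:.0f}")
print(f"Self-reported share vs retraction rate: {scale_gap:.1f}x")
```

On these figures the retraction rate is under 0.09%, or roughly one paper in 1,170, which is more than twenty times smaller than the self-reported 2%.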
But if none of the traditional authorities in science are going to address the problem, Hartgerink believes that there is another way. If a program similar to Statcheck can be trained to detect the traces of manipulated data, and then make those results public, the scientific community can decide for itself whether a given study should still be regarded as trustworthy.
Hartgerink's university, which sits at the western edge of Tilburg, a small, quiet city in the southern Netherlands, seems an unlikely place to try to correct this hole in the scientific process. The university is best known for its economics and business courses and does not have traditional lab facilities. But Tilburg was also the site of one of the biggest scientific scandals in living memory, and no one knows better than Hartgerink and his colleagues just how devastating individual cases of fraud can be.
In September 2010, the School of Social and Behavioral Science at Tilburg University appointed Diederik Stapel, a promising young social psychologist, as its new dean. Stapel was already popular with students for his warm manner, and with the faculty for his easy command of scientific literature and his enthusiasm for collaboration. He would often offer to help his colleagues, and sometimes even his students, by conducting surveys and gathering data for them.
As dean, Stapel appeared to reward his colleagues' faith in him almost immediately. In April 2011 he published a paper in Science, the first study the small university had ever landed in that prestigious journal. Stapel's research focused on what psychologists call "priming": the idea that small stimuli can affect our behaviour in unnoticed but significant ways. "Could being discriminated against depend on such seemingly trivial matters as garbage on the streets?" Stapel's paper in Science asked. He proceeded to show that white commuters at the Utrecht railway station tended to sit further away from visible minorities when the station was dirty. Similarly, Stapel found that white people were more likely to give negative answers on a quiz about minorities if they were interviewed on a dirty street, rather than a clean one.
Stapel had a knack for devising and executing such clever studies, cutting through messy problems to extract clean data. Since becoming a professor a decade earlier, he had published more than 100 papers, showing, among other things, that beauty product advertisements, regardless of context, prompted women to think about themselves more negatively, and that judges who had been primed to think about concepts of impartial justice were less likely to make racially motivated decisions.
His findings regularly reached the public through the media. The idea that huge, intractable social issues such as sexism and racism could be affected in such simple ways had a powerful intuitive appeal, and hinted at the possibility of equally simple, elegant solutions. If anything united Stapel's diverse interests, it was this Gladwellian bent. His studies were often featured in the popular press, including the Los Angeles Times and New York Times, and he was a regular guest on Dutch television programmes.
But as Stapel's reputation skyrocketed, a small group of colleagues and students began to view him with suspicion. "It was too good to be true," a professor who was working at Tilburg at the time told me. (The professor, whom I will call Joseph Robin, asked to remain anonymous so that he could frankly discuss his role in exposing Stapel.) "All of his experiments worked. That just doesn't happen."
A student of Stapel's had mentioned to Robin in 2010 that some of Stapel's data looked strange, so that autumn, shortly after Stapel was made dean, Robin proposed a collaboration with him, hoping to see his methods first-hand. Stapel agreed, and the data he returned a few months later, according to Robin, "looked crazy. It was internally inconsistent in weird ways; completely unlike any real data I had ever seen." Meanwhile, as the student helped get hold of more datasets from Stapel's former students and collaborators, the evidence mounted: more "weird data", and identical sets of numbers copied directly from one study to another.
In August 2011, the whistleblowers took their findings to the head of the department, Marcel Zeelenberg, who confronted Stapel with the evidence. At first, Stapel denied the charges, but just days later he admitted what his accusers suspected: he had never interviewed any commuters at the railway station, no women had been shown beauty advertisements and no judges had been surveyed about impartial justice and racism.
Stapel hadn't just tinkered with numbers; he had made most of them up entirely, producing entire datasets at home in his kitchen after his wife and children had gone to bed. His method was an inversion of the proper scientific method: he started by deciding what result he wanted and then worked backwards, filling out the individual data points he was supposed to be collecting.
On 7 September 2011, the university revealed that Stapel had been suspended. The media initially speculated that there might have been an issue with his latest study (announced just days earlier, showing that meat-eaters were more selfish and less sociable), but the problem went much deeper. Stapel's students and colleagues were about to learn that his enviable skill with data was, in fact, a sham, and his golden reputation, as well as nearly a decade of results that they had used in their own work, were built on lies.
Chris Hartgerink was studying late at the library when he heard the news. The extent of Stapel's fraud wasn't clear by then, but it was big. Hartgerink, who was then an undergraduate in the Tilburg psychology programme, felt a sudden disorientation, a sense that something solid and integral had been lost. Stapel had been a mentor to him, hiring him as a research assistant and giving him constant encouragement. "This is a guy who inspired me to actually become enthusiastic about research," Hartgerink told me. "When that reason drops out, what remains, you know?"
Hartgerink wasn't alone; the whole university was stunned. "It was a really difficult time," said one student who had helped expose Stapel. "You saw these people on a daily basis who were so proud of their work, and you know it's just based on a lie." Even after Stapel resigned, the media coverage was relentless. Reporters roamed the campus, first from the Dutch press, and then, as the story got bigger, from all over the world.
On 9 September, just two days after Stapel was suspended, the university convened an ad-hoc investigative committee of current and former faculty. To help determine the true extent of Stapel's fraud, the committee turned to Marcel van Assen, a statistician and psychologist in the department. At the time, Van Assen was growing bored with his current research, and the idea of investigating the former dean sounded like fun to him. Van Assen had never much liked Stapel, believing that he relied more on the force of his personality than reason when running the department. "Some people believe him charismatic," Van Assen told me. "I am less sensitive to it."
Van Assen, who is 44, tall and rangy, with a mop of greying, curly hair, approaches his work with relentless, unsentimental practicality. When speaking, he maintains an amused half-smile, as if he is joking. He once told me that to fix the problems in psychology, it might be simpler to toss out 150 years of research and start again; I'm still not sure whether or not he was serious.
To prove misconduct, Van Assen said, you must be a pitbull: biting deeper and deeper, clamping down not just on the papers, but the datasets behind them, the research methods, the collaborators, using everything available to bring down the target. He spent a year breaking down the 45 studies Stapel produced at Tilburg and cataloguing their individual aberrations, noting where the effect size (a standard measure of the difference between the two groups in an experiment) seemed suspiciously large, where sequences of numbers were copied, where variables were too closely related, or where variables that should have moved in tandem instead appeared adrift.
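As an illustration of the first of those checks, the Python sketch below computes Cohen's d, one common effect-size measure. Nothing here is taken from the investigation itself: which measure Van Assen used is not stated, so the choice of Cohen's d, the function name and the sample numbers are all assumptions, and the numbers are invented for the example rather than drawn from Stapel's data.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Difference between two group means, in units of their pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Sample variances (n - 1 in the denominator).
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    # Pool the two variances, weighting each by its degrees of freedom.
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


if __name__ == "__main__":
    # Invented scores for a hypothetical treatment and control group.
    treatment = [2, 4, 6]
    control = [1, 3, 5]
    print(f"Cohen's d = {cohens_d(treatment, control):.2f}")
```

By Cohen's widely used rule of thumb, a d of about 0.2 counts as small, 0.5 as medium and 0.8 as large, which gives a sense of why a long run of very large effects might look suspicious.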
The committee released its final report in October 2012 and, based largely on its conclusions, 55 of Stapel's publications were officially retracted by the journals that had published them. Stapel also returned his PhD to the University of Amsterdam. He is, by any measure, one of the biggest scientific frauds of all time. (RetractionWatch has him third on their all-time retraction leaderboard.) The committee also had harsh words for Stapel's colleagues, concluding that "from the bottom to the top, there was a general neglect of fundamental scientific standards". "It was a real blow to the faculty," Jacques Hagenaars, a former professor of methodology at Tilburg, who served on the committee, told me.
By extending some of the blame to the methods and attitudes of the scientists around Stapel, the committee situated the case within a larger problem that was attracting attention at the time, which has come to be known as the replication crisis. For the past decade, the scientific community has been grappling with the discovery that many published results cannot be reproduced independently by other scientists, in spite of the traditional safeguards of publishing and peer review, because the original studies were marred by some combination of unchecked bias and human error.
After the committee disbanded, Van Assen found himself fascinated by the way science is susceptible to error, bias, and outright fraud. Investigating Stapel had been exciting, and he had no interest in returning to his old work. Van Assen had also found a like mind, a new professor at Tilburg named Jelte Wicherts, who had a long history working on bias in science and who shared his attitude of upbeat cynicism about the problems in their field. "We simply agree, there are findings out there that cannot be trusted," Van Assen said. They began planning a new sort of research group: one that would investigate the very practice of science.