
Max H. Bazerman’s New Book Tells the True Story of the Dishonest Honesty Study


Inside an Academic Scandal: A Story of Fraud and Betrayal, by Max H. Bazerman (MIT Press, 200 pp., $32.95)

Some social-science findings are just plain fun. They instantly lend themselves to media coverage and perhaps a TED Talk. Harvard business professor Max H. Bazerman made such a finding back in 2012: people behaved more honestly, he and four coauthors reported, if they signed a statement promising to be honest beforehand.

This finding was potentially useful as well as fun. High-stakes forms, such as tax returns, generally have the “I promise this is true” language at the end of the document, not the beginning. The “signing first” concept could save a lot of money by reducing dishonesty.

Bazerman’s study was rigorous, too, combining three randomized experiments. In one, customers reporting their odometer readings to an insurance company (readings that measured how much they had driven since the last report, which affected their premiums) were randomly assigned to attest to their honesty either before or after entering the number. Those who promised at the beginning admitted to driving more. Two other experiments, run in a psychology lab, tested whether participants cheated less if they had promised beforehand to behave honestly.

Some companies changed their processes in response to the findings. Just one problem: the effect was never real. The data analyzed for at least two and perhaps all three experiments had been doctored. Shockingly, it seems that two separate processes, involving different people, contributed to the fakery—in a study about honesty.

In his new book, Inside an Academic Scandal, Bazerman explains what went down, how he got mixed up in it, and how science might prevent such incidents in the future.

The paper began with the two lab experiments, which paid subjects based on how many math puzzles they could solve and gave them the opportunity to lie about the number. These experiments were conducted at the University of North Carolina (UNC), where Francesca Gino, a colleague of Bazerman’s, had worked before joining him at Harvard. A Harvard Ph.D. student helped Gino and Bazerman write the article, but the paper struggled to find a home in an academic journal.

They decided to buttress their lab work with a field experiment. They happened to know that Dan Ariely of Duke University had been discussing but had never formally published data from an insurance company that had tested the signing-first theory. Ariely joined as a coauthor as well, bringing yet another colleague with him.

As the scientists brought the three experiments together into a more compelling write-up, some tension emerged. For one thing, Bazerman found it odd that the insurance company’s customers had driven an average of 24,000 miles in a single year (13,000–15,000 miles is a more typical annual American average). A coauthor eventually explained that, while the team initially thought the odometer readings were measuring 12 months of driving (and indeed had said as much in early versions of the paper), the mileage readings may have reflected more driving time than that. Bazerman accepted this explanation without suspecting a deeper problem.

The paper was published to great fanfare and media attention. Yet Bazerman’s own follow-up work soon led to its downfall.

A true believer in the signing-first idea (he hadn’t worked directly with the raw data from the first paper), Bazerman launched a project with two other colleagues to test the idea online rather than in person. Their experiments repeatedly failed.

Eventually, they felt compelled to rerun one of the lab experiments from the 2012 paper as precisely as possible but with a much bigger sample size—to verify that the concept at least worked in the original setting. To make the project less adversarial, they reached out to Bazerman’s original coauthors, who agreed to sign on.

This experiment also failed. And in the process of writing up the results, one of the new coauthors reviewed the data from the original study, noticing something else weird about the insurance data.

The idea behind the experiment was to randomly assign some customers to sign first and then to see whether they reported more mileage since their last reading than the control group did. But the two groups didn’t just differ in that second reading, taken during the experiment—they also differed in their previous, baseline odometer readings, to an extent extremely unlikely to happen by chance. Apparently, the two groups had not actually been assigned at random.
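The logic behind that conclusion is simple: under genuine random assignment, the two groups’ baseline readings should differ only by chance. Below is a minimal illustrative sketch in Python of such a balance check; the file name and column names are hypothetical, and this is not the authors’ actual analysis.

```python
# Illustrative only: checking whether baseline odometer readings are balanced
# across the two randomly assigned conditions. File and column names
# ("insurance_data.csv", "condition", "baseline_miles") are hypothetical.
import pandas as pd
from scipy import stats

df = pd.read_csv("insurance_data.csv")  # hypothetical data file
sign_first = df.loc[df["condition"] == "sign_first", "baseline_miles"]
sign_last = df.loc[df["condition"] == "sign_last", "baseline_miles"]

# Under true random assignment, the baseline readings should not differ
# systematically between groups; a tiny p-value is a red flag.
t_stat, p_value = stats.ttest_ind(sign_first, sign_last, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.2g}")
```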

Bazerman and his colleagues reported this problem, along with their repeated failure to replicate the signing-first effect, in a 2020 follow-up. Bazerman was inclined to retract the original paper entirely at this point, but he lost a vote taken among his coauthors. Along with the new study, the team published their full data for both papers.

Enter Data Colada, a blog run by a trio of data sleuths.

In 2021, acting on a tip from other researchers who remained anonymous, Data Colada published a compelling argument that the insurance experiment contained fake data. The team highlighted a number of red flags, but the most damning stemmed from the same miles-driven variable that had triggered Bazerman’s earlier objection.

If you asked a bunch of car owners how many miles they had driven in a certain period of time, you would expect to find something like a bell curve: lots of people with numbers in the ballpark of the average, and relatively few people with extreme situations, such as driving only 20 or as many as 60,000 miles in a year. (A similar data set from the U.K. indeed looked like this, Data Colada pointed out.) But in the 2012 study’s insurance data, the numbers were evenly distributed between zero and 50,000 miles. Cars were equally likely to have added, say, 500, 10,000, or 45,000 miles between the readings, but then the numbers abruptly stopped at 50,000.

That’s something that happens when you use a random number generator (and use it hackishly at that), not what you would expect when collecting real-world data.
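To see why the pattern is so suspicious, one can compare the miles-driven variable against a uniform distribution directly. The sketch below is purely illustrative, with hypothetical file and column names: real mileage data would reject uniformity decisively, while numbers drawn from a crude random number generator would not.

```python
# Illustrative only: testing whether miles driven look uniform on
# [0, 50,000] -- the signature of a crude random number generator --
# rather than roughly bell-shaped, as real driving data would be.
import pandas as pd
from scipy import stats

df = pd.read_csv("insurance_data.csv")  # hypothetical data file
miles = df["miles_driven"].to_numpy()

# Kolmogorov-Smirnov test against a uniform distribution on [0, 50,000].
# Real-world mileage should fail this badly (tiny p); fabricated uniform
# numbers would look consistent with it.
ks_stat, p_value = stats.kstest(miles, "uniform", args=(0, 50_000))
print(f"KS statistic = {ks_stat:.3f}, p = {p_value:.3g}")
print(f"max observed miles = {miles.max():,.0f}")  # hard cutoff at 50,000?
```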

Ariely, who had provided the problematic file to the rest of the team, said it must have been doctored before he received it. The insurance company, after reviewing the issue, insisted that it had provided a “small, single set of raw data” that contained far fewer vehicles than the final data set used in the study. The company said the study’s file seemed to mix the real data with “synthesized or fabricated” numbers in a different font, and that the real data, by themselves, did not support the signing-first effect. Ariely remains a professor at Duke.

The Data Colada team also had issues with one of the UNC lab experiments but held off on publishing a post about it for two years while Harvard conducted its own investigation. Here, the telltale sign was that the data were sorted by the values in two columns—except for a handful of observations. These out-of-place rows just so happened to be extremely skewed in support of the hypothesis, with those who signed first behaving more honestly and vice versa.
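A check for that kind of anomaly is easy to automate: rows that break an otherwise consistent sort order stand out once you compare each row’s sort key with the one above it. The sketch below is illustrative only, with hypothetical file and column names, not the sleuths’ actual method.

```python
# Illustrative only: flagging rows that break an otherwise-sorted order.
# If a spreadsheet was sorted by two columns and a handful of rows sit out
# of sequence, those rows may have been moved or edited after sorting.
import pandas as pd

df = pd.read_csv("lab_experiment.csv")  # hypothetical data file

# Build the (col_a, col_b) sort key for each row; a row is "out of place"
# if its key is smaller than the key of the row directly above it.
keys = list(zip(df["col_a"], df["col_b"]))
out_of_place = [i for i in range(1, len(keys)) if keys[i] < keys[i - 1]]
print("rows breaking the sort order:", out_of_place)
```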

In a footnote, the Colada team expressed doubts about the paper’s other lab experiment, as well. In a series of follow-up posts, they questioned a few other studies Gino, the Harvard professor, had been involved with.

These allegations had much more serious consequences. Harvard’s investigation resulted in a nearly 1,300-page report concluding, by a “preponderance of the evidence,” that Gino had “committed research misconduct intentionally, knowingly, or recklessly.” In an extremely rare move, the school stripped Gino of her tenure and fired her.

Gino sued Harvard and the Data Colada team. The claims against the bloggers were dismissed. In August, Harvard filed a counterclaim in its suit, alleging that a file that Gino had provided to defend herself had also been tampered with. Gino maintains her innocence and has a backer in Lawrence Lessig, a prominent Harvard professor.

However it happened, the research was a “clusterfake,” in Data Colada’s words: two different data-tampering incidents, involving different people, in the same study, about honesty.

Inside an Academic Scandal does an excellent job of explaining the facts from Bazerman’s perspective. We can take his viewpoint as reasonably objective, since there’s no evidence he was involved in any of the tampering, and since he’s donating his advances and royalties from the book to the Scientific Integrity Fund.

Readers of his book come to understand that, while a study may have several or even many authors, core aspects of data collection and analysis may flow through a single person, allowing him to take liberties undetected. It’s hard to tell if someone changed a dataset before analyzing it or passing it along to a colleague.

Even in this case, if the study hadn’t been so prominent, if Bazerman hadn’t done follow-up research, if the raw data hadn’t been made public, if anonymous researchers hadn’t dug into the numbers and tipped off Data Colada, or if the manipulations had been performed at higher than a tenth-grader’s level of sophistication, the problems might never have come to light. Given all these circumstances, one wonders how much scientific fraud goes undiscovered.

When fraud is discovered, of course, the punishment falls not just on the person who commits it but on everyone who has worked with him, not to mention everyone who relied on the findings or has faith in the scientific process.

Why would someone risk their career to falsify research? No one has confessed in this case, but Bazerman spends some time discussing Diederik Stapel, a Dutch psychology researcher who faked data, admitted it when caught, talked to the media about his methods, and wrote a book about the affair.

Stapel’s experience evokes a wealthy compulsive shoplifter. He started out small, typing new numbers into the data for an experiment that hadn’t panned out, after which he managed to publish the paper. Over time, he progressed to fabricating entire studies. (It can be a red flag for an established researcher to insist on collecting and managing his own data, Bazerman notes, since such mundane tasks traditionally go to lower-level academic personnel.)

Bazerman’s thoughts on preventing future fraud, though welcome given his firsthand experience combined with his academic expertise on honesty, aren’t particularly novel. He praises the movement toward “open science,” where researchers post their data and code publicly; encourages researchers to investigate diligently anything that “seems off” in their studies (such as the high mileage that Bazerman noticed); supports greater efforts to replicate existing studies; and urges universities to promote best practices and be more transparent when investigating fraud.

Yes to all of that. But is it enough to restore faith in scientific research in today’s politicized climate? I suspect not—and if you give me a few minutes, I can send you a spreadsheet proving it.
