
Artificial intelligence can do many of the things that social scientists do. It can analyze data, write and review code, identify appropriate statistical methods, and offer suggestions on study drafts. It can even take a dataset and a research question and produce an entire paper on its own. Given that human-led social science is often marred by mistakes, dubious methods, ideological bias, and even outright fraud, one can hope that AI will improve the field in the years ahead.
Some recent studies, though, highlight the limitations of current models. For now, AI is a productivity- and quality-enhancing tool, but not a panacea for what ails social science, nor a reason to let one’s guard down.
Perhaps the biggest social-science story of the week centers on a 2021 study of tech clusters—cities with large numbers of inventors working in a given field—and their impact on innovation. A charity found this study helpful in its funding decisions and hired economist Michael Wiebe to extend it. But when Wiebe dug into the materials, he uncovered a series of technical and coding errors that, once corrected, undermined the study’s conclusions. The American Economic Review (which also published the original study) has accepted Wiebe’s comment detailing these issues.
As it happened, Wiebe also fed the materials into AI chatbots—two versions of ChatGPT, plus Refine, a tool designed specifically to improve academic work—and asked them to scrutinize each key result to see whether they would detect the same problems.
On the one hand, the bots were helpful. They caught several issues, including a key coding error. Social scientists should run their papers and code through AI and investigate any problems it flags; doing so takes very little time. On the other hand, the AIs missed many problems. And Wiebe notes that he did not test for false positives—cases where the AI identified a problem that didn’t exist. Another new study finds that AI “editing” of text often distorts its meaning. I can personally attest that ChatGPT has falsely accused me of a coding error.
The lesson here: AI “peer review” can improve papers, yet you can’t necessarily trust a paper that’s been AI-vetted.
But what if we take the human-written paper and code out of the equation entirely and simply give AIs the data and a research question? If they could consistently identify the best methods, apply them, and reach the correct conclusions, that would be a huge advantage. After all, human research teams can reach markedly different results even when working with the same data and the same question, a divergence that stems from both ideological bias and differences in methodological choices.
Unfortunately, a new study finds the same problem in AIs. Working with 150 Claude Code agents from the Sonnet 4.6 and Opus 4.6 families, researchers provided New York Stock Exchange data and asked them to answer questions, such as whether daily trading volume, intraday volatility, and the price impact of trades changed over time.
Some of these questions produced little variation in results. But others produced huge differences, driven by subtle choices. For example, “trading volume” can be interpreted as dollar volume or share volume, and the two interpretations produced trends pointing in opposite directions. Similarly, the estimated change in volatility depends heavily on whether raw or proportional changes are measured. Strikingly, different versions of Claude even displayed distinct “empirical styles,” favoring particular modeling approaches and ways of measuring variables (such as daily versus monthly).
Offering the AIs a peer review from another AI prompted some revisions, but did not lead to convergence on similar results. When researchers instead provided examples of top papers on similar questions, the AIs often imitated those methods and converged.
In other words, like humans, AIs will branch out and do things differently, unless nudged onto the same path. That can be useful if you already know the right path, but it’s a clear limitation if you want reliable results without extensive human steering. After all, it’s precisely human fallibility and bias that make AI appealing in the first place.
And speaking of human bias, yet more new research, by the Manhattan Institute’s Jim Manzi, traces the ideological orientation of academic work since 1960 (using AI, naturally, to classify articles’ political valence). It finds that “roughly 90 percent of politically relevant social-science articles leaned left” over this period; that every discipline leaned left on average; and that each has moved further left since 1990.
Such findings underscore the need for social science to draw on a wider range of perspectives, and, with careful prompting, AI might help serve that purpose. Yet they also point to a constraint: AI is trained on the existing body of human writing, biased research included. As Manhattan Institute reports from David Rozado have demonstrated, AI models often have left-leaning ideological priors and exhibit other biases as well, such as a tendency to pick the first of two options presented.
AI can spot errors in human work and generate passable code and prose at remarkable speed, advantages we should not understate. But for now, the technology still makes frequent mistakes, carries its own ideological baggage, and fails to converge on consistent results when different models tackle the same question—unless heavily steered by the very humans whose foibles we hope to escape.