Help. My AI scientist just plagiarised my research


A BBC News headline caught my eye a couple of days ago: "AI cracks superbug problem in two days that took scientists years". And by "caught my eye", I mean "made me doubt that this is true".

The general gist of the story is that a newly launched Google AI product called co-scientist was tasked with coming up with a hypothesis for why some superbugs are immune to antibiotics. The AI reportedly arrived at the same hypothesis as the human scientists, except it did so in two days whereas they took ten years.

You might already notice some editorialisation has happened in the headline. The AI didn't "crack" a superbug problem, it only came up with a hypothesis of what might be going on. This is acknowledged eight paragraphs into the story: "The full decade spent by the scientists also includes the time it took to prove the research, which itself was multiple years."

According to the story, the AI didn't have access to any of the human scientist's unpublished research on the topic, nor did Google have access to anything on his computer. Obviously this is supposed to assuage our concerns about whether the model had in some way "cheated" by looking at the answers in advance.

Now, I'm not a microbiologist, nor am I a doctoral-level expert on AI, but I do know enough about how research, marketing and hype work to make my own hypotheses about how this AI research played out.

The first involves what is sometimes referred to as survivorship bias, or the Texas Sharpshooter fallacy. Essentially, if you run an experiment enough times, you'll get the answer you want by chance at least once. You then focus purely on that one apparently confirmatory result and pretend the other, unsuccessful ones never happened. You see this a lot with products that claim medical benefits: the manufacturer proudly touts the scientific evidence that "proves" it works, but that's the only result they publish. You don't see the dozens of trials where the product was no better, or even worse, than a placebo (see Ben Goldacre's Bad Science and Bad Pharma books*, for example).

You also see this a lot in viral videos where someone throws a basketball at a hoop behind them and gets it straight through. Before that shot, you can be pretty sure there were hundreds where they missed.
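To put a rough number on that intuition, here is a minimal sketch in Python. The figures are entirely made up for illustration (a 5% chance of a fluke "success" per attempt, 50 attempts); the point is only how quickly repeated attempts make at least one fluke almost inevitable.

```python
import random

FALSE_POSITIVE_RATE = 0.05  # assumed chance a single trial "succeeds" by luck alone
ATTEMPTS = 50               # assumed number of times the experiment is repeated
RUNS = 100_000              # simulated repetitions of the whole exercise

lucky = 0
for _ in range(RUNS):
    # Did at least one of the attempts come up "positive" by pure chance?
    if any(random.random() < FALSE_POSITIVE_RATE for _ in range(ATTEMPTS)):
        lucky += 1

print(f"Simulated chance of at least one fluke: {lucky / RUNS:.1%}")
# Closed form for comparison: 1 - (1 - p)^n
print(f"Exact chance: {1 - (1 - FALSE_POSITIVE_RATE) ** ATTEMPTS:.1%}")
```

With those (invented) numbers, you'd have at least one "win" to publish roughly nine times out of ten, even if the thing being tested does nothing at all.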

My second hypothesis is that the AI wasn't starting from the same point that the scientists were in 2015. Google's paper isn't clear on what data went into building their model, but they say it is based on Gemini 2.0, the current version of which contains data available up to June 2024.

Is it likely that the human researchers' breakthrough hypothesis appeared fully formed in their minds some time between June 2024 and February 2025 with no preliminary work having been published beforehand? Seems unlikely.

Scientific research always builds on what came before it. This is the research paper I believe the BBC article is referring to, in which the hypothesis was published in August 2024. It cites 46 other papers, and I counted seventeen of them as having been published in or after 2015. That suggests the AI had access to roughly 60% more relevant research than the human scientists had when they started out (seventeen newer papers on top of the twenty-nine that predate 2015).

Essentially what I'm saying here is that the playing field wasn't level, so the AI had a much easier task, kicking downhill toward the goal. It didn't have to come up with a novel concept or conduct any additional research. All it had to do was see which direction the science was moving in and generate a hypothesis based on that.

There's a lot going on in the AI space at the moment. Journalists want to write about it. Readers want to read about it. Big tech companies want to hype their products. That's a match made in heaven. Big claims in headlines, with the detail buried not even later in the article but hidden away where only people who already have some prior knowledge of the topic would go looking for it. But that's not a problem for the Googles of the world. They just want everyone to think they're one step away from the Singularity.

If you're going to take one thing away from this article, let it be this: "AI" as it is currently discussed is a mathematical process that takes in vast amounts of information at one end and spits out something that probabilistically looks like it belongs in that set of data. What it doesn't do is come up with anything new.
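If you want a feel for what "probabilistically looks like it belongs in that set of data" means, here is a deliberately tiny sketch. This is not how Gemini or co-scientist works internally; it's the same general idea shrunk down to a toy Markov-chain text generator that can only ever recombine words it has already seen, weighted by how often they followed each other.

```python
import random
from collections import defaultdict

# A toy "training set" - the generator can only ever recombine these words.
corpus = "the ai reads the papers and the ai repeats the papers".split()

# Count which word tends to follow which.
following = defaultdict(list)
for current, nxt in zip(corpus, corpus[1:]):
    following[current].append(nxt)

# Generate text by repeatedly sampling a plausible next word.
word = "the"
output = [word]
for _ in range(8):
    candidates = following.get(word)
    if not candidates:
        break
    word = random.choice(candidates)  # duplicates in the list act as frequency weights
    output.append(word)

print(" ".join(output))  # e.g. "the ai repeats the papers and the ai reads"
```

Everything it prints looks like the corpus because it is the corpus, rearranged. Scaling that up makes a far more convincing mimic, but the basic move is the same: produce output that resembles what went in.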

PS Just as I was about to press publish on this article, a Mastodon post pointed me to this YouTube video, which tackles the same subject and provides a really in-depth discussion.

* As an Amazon Associate I earn from qualifying purchases.
