OpenAI just admitted it can’t identify AI-generated text. That’s bad for the internet and it could be really bad for AI models.

In January, OpenAI launched a system for identifying AI-generated text. This month, the company scrapped it.

  • vrighter@discuss.tchncs.de · 1 year ago

    They already do. Where do you think the training corpus comes from? The real world. It’s curated by humans and then fed to the ML system.

    The problem is that the real world now contains a lot of text generated by AI. And it has been well studied that feeding that back into training will destroy your model, because the network would then effectively be trained to predict its own output, compounding its own errors and biases with every generation; this is often called model collapse (see the toy sketch at the end of this comment).

    So humans still need to filter that stuff out of the training corpus. But we can’t tell which texts are human-written and which are AI-generated. And neither can a machine. So there’s no way to do this properly.

    The data almost always comes from the real world, except now the real world also contains data that is “harmful” (to AI) and that we can’t figure out how to find and remove.
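
    To make the feedback-loop point concrete, here is a minimal, hypothetical sketch (not OpenAI’s actual pipeline; the function names and numbers are made up for the example): fit a trivial “model” (a Gaussian, via its sample mean and maximum-likelihood variance) to some data, generate synthetic data from the fit, retrain on that output, and repeat. The fitted distribution steadily narrows and loses the tails of the original data, which is the same kind of degradation, at toy scale, as training a language model on its own generations.

    ```python
    import numpy as np

    def fit_gaussian(data):
        # "Train": maximum-likelihood estimates (sample mean, biased 1/n variance).
        return data.mean(), data.var()

    def sample_gaussian(mean, var, n, rng):
        # "Generate": draw synthetic data from the fitted model.
        return rng.normal(mean, np.sqrt(var), size=n)

    rng = np.random.default_rng(0)
    data = rng.normal(loc=0.0, scale=1.0, size=50)  # stand-in for human-written text

    for generation in range(201):
        mean, var = fit_gaussian(data)              # train on the current corpus
        data = sample_gaussian(mean, var, 50, rng)  # next corpus = the model's own output
        if generation % 40 == 0:
            print(f"gen {generation:3d}: corpus variance = {data.var():.4f}")

    # The printed variance drifts far below the original 1.0: each generation
    # forgets a little more of the tails, because the model is being trained
    # to predict its own (slightly narrower) output instead of the real data.
    ```

    Keeping genuinely human-written data in the mix each generation slows this drift, which is exactly why curating the corpus matters and why not being able to detect AI-generated text is such a problem.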