• antonim@lemmy.dbzer0.com
    1 month ago

    to fool into errors

    tricking a kid

    I’ve never tried to fool or trick AI with excessively complex questions. When I tried to test it (a few different models over some period of time - ChatGPT, Bing AI, Gemini) I asked stuff as simple as “what’s the etymology of this word in that language”, “what is [some phenomenon]”. The models still produced responses ranging from shoddy to absolutely ridiculous.

    completely detached from how anyone actually uses

I’ve seen numerous people use it the same way I tested it, basically as a Google search you can talk to, with similarly shit results.

    • archomrade [he/him]@midwest.social
      1 month ago

Why do we expect a higher degree of trustworthiness from a novel LLM than we do from any given source or forum comment on the internet?

At what point do we stop hand-wringing over LLMs failing to meet some perceived level of accuracy and hold the people using them responsible for verifying the responses themselves?

There’s a giant disclaimer on every one of these models that responses may contain errors or hallucinations. At this point I think it’s fair to blame the user for ignoring those warnings, not the models for failing to meet some arbitrary standard.

      • antonim@lemmy.dbzer0.com
        1 month ago

Why do we expect a higher degree of trustworthiness from a novel LLM than we do from any given source or forum comment on the internet?

The stuff I’ve seen AI produce has sometimes been more wrong than anything a human could produce. And even if a human did produce it and post it on a forum, anyone with half a brain could respond with a correction. (E.g. the claim that an ordinary Slavic word is actually loaned from Latin.)

        I certainly don’t expect any trustworthiness from LLMs, the problem is that people do expect it. You’re implicitly agreeing with my argument that it is not just that LLMs give problematic responses when tricked, but also when used as intended, as knowledgeable chatbots. There’s nothing “detached from actual usage” about that.

At what point do we stop hand-wringing over LLMs failing to meet some perceived level of accuracy and hold the people using them responsible for verifying the responses themselves?

        at this point I think it’s fair to blame the user for ignoring those warnings and not the models for not meeting some arbitrary standard

This is not an either-or situation; it doesn’t have to be formulated like this. Criticising LLMs that frequently produce garbage is in practice also directed at the people who use them. When someone on a forum says they asked GPT and pastes its response, I will at the very least point out the general unreliability of LLMs, if not criticise the response itself (very easy if I’m somewhat knowledgeable about the field in question). This is in practice also directed at the person who posted it, e.g. by making them come off as naive and uncritical. (It is of course not meant as a real personal attack, but even a detached and objective criticism has a partly personal element to it.)

        Still, the blame is on both. You claim that:

There’s a giant disclaimer on every one of these models that responses may contain errors or hallucinations

I don’t remember seeing them, but even if they are there, the general promotion and the ways in which LLMs are presented are telling people otherwise. A few disclaimers do little to shape people’s opinions compared to the extensive media hype and marketing.

        Anyway my point was merely that people do regularly misuse LLMs, and it’s not at all difficult to make them produce crap. The stuff about who should be blamed for the whole situation is probably not something we disagree about too much.

        • archomrade [he/him]@midwest.social
          1 month ago

The stuff I’ve seen AI produce has sometimes been more wrong than anything a human could produce. And even if a human did produce it and post it on a forum, anyone with half a brain could respond with a correction.

          Seems like the problem is that you’re trying to use it for something it isn’t good or consistent at. It’s not a dictionary or encyclopedia, it’s a language model that happens to have some information embedded. It’s not built or designed to retrieve information from a knowledge bank, it’s just there to deconstruct and reconstruct language.

          When someone on a forum says they asked GPT and paste its response, I will at the very least point out the general unreliability of LLMs, if not criticise the response itself (very easy if I’m somewhat knowledgeable about the field in question)

Same deal. Absolutely chastise them for using it that way, because that’s not what it’s good for. But it’s a bit of a frequency bias to assume most people use it that way, because those are the people using it in the context of social media. Those who use it for more routine tasks aren’t taking responses straight from the model and posting them on Lemmy; they’re using it for mundane things that aren’t being shared.

          Anyway my point was merely that people do regularly misuse LLMs, and it’s not at all difficult to make them produce crap. The stuff about who should be blamed for the whole situation is probably not something we disagree about too much.

People misuse it because they think they can ask it questions as if it were a person with real knowledge, or because they are using it precisely for its convincing-bullshit abilities. That’s why I said it’s like laughing at a child for giving a wrong answer, or convincing them of a falsehood merely through passive suggestion: the problem isn’t that the kid is dumb, it’s that you (both yourself and the person using it) are going in with the expectation that they can answer the question or distinguish fact from fiction at all.