BabaIsPissed [he/him]@hexbear.net to Technology@lemmy.ml • ChatGPT Tests Into Top 1% for Original Creative Thinking • English
6 · 1 year ago
evaluating LLM
ask the researcher if they are testing form or meaning
they don’t understand
pull out illustrated diagram explaining what is form and what is meaning
they laugh and say “the model is demonstrating creativity sir”
looks at the test
it’s form
This is fucked, you don’t use a black box approach in anything high risk without human supervision. Whisper probably could be used to help accelerate transcription done by an expert, maybe as some sort of “first pass” that needs to be validated, but even then it might not help speed things up and might impact quality (see coding with Copilot). Maybe also use the timestamp information for some filtering of the most egregious hallucinations, or a bespoke fine-tuning setup (assuming it was fine-tuned in the first place)? Just spitballing here, I should probably read the paper to see what the common error cases are.
It’s funny, because this is the OpenAI model I had the least cynicism towards, did they bazinga it up when I wasn’t looking?
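For what it's worth, here is a rough sketch of the timestamp-filtering idea from above. It assumes each segment is a dict with `start`, `end` (seconds) and `text`, which is the shape of the `segments` list the open-source `whisper` package returns from `transcribe`. The function name and both thresholds are made up for illustration and untuned. The heuristic itself (flag text that is too dense for its time span) is just a guess at what might catch the worst cases, not something taken from the paper.

```python
def flag_suspicious_segments(segments, max_words_per_sec=5.0, min_duration=0.1):
    """Return segments whose text is implausibly dense for their time span.

    Both thresholds are illustrative assumptions, not tuned values.
    Flagged segments are meant for human review, not automatic deletion.
    """
    flagged = []
    for seg in segments:
        duration = seg["end"] - seg["start"]
        n_words = len(seg["text"].split())
        if n_words == 0:
            # Nothing was transcribed here, so there is nothing to check.
            continue
        # Near-zero duration with text, or far more words than a person
        # could plausibly say in that time, is a red flag.
        if duration < min_duration or n_words / duration > max_words_per_sec:
            flagged.append(seg)
    return flagged


if __name__ == "__main__":
    example = [
        {"start": 0.0, "end": 2.0, "text": "hello there everyone"},
        {"start": 2.0, "end": 2.5,
         "text": "thank you for watching and please subscribe to my channel"},
    ]
    for seg in flag_suspicious_segments(example):
        print(f'{seg["start"]:.1f}-{seg["end"]:.1f}: {seg["text"]}')
```

This only catches one failure shape (lots of text crammed into a tiny time window). Fluent hallucinations at a normal speaking rate would sail right through, so it would be a triage aid for the expert doing validation, not a replacement for them.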