BabaIsPissed [he/him]

BabaIsPissed [he/him]@hexbear.net · 2 days ago

This is fucked, you don’t use a black box approach in anything high risk without human supervision. Whisper could probably be used to help accelerate transcription done by an expert, maybe as some sort of “first pass” that needs to be validated, but even then it might not help speed things up and might impact quality (see coding with Copilot). Maybe also use the timestamp information for some filtering of the most egregious hallucinations, or a bespoke fine-tuning setup (assuming it was fine-tuned in the first place)? Just spitballing here, I should probably read the paper to see what the common error cases are.
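A minimal sketch of the timestamp-filtering idea, assuming segments shaped like the dicts in Whisper's transcription output (`start`, `end`, `text`). The function name `flag_suspicious_segments` and both thresholds are made up for illustration and not tuned on anything, and flagged segments are meant to go to a human reviewer, not to be dropped automatically:

```python
def flag_suspicious_segments(segments, max_chars_per_sec=35.0, min_duration=0.05):
    """Return segments whose timing looks implausible for the text they carry.

    Assumption: each segment is a dict with "start" and "end" (seconds)
    and "text". Both thresholds are illustrative guesses, not tuned values.
    """
    flagged = []
    for seg in segments:
        text = seg["text"].strip()
        if not text:
            continue  # nothing transcribed, nothing to check
        duration = seg["end"] - seg["start"]
        # Text squeezed into (almost) no audio, or far more characters per
        # second than anyone could speak, is worth a second look.
        if duration < min_duration or len(text) / duration > max_chars_per_sec:
            flagged.append(seg)
    return flagged


segments = [
    {"start": 0.0, "end": 2.0, "text": " hello there"},
    {"start": 2.0, "end": 2.0, "text": " thanks for watching"},
    {"start": 2.0, "end": 3.0, "text": " " + "x" * 100},
]
flagged = flag_suspicious_segments(segments)
```

This only catches hallucinations that show up as timing anomalies; fluent made-up text spread over a plausible duration would pass straight through, so it is a pre-filter for human validation at best.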

It’s funny, because this is the OpenAI model I had the least cynicism towards. Did they bazinga it up when I wasn’t looking?

BabaIsPissed [he/him]@hexbear.net · 1 year ago

evaluating LLM

ask the researcher if they are testing form or meaning

they don’t understand

pull out illustrated diagram explaining what is form and what is meaning

they laugh and say “the model is demonstrating creativity sir”

looks at the test

it’s form

BabaIsPissed [he/him]@hexbear.net · edit-2 1 year ago

deleted by creator