Hey, Lemmies! I’ve been thinking about a way to improve our automod, and I’d love to get your input. LLMs are quite good at sentiment analysis and usually give accurate results. Here’s the idea: we give the LLM the instance rules plus a message, and ask it whether the message adheres to those rules. That could give us a robust automod that works well in most cases. What do you think?
Example usage:
Rules
- No bigotry - including racism, sexism, ableism, homophobia, transphobia, or xenophobia. Code of Conduct.
- Be respectful, especially when disagreeing. Everyone should feel welcome here.
- No porn.
- No Ads / Spamming.
Does this message adhere to the rules? Answer only with yes/no and if not provide a short sentence for the report.
Message
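For anyone who wants to try it, here is a rough Python sketch of the prompt above: it assembles the rules, the question, and the message, then parses the yes/no answer. The names `build_prompt`, `parse_reply`, `check_message`, and `ask_llm` are made up for illustration; `ask_llm` stands in for whatever model call you would actually use, and nothing here is tied to a real Lemmy bot API.

```python
import re
from typing import Callable, List, NamedTuple, Optional

# Rules copied from the example prompt in the post.
RULES = [
    "No bigotry - including racism, sexism, ableism, homophobia, transphobia, or xenophobia.",
    "Be respectful, especially when disagreeing. Everyone should feel welcome here.",
    "No porn.",
    "No Ads / Spamming.",
]

QUESTION = (
    "Does this message adhere to the rules? Answer only with yes/no "
    "and if not provide a short sentence for the report."
)


class Verdict(NamedTuple):
    ok: Optional[bool]  # True = adheres, False = violates, None = unparseable reply
    report: str         # short reason from the model, empty when ok is True


def build_prompt(rules: List[str], message: str) -> str:
    """Assemble the prompt in the same layout as the example in the post."""
    rule_lines = "\n".join(f"- {rule}" for rule in rules)
    return f"Rules\n{rule_lines}\n{QUESTION}\nMessage\n{message}"


def parse_reply(reply: str) -> Verdict:
    """Read a leading yes/no; anything else is left for a human to review."""
    text = reply.strip()
    match = re.match(r"(yes|no)\b[\s.,:;-]*(.*)", text, re.IGNORECASE | re.DOTALL)
    if match is None:
        return Verdict(None, text)
    if match.group(1).lower() == "yes":
        return Verdict(True, "")
    return Verdict(False, match.group(2).strip())


def check_message(message: str, ask_llm: Callable[[str], str],
                  rules: List[str] = RULES) -> Verdict:
    """ask_llm is a hypothetical callable: prompt text in, model reply out."""
    return parse_reply(ask_llm(build_prompt(rules, message)))
```

Models do not always stick to "yes/no", so the sketch returns `ok=None` for anything it cannot parse instead of guessing; those messages would probably be best left to a human mod.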
The OpenAI moderation API [1] works fairly well, is straightforward to implement, and is fairly cheap to use. I’m a Reddit refugee, so I’m not sure yet how to get started with bots or moderation tools on Lemmy, but it should be doable with a smallish monthly bill.
[1] https://platform.openai.com/docs/guides/moderation/overview
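A minimal Python sketch of what a call could look like, using only the standard library. The endpoint URL and the response shape (`results[0].flagged` and `results[0].categories`) are how I understand the guide linked in [1], so double-check them against the current docs; the helper names `build_request`, `summarize`, and `moderate` are just for illustration.

```python
import json
import urllib.request

# Endpoint as I understand it from the moderation guide; verify before relying on it.
MODERATION_URL = "https://api.openai.com/v1/moderations"


def build_request(text: str, api_key: str) -> urllib.request.Request:
    """Build the POST request carrying the text to be checked."""
    body = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        MODERATION_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def summarize(response: dict):
    """Return (flagged, names of the categories that were hit).

    Assumes the response looks like
    {"results": [{"flagged": bool, "categories": {name: bool, ...}}]}.
    """
    result = response["results"][0]
    hits = sorted(name for name, hit in result["categories"].items() if hit)
    return result["flagged"], hits


def moderate(text: str, api_key: str):
    """Send the text to the API and summarize the verdict (needs network access)."""
    with urllib.request.urlopen(build_request(text, api_key), timeout=10) as resp:
        return summarize(json.load(resp))
```

As far as I know the API checks a fixed set of categories (hate, sexual content, violence, and so on), so it would not cover a custom rule like "No Ads / Spamming" by itself.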
The issue is CPU and memory usage; it’s very resource-intensive…
Yeah, and small models are likely to be less reliable.
While it’s a great theory, people who want to use those bad words/concepts have been working around language filters for ages. Anti-vaxxers, for example, will call them “vitamins”.