OpenAI says it has developed a content moderation approach built on its GPT-4 model, aimed at easing the load on human moderation teams. Here's how the technique works:
GPT-4 is prompted with a policy that guides moderation decisions, along with a set of content examples, some of which violate the policy. Policy experts label these examples themselves, and the same examples are then given to GPT-4 to label against the policy.
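A minimal sketch of what that labeling step might look like, assuming the OpenAI Python client (openai>=1.0). The policy text, label set, and prompt wording below are illustrative placeholders, not OpenAI's actual internal prompts:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical policy and label taxonomy, for illustration only.
POLICY = """Category K1: content with instructions for acquiring or building weapons.
Category K2: content that mentions weapons without instructions.
Label content as K1, K2, or SAFE."""

def gpt4_label(content: str) -> str:
    """Ask GPT-4 to label one content example under the policy."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": f"You are a content moderator. Apply this policy:\n{POLICY}"},
            {"role": "user",
             "content": f"Content: {content}\nReply with only the label."},
        ],
        temperature=0,  # deterministic output makes comparison with expert labels easier
    )
    return response.choices[0].message.content.strip()
```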
Discrepancies between GPT-4's judgments and the experts' decisions are then examined: OpenAI asks GPT-4 for the reasoning behind its labels, uses that analysis to surface ambiguities in the policy wording, and refines the policy accordingly.
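A hypothetical sketch of that discrepancy-review loop, building on the `gpt4_label` helper, `client`, and `POLICY` from the sketch above; the function name and data shapes are assumptions, not OpenAI's published tooling:

```python
def review_discrepancies(examples: list[dict]) -> list[dict]:
    """Compare GPT-4's labels to expert labels and collect reasoning for mismatches.

    Each example is {"content": str, "expert_label": str}.
    """
    disagreements = []
    for ex in examples:
        model_label = gpt4_label(ex["content"])
        if model_label != ex["expert_label"]:
            # Ask GPT-4 to explain its judgment; the explanation points at the
            # policy wording that may need clarification.
            explanation = client.chat.completions.create(
                model="gpt-4",
                messages=[
                    {"role": "system",
                     "content": f"You are a content moderator. Apply this policy:\n{POLICY}"},
                    {"role": "user",
                     "content": (
                         f"Content: {ex['content']}\n"
                         f"You labeled this {model_label}; an expert labeled it "
                         f"{ex['expert_label']}. Explain which policy wording led to your label."
                     )},
                ],
            ).choices[0].message.content
            disagreements.append({**ex, "model_label": model_label, "reasoning": explanation})
    # Policy experts read these disagreements to decide where the policy is ambiguous.
    return disagreements
```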
OpenAI asserts that its method, already in use by some of its customers, can cut the time to roll out new content moderation policies to a matter of hours, and contrasts it with approaches from startups like Anthropic.
However, there is reason for skepticism given the track record of AI-powered moderation. Past AI moderation tools have shown biases and labeling inaccuracies, often inherited from the biases of the human annotators who produced their training data, a limitation OpenAI acknowledges.
OpenAI emphasizes that AI judgments need continuous monitoring and validation by humans to counteract biases and errors introduced during training.
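One way such ongoing validation could look in practice is to route a random sample of model decisions back to human reviewers and track the disagreement rate over time. This is purely an illustrative assumption about monitoring, not a process OpenAI has described:

```python
import random
from typing import Callable

def audit_sample(labeled_items: list[dict],
                 human_review: Callable[[str], str],
                 sample_rate: float = 0.05) -> float:
    """Re-check a random sample of model-labeled items with a human reviewer
    callback and return the observed disagreement rate.

    Each item is {"content": str, "model_label": str}.
    """
    sample = [item for item in labeled_items if random.random() < sample_rate]
    if not sample:
        return 0.0
    disagreements = sum(
        1 for item in sample
        if human_review(item["content"]) != item["model_label"]
    )
    return disagreements / len(sample)
```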
While GPT-4's predictive abilities might enhance moderation, it's essential to remember that even advanced AI is prone to mistakes. Vigilance in moderation remains paramount.