Reduced UGC moderation costs by 57x while improving accuracy by migrating from traditional computer vision APIs to prompt-engineered LLMs with a 10-category dual-classification system.
The content moderation problem is as old as content itself. Large companies invest millions of dollars in sophisticated infrastructures of human reviewers and specialized algorithms to curate user-generated content (UGC). As a solo developer, I faced a problem that seems impossible to solve and never truly ends. In part, that's true: users will always find a way to evade controls. Our job is to integrate solutions, measure their effectiveness, and keep iterating until we reach a middle ground that keeps the platform acceptable and compliant with regulations, without completely frustrating users, in a way that scales at an affordable cost.
We first attempted to train our own YOLOv6 classifier to catch 5 image categories, including distorted and scratched images in the dataset to reduce false negatives. It worked well for what it was: just a classifier. It didn't understand text, and it only judged the image as a whole, not its individual parts.
Looking for a better alternative, we explored the SightEngine API. They have an incredibly robust and well-thought-out product. The downside is that you must make an API call for each of the 8 offensive categories we needed to moderate, so costs multiply quickly: roughly $20 USD per 1k images, which is inconceivable in production. We used it anyway because the results were wonderful.
This excessive moderation cost pushed us toward the most modern solution available: Large Language Models (LLMs). LLM APIs with image support already existed and were no longer so restrictive when given a task. You could instruct them to analyze context and also read any text embedded in the image to better understand the user's intention.
The idea was that a single AI review of a multimedia Story would let us achieve multiple objectives at once, maximizing the model's potential while reducing cost and human intervention as much as possible. The system should approve or reject the image and issue a category for each type of rejection, so the user receives an explanation of why their post was removed, with the goal of educating them.
10 Rejection classes instructed to the model:
It doesn't end there. We also need to know how appropriate and safe an approved post is, in general terms, to decide how much exposure it should get, and whether the Story is eligible to be Trend of the week. The most appropriate and safe stories receive greater exposure; the less suitable ones are seen less.
The model is instructed to perform an additional classification, from 0 to 1, only on posts that were approved, where "0.0" means a rejected post and "1.0" means content that is: "ultra-conservative family-friendly, completely appropriate for all ages, like family TV commercial" (yes, that was a piece of our prompt).
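To keep this dual classification machine-readable, we have the model answer in a fixed, structured shape and parse it on our side. Here is a minimal sketch of that parsing step; the field names and the example category are illustrative placeholders, not our exact production schema.

import json

def parse_moderation(raw: str) -> dict:
    """Parse the model's verdict: approve/reject, rejection categories, and
    the 0.0-1.0 appropriateness score used to rank approved posts."""
    result = json.loads(raw)
    approved = bool(result["approved"])
    return {
        "approved": approved,
        # Categories are surfaced to the user to explain a rejection.
        "categories": list(result.get("categories", [])),
        # Rejected posts collapse to 0.0; approved ones keep the model's score.
        "safety_score": float(result["safety_score"]) if approved else 0.0,
    }

# Hypothetical model output (the category name is a placeholder):
print(parse_moderation('{"approved": false, "categories": ["graphic_violence"], "safety_score": 0.0}'))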
Make the most of the tokens each image consumes to get the maximum benefit from your model. Let's start with a classic: black bars. Many uploads arrive cropped with vertical or horizontal parallel black bars. We remove them with this simple method:
import cv2
import numpy as np

def remove_black_borders(image, threshold=2, min_border_size=5):
    """Crop away solid black bars along the edges of an image."""
    img = image.copy()
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    height, width = gray.shape

    # A row or column counts as "black" when its summed intensity stays below
    # threshold * length, i.e. its average pixel value is under `threshold`.
    row_sums = np.sum(gray, axis=1)
    rows_to_keep = np.where(row_sums > threshold * width)[0]
    if len(rows_to_keep) == 0:
        return img

    col_sums = np.sum(gray, axis=0)
    cols_to_keep = np.where(col_sums > threshold * height)[0]
    if len(cols_to_keep) == 0:
        return img

    # Measure how thick the dark margins are on each side.
    top_border = rows_to_keep[0]
    bottom_border = height - rows_to_keep[-1] - 1
    left_border = cols_to_keep[0]
    right_border = width - cols_to_keep[-1] - 1

    has_borders = (
        top_border >= min_border_size
        or bottom_border >= min_border_size
        or left_border >= min_border_size
        or right_border >= min_border_size
    )

    # Only crop when at least one border is thick enough to matter.
    if has_borders:
        return img[rows_to_keep[0]:rows_to_keep[-1] + 1,
                   cols_to_keep[0]:cols_to_keep[-1] + 1]
    return img
After removing any borders, we apply a bilateral filter to denoise the image. Next comes one of the most crucial steps: downsizing. We have a max token budget per image to keep costs down. To stay within it without sacrificing much detection accuracy, I tested various maximum pixel counts: the image should stay above roughly 80 thousand pixels, or the model starts to hallucinate information. A reasonable compromise between tokens and accuracy turned out to be around 125k pixels, about 250x500 for a typical portrait "social media" post, which costs about 258 tokens on Gemini models. Finally, we enhance image contrast with CLAHE when necessary. Now the image is ready to be converted to Base64 and sent along with the model instructions.
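Putting it together, the whole preprocessing chain might look roughly like the sketch below. It reuses remove_black_borders from above; the filter parameters, the contrast heuristic, and the JPEG quality are starting points to re-tune on your own content, not exact production settings.

import base64
import math
import cv2

def prepare_for_llm(image, max_pixels=125_000):
    """Border removal -> denoise -> downsize -> optional CLAHE -> Base64."""
    img = remove_black_borders(image)

    # Bilateral filter denoises while preserving edges and embedded text.
    img = cv2.bilateralFilter(img, 9, 75, 75)

    # Downscale so the total pixel count stays near the token budget
    # (~125k pixels worked for us; much below ~80k, detection degraded).
    h, w = img.shape[:2]
    if h * w > max_pixels:
        scale = math.sqrt(max_pixels / (h * w))
        img = cv2.resize(img, (int(w * scale), int(h * scale)),
                         interpolation=cv2.INTER_AREA)

    # Apply CLAHE on the luminance channel only when contrast looks low
    # (the std-dev threshold is an assumption; tune it for your data).
    lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    if l.std() < 40:
        clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
        img = cv2.cvtColor(cv2.merge((clahe.apply(l), a, b)), cv2.COLOR_LAB2BGR)

    # Encode to JPEG, then Base64, ready to attach to the model request.
    ok, buffer = cv2.imencode(".jpg", img, [cv2.IMWRITE_JPEG_QUALITY, 85])
    return base64.b64encode(buffer).decode("utf-8")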
The reality is that each platform where your application is distributed defines its own standards about what is allowed and enforces them with varying degrees of success. It's important to recognize that there are many gray areas where some platforms leave room for ambiguity. Google Play tends to be more relaxed with its policies; the App Store, in contrast, is brutally strict. We're always talking about small developers here: large applications enjoy favoritism and can do whatever they want without consequences. Keep this in mind and don't ignore it, because as a small developer you have to understand the discrepancies between what one store allows and another forbids, especially regarding UGC. I had to learn this the hard way.
When users are notified that their post was rejected, they receive a warning and an explanation of why their content was removed. This educates them and gives them another chance to post. If they commit another infraction, their account receives a cooldown that prevents them from posting for a period that depends on the severity of the infraction. I believe this is the most appropriate balance between educating users, minimizing frustration over removed content, and complying with app store regulations.
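The escalation itself can stay very simple. As a sketch (the severity tiers and durations here are placeholders, not our production values), it boils down to a severity-to-duration lookup:

from datetime import datetime, timedelta, timezone

# Placeholder tiers; real durations depend on your policies and on the
# severity of the categories your moderation model emits.
COOLDOWNS = {
    "low": timedelta(hours=12),
    "medium": timedelta(days=3),
    "high": timedelta(days=14),
}

def apply_cooldown(user, severity: str) -> None:
    """Block new posts until the cooldown for this severity expires."""
    user.posting_blocked_until = datetime.now(timezone.utc) + COOLDOWNS[severity]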
If you use the right LLM, trained on a very broad dataset, with logic that isn't too conservative and a meticulously tested prompt, you can achieve dual classification that works remarkably well with an extremely low error rate. We saw Claude Haiku hallucinate a lot and behave too conservatively, generating so many false positives that it was effectively unusable, and it was expensive on top of that. The best model in terms of reasoning, reproducibility, and cost was Gemini-2.0-flash-001, at a cost of 35 cents per 1k images.
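For reference, a minimal call to that model through the google-genai Python SDK might look like the sketch below; the prompt string is a stand-in for our real moderation prompt, and the signatures should be checked against the current SDK docs.

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

def moderate_image(image_bytes: bytes, prompt: str) -> str:
    """Send the preprocessed image plus the moderation instructions."""
    response = client.models.generate_content(
        model="gemini-2.0-flash-001",
        contents=[
            # Raw JPEG bytes; the SDK handles the Base64 transport encoding.
            types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
            prompt,
        ],
        # Requesting JSON keeps the dual-classification output easy to parse.
        config=types.GenerateContentConfig(response_mime_type="application/json"),
    )
    return response.text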
It's important to note that while today's LLMs, given the right instructions, can be incredibly powerful, they still make errors. In many cases they deserve the benefit of the doubt, since a lot of UGC comes down to strict interpretation. This doesn't replace human review; it complements it. Content should always be supervised to detect new forms of evasion.