The Facebook insider building content moderation for the AI era

When Brett Levenson left Apple in 2019 to lead business integrity at Facebook, the social media giant was in the thick of the Cambridge Analytica fallout. At the time, he thought he could simply fix Facebook's content moderation problem with better technology.

The problem, he quickly realized, ran deeper than technology. Human reviewers were expected to memorize a 40-page policy document that had been machine-translated into their language, he said. Then they had about 30 seconds per piece of flagged content to decide not just whether that content violated the rules, but what to do about it: block it, ban the user, limit its spread. These quick calls were only "slightly better than 50% accurate," according to Levenson.

"It was kind of like flipping a coin, whether the human reviewers could actually apply policies correctly, and this was many days after the harm had already occurred anyway," Levenson told Trendster.

That kind of delayed, reactive approach is not sustainable in a world of nimble and well-funded adversarial actors. The rise of AI chatbots has only compounded the problem, as content moderation failures have resulted in a string of high-profile incidents, like chatbots providing teenagers with self-harm guidance or AI-generated imagery evading safety filters.

Levenson's frustration led to the idea of "policy as code," a way to turn static policy documents into executable, updatable logic tightly coupled to enforcement. That insight led to the founding of Moonbounce, which announced on Friday that it has raised $12 million in funding, Trendster has exclusively learned. The round was co-led by Amplify Partners and StepStone Group.

Moonbounce works with companies to provide an additional safety layer wherever content is generated, whether by a user or by AI. The company has trained its own large language model to read a customer's policy documents, evaluate content at runtime, return a response in 300 milliseconds or less, and take action. Depending on customer preference, that action might look like Moonbounce's system slowing down distribution while the content awaits a later human review, or it might block high-risk content in the moment.
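To make the "policy as code" idea concrete, here is a minimal sketch of what a runtime policy-evaluation layer might look like. All names, rule shapes, and the keyword matching are illustrative assumptions; Moonbounce's actual system uses a trained LLM as the classifier, not keyword lists, and its real API is not public.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    SLOW_DISTRIBUTION = "slow_distribution"  # hold back reach pending human review
    BLOCK = "block"                          # stop high-risk content immediately


@dataclass
class PolicyRule:
    """One enforceable rule derived from a customer's policy document."""
    name: str
    keywords: tuple  # stand-in for an LLM classifier (illustrative only)
    high_risk: bool  # high-risk violations are blocked in the moment


def evaluate(content: str, rules: list) -> tuple:
    """Evaluate content at runtime; return (Action, matched rule name or None)."""
    text = content.lower()
    for rule in rules:
        if any(keyword in text for keyword in rule.keywords):
            action = Action.BLOCK if rule.high_risk else Action.SLOW_DISTRIBUTION
            return action, rule.name
    return Action.ALLOW, None


# Hypothetical rules compiled from a policy document.
rules = [
    PolicyRule("self_harm", ("hurt myself",), high_risk=True),
    PolicyRule("harassment", ("you are worthless",), high_risk=False),
]
```

The key design point the article describes is that the rules live as executable logic, so updating the policy updates enforcement directly instead of requiring reviewers to re-memorize a document.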

Today, Moonbounce serves three main verticals: platforms dealing with user-generated content, like dating apps; AI companies building characters or companions; and AI image generators.

Moonbounce is supporting more than 40 million daily evaluations and serving over 100 million daily active users on the platform, Levenson said. Customers include AI companion startup Channel AI, image and video generation company Civitai, and character roleplay platforms Dippy AI and Moescape.

"Safety can actually be a product benefit," Levenson told Trendster. "It just never has been, because it's always a thing that happens later, not a thing you can actually build into your product. And we see our customers are finding really interesting and innovative ways to use our technology to make safety a differentiator, and part of their product story."

Tinder's head of trust and safety recently explained how the dating platform uses these kinds of LLM-powered services to achieve a 10x improvement in detection accuracy.

"Content moderation has always been a problem that plagued large online platforms, but now with LLMs at the heart of every application, this challenge is even more daunting," Lenny Pruss, general partner at Amplify Partners, said in a statement. "We invested in Moonbounce because we envision a world where objective, real-time guardrails become the enabling backbone of every AI-mediated application."

AI companies are facing mounting legal and reputational pressure after chatbots were accused of pushing children and vulnerable users toward suicide, and image generators like xAI's Grok were used to create nonconsensual nude imagery. Internal safety guardrails are clearly failing, and it is becoming a liability question. Levenson said AI companies are increasingly looking outside their own walls for help building out safety infrastructure.

"We're a third party sitting between the user and the chatbot, so our system isn't inundated with context the way the chat itself is," Levenson said. "The chatbot itself has to remember, potentially, tens of thousands of tokens that have come before…We're only worried about enforcing rules at runtime."

Levenson runs the 12-person company with his former Apple colleague Ash Bhardwaj, who previously built large-scale cloud and AI infrastructure within the iPhone maker's core offerings. Their next focus is a capability called "iterative steering," developed in response to cases like the 2024 suicide of a 14-year-old Florida boy who became obsessed with a Character AI chatbot. Rather than issuing a blunt refusal when dangerous topics come up, the system would intercept the conversation and redirect it, modifying prompts in real time to push the chatbot toward a more actively supportive response.

"We hope to be able to add to our actions toolkit the ability to steer the chatbot in a better direction to, essentially, take the user's prompt and modify it to force the chatbot to be not just an empathetic listener, but a helpful listener in these situations," Levenson said.
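A minimal sketch of the prompt-interception idea described above: a proxy layer checks an outgoing user prompt and, if it matches a risk pattern, appends a steering instruction before forwarding it to the chatbot. The marker list, suffix text, and function name are all hypothetical placeholders, not Moonbounce's actual implementation, which would use a trained model rather than string matching.

```python
# Illustrative risk markers; a production system would use a classifier model.
RISK_MARKERS = ("want to die", "hurt myself", "end it all")

# Hypothetical steering instruction appended for the downstream chatbot.
STEERING_SUFFIX = (
    "\n\n[moderation layer: the user may be in distress. Respond as a "
    "supportive, helpful listener and gently suggest professional help.]"
)


def steer_prompt(user_prompt: str) -> str:
    """Return the prompt to forward to the chatbot, modified if it looks risky."""
    if any(marker in user_prompt.lower() for marker in RISK_MARKERS):
        return user_prompt + STEERING_SUFFIX
    return user_prompt
```

Because the steering happens in the prompt layer rather than as a post-hoc filter, the chatbot's own response generation is pushed toward a supportive answer instead of a refusal, which is the distinction Levenson draws.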

When asked whether his exit strategy involved an acquisition by a company like Meta, bringing his work on content moderation full circle, Levenson said he recognizes how well Moonbounce would fit into his former employer's stack, as well as his own fiduciary duties as a CEO.

"My investors would kill me for saying this, but I'd hate to see someone buy us and then restrict the technology," he said. "Like, 'Okay, this is ours now, and nobody else can benefit from it.'"
