Mistral AI launches new API for content moderation

8 Nov 2024


According to the start-up, the moderation API can be tailored to specific applications and safety standards.

French start-up Mistral AI has launched a new API for content moderation.

The API, launched yesterday (7 November), is the same one that powers the moderation service in Le Chat, the Paris-based company’s chatbot. It is built on a fine-tuned model known as Ministral 8B.

This model is trained to classify text in a range of languages into one of nine categories: sexual, hate and discrimination, violence and threats, health, financial, law, dangerous and criminal content, self-harm, and personally identifiable information.

“It is natively multilingual and in particular trained on Arabic, Chinese, English, French, German, Italian, Japanese, Korean, Portuguese, Russian, Spanish,” Mistral AI explained.

“The content moderation classifier leverages the most relevant policy categories for effective guardrails and introduces a pragmatic approach to LLM safety by addressing model-generated harms such as unqualified advice and [personal identifiable information].”
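The announcement itself contains no code, but as a rough illustration, the sketch below shows how a batch of text might be submitted to such a moderation endpoint over plain HTTP. The endpoint path, model alias (`mistral-moderation-latest`), request field (`input`) and response shape shown here are assumptions based on Mistral’s general API conventions, not details confirmed in this article.

```python
# Illustrative sketch only: the endpoint path, payload fields and response
# shape are assumptions, not details confirmed in the announcement.
# Requires a MISTRAL_API_KEY environment variable.
import os
import requests

API_URL = "https://api.mistral.ai/v1/moderations"  # assumed endpoint path


def moderate(texts):
    """Send a batch of strings to the moderation endpoint and return the raw
    JSON, which is assumed to contain per-text category flags and scores."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
        json={
            "model": "mistral-moderation-latest",  # assumed model alias
            "input": texts,                        # assumed field name
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    result = moderate(["Example user message to screen before display."])
    # Assumed shape: {"results": [{"categories": {...}, "category_scores": {...}}]}
    print(result)
```

In practice, the returned category flags (covering the nine areas listed above) would be checked before a message is displayed or passed on to a model.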

Mistral AI, which was founded by former researchers at Google’s DeepMind and Meta, has gained momentum in recent months. In June, the start-up raised €600m in equity and debt financing at a valuation of €5.8bn.

The start-up claims that its moderation model is highly accurate, but acknowledges that it remains a work in progress.

“We’re working with our customers to build and share scalable, lightweight and customisable moderation tooling, and will continue to engage with the research community to contribute safety advancements to the broader field.”

Content moderation is fast becoming a pressing concern for tech and social media companies. Earlier this week, the popular video game platform Roblox announced new measures to improve guardrails protecting children on the platform.

And in France, Pavel Durov, the CEO and co-founder of the messaging app Telegram, was arrested back in August over the platform’s approach to content moderation. Telegram responded at the time, saying that Durov had “nothing to hide” and that the app operates in line with regulation.

However, despite defending its moderation policies, the app quietly updated its FAQ page in September, removing language that had protected private chats from moderation, as reported by The Verge.


Ciarán Mather is a senior journalist with Silicon Republic

editorial@siliconrepublic.com