Irish-led CalypsoAI launches ‘world first’ security index for GenAI

26 Feb 2025


The leaderboard ranks a model’s ability to withstand advanced security threats, its risk-to-performance ratio and the resulting cost.

Start-up CalypsoAI has released a security index designed to assess the safety of generative AI (GenAI) models and expose critical vulnerabilities, in what the company has called a comprehensive world first. 

The CalypsoAI Security Leaderboard, powered by Inference Red-Team, compares the safety, cost and capabilities of a wide range of major GenAI models based on real-world security testing. This involves identifying weaknesses that would allow an attacker to crash a system, cause lag or drain its resources.

CalypsoAI was founded in California's Silicon Valley in 2018 and is headquartered in Dublin and New York. In June 2023, it established its Dublin centre of excellence, announcing plans to more than double its Irish workforce from 20 to 50 by 2025. Late last year, it expanded these plans, aiming to hire 50 more staff in engineering and data science.

Currently topping the security index is Anthropic's Claude 3.5 Sonnet, followed closely by Microsoft's Phi4-14B and Anthropic's Claude 3.5 Haiku. OpenAI's GPT-4o and Meta's Llama 3.3 70B complete the top five. Despite recent controversies around Chinese start-up DeepSeek, its R1-Distill-Llama-70B and R1 models take sixth and seventh place.

“Our Inference Red-Team product has successfully broken all the world-class GenAI models that exist today,” said Donnchadh Casey, the CEO of CalypsoAI.

“Many organisations are adopting AI without understanding the risks to their business and clients. Moving forward, the CalypsoAI Security Leaderboard provides a benchmark for business and technology leaders to integrate AI safely and at scale.”

Separately, Anthropic has just announced the next generation of its GenAI models with Claude 3.7 Sonnet. The model introduces an extended thinking mode, which lets it reason about a prompt for longer before responding. Users can toggle extended thinking on or off, with the aim of producing higher-quality answers.


Laura Varley is the Careers reporter for Silicon Republic

editorial@siliconrepublic.com