AI coding tools are helping companies large and small churn out agents to handle everything from internal workflows to customer service. Building an AI agent is the easy part; making sure it operates as intended in the real world is the challenge.
Galtea, a Barcelona-based startup, has raised a €2.7 million ($3.2 million) seed round to help companies test their AI solutions before deploying them. AI testing costs an estimated $13 billion annually in Europe and the US, yet companies still encounter unexpected glitches after releasing their AI solutions into “the wild.”
“As enterprises race to deploy generative AI, the gap between what models can do and what companies can trust them to do is widening fast,” wrote 42CAP, a German tech fund that led the round. Galtea, 42CAP said, has “built the missing quality assurance layer.”
Mozilla Ventures, a $35 million venture capital fund set up by the open source nonprofit to invest in startups developing safe and inclusive AI, also participated, alongside existing investors JME Ventures, Masia and ABAC Nest Ventures.
“Without strong evaluation data, enterprises are guessing whether their AI is behaving correctly,” wrote Mozilla Ventures in a blog post announcing the investment. “As AI systems become more autonomous and agentic, that guesswork becomes riskier.”
Responsible AI
Galtea is part of a growing market for “assurance tech” solutions that create guardrails to ensure AI systems are safe, compliant, and trustworthy. The company, spun out of the Barcelona Supercomputing Center in 2024, has identified testing as an overlooked challenge.
High testing costs push many AI teams to rely on generic benchmarks that fail to reflect real-world conditions. Galtea simulates user interactions, including adversarial inputs, before deployment, rather than after release, when problems are often discovered. It also generates hundreds of scenarios automatically and flags where an agent breaks down.
In one case, a customer support agent at a large financial institution failed more than 2,000 evaluations tied to seven critical vulnerabilities — a failure rate 12 times higher than what internal testing had revealed. Customers have reported a 71% average reduction in validation costs, according to Galtea.
The company counts Spanish telecom giant Telefónica; ABANCA, a regional retail bank; and Punto, an AI-based platform for dementia care, as customers. “Thorough testing and simulations for accuracy, performance, behaviour, and security are the only way developers can know how their platform will perform in a real-world setting,” said Palomar.