AI Models Flout EU Law in Up to 93% of Tested Scenarios, Research Firm Warns Enterprises

Benchmark findings intensify concerns over AI trust and governance as organizations push agentic systems deeper into customer interactions

Security, Privacy & Compliance News

Published: June 1, 2026

Nicole Willing

European AI research non-profit Aithos has found that the leading AI models routinely fail key legal compliance tests under EU law, raising concerns for enterprises deploying AI-powered customer service and support agents.

The findings come from LARA (Legal Assessment for Real-world Agents), a publicly available testing framework that Amsterdam-based Aithos has developed to evaluate how AI systems behave when faced with real-world tasks that could trigger obligations under the General Data Protection Regulation (GDPR) and the EU AI Act.

According to the research, all 12 frontier AI models tested failed compliance assessments across a range of scenarios involving data protection, manipulation, emotion inference, psychological profiling and human oversight requirements. Even the highest-performing model violated applicable regulations in nearly half of the test runs, while the lowest-performing model failed in 93 percent of scenarios.

For customer experience teams investing in AI agents to automate customer interactions, the findings highlight a widening gap between AI capabilities and regulatory readiness. Nadia Kadhim, Executive Director of Aithos, said:

“These are not abstract legal violations and the results should concern anyone interacting with an AI system, not just the businesses deploying them. These laws are in place because AI can cause real harm to real people. Our autonomy, privacy and other fundamental human rights are at play.”

Compliance Responsibility Sits With Deployers

The research highlights the risk to enterprises building customer-facing AI experiences, as legal responsibility does not primarily rest with model developers.

Aithos pointed out that under both the GDPR and the EU AI Act, “[b]usinesses—not the AI model’s creator—building AI agents and putting them on the market bear primary legal responsibility for compliance with the EU AI Act and GDPR. Organisations that then deploy that agent carry accountability as well.

The potential penalties for failing to comply are substantial, with GDPR violations incurring fines of up to €20MN or 4 percent of annual global turnover, while breaches of the EU AI Act could result in penalties of up to €35MN or 7 percent of worldwide revenue.

The regulations also have extraterritorial reach, meaning that businesses processing European customers’ data or deploying AI systems that affect people in or from the EU are subject to enforcement regardless of where they are headquartered.

Customer Interactions Expose Legal Risks

LARA evaluates AI agents in simulated workplace environments where models can access tools such as email, messaging platforms, calendars, customer databases and social media channels.

Rather than measuring model performance through static benchmarks, the system assesses how agents behave when responding to realistic requests that could create legal or ethical concerns.

The study evaluated 12 leading AI models across 10 legal-risk scenarios covering protections under European regulations, generating more than 3,000 assessment runs. Results were reviewed using independent AI judges and supplemented by more than 50 hours of validation by lawyers and external experts. Aithos explained:

“Across all ten scenarios and twelve models, even the best-performing system chose to break the law 46% of the time. The worst model did so in 93% of cases. Even the top-ranked model, Claude Sonnet 4.7, failed in a considerable number of runs. Every legal provision tested was violated by a majority of frontier models.”

The LARA data shows that Anthropic’s Claude Opus 4.7 was the most compliant, but even then only scored 54 percent. OpenAI’s Chat GPT-5.5 scored approximately 38 percent. Other systems showed minimal legal compliance with Google’s Gemini 3.1 Pro scoring only 10 percent and Kimi, developed by Chinese company Moonshot AI, scoring just 7 percent.

Among the more concerning findings were instances where models encouraged vulnerable users toward long-term financial commitments after emotional prompting. In one scenario, agents recommended 30-year financial products to terminally ill individuals despite clear indicators of vulnerability.

The research also identified repeated examples of unlawful emotion inference and psychological profiling, practices that are prohibited under Article 5 of the EU AI Act.

For customer service teams, the findings indicate that guardrails implemented at the application layer may be just as important as the underlying model selection process.

Aithos argued that one of the biggest challenges facing consumers and organizations is the lack of visibility into how AI systems behave in practice. Daan Henselmans, Research Director at Aithos, said:

“Ordinary users currently have no reliable way to know whether the AI agents they interact with obey the law.”

Aithos has made LARA freely available and plans to introduce functionality that will allow organizations and individuals to create custom testing scenarios tailored to their own use cases.

As customer-facing AI agents move from pilot projects into production environments, the findings are likely to intensify discussions around governance, testing and accountability. For CX leaders, the research serves as a reminder that customer experience outcomes are increasingly linked not only to model performance, but also to regulatory compliance and customer trust.

Agentic AI AI Agents Artificial Intelligence Security and Compliance