The question comes up in every B2B AI project: “Can we send this data to Claude / GPT / Gemini?” The answer depends on the data type, the provider, and the contracts in place. Here’s how I structure this in practice.
The Real GDPR Problem with LLMs
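It’s not that “LLMs memorize everything.” On their paid API and enterprise tiers, the major providers state that they don’t train on your data by default (free tiers can differ, so read the terms). The real issue is data transfer outside the EU.
Anthropic, OpenAI, and Google are US companies. Unless you pick an EU region or data-residency option where one exists, their APIs typically process data outside the EU. If your data contains personal data about people in the EU, sending it there is an international transfer under GDPR Chapter V, and it needs a valid transfer mechanism such as the EU-US Data Privacy Framework or Standard Contractual Clauses.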
4 Patterns by Sensitivity Level
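The four levels below amount to a routing rule: classify the payload first, then restrict which deployment targets may receive it. A minimal sketch, assuming your pipeline already tags each payload with a level; `Sensitivity`, `Target`, `ALLOWED`, and `check_route` are hypothetical names, not a library API.

```python
from enum import Enum, IntEnum

class Sensitivity(IntEnum):
    PUBLIC = 1          # Level 1: public or anonymous data
    PSEUDONYMIZED = 2   # Level 2: identifiers replaced by placeholders
    PERSONAL = 3        # Level 3: non-sensitive personal data
    SENSITIVE = 4       # Level 4: health, legal, HR

class Target(Enum):
    ANY_CLOUD = "any cloud API"
    EU_OR_DPA = "EU-hosted provider or signed DPA"
    PRIVATE = "on-premise / private network"

# Hypothetical policy table mirroring the four levels.
ALLOWED = {
    Sensitivity.PUBLIC: {Target.ANY_CLOUD, Target.EU_OR_DPA, Target.PRIVATE},
    Sensitivity.PSEUDONYMIZED: {Target.ANY_CLOUD, Target.EU_OR_DPA, Target.PRIVATE},
    Sensitivity.PERSONAL: {Target.EU_OR_DPA, Target.PRIVATE},
    Sensitivity.SENSITIVE: {Target.PRIVATE},
}

def check_route(level: Sensitivity, target: Target) -> None:
    """Raise before any data leaves the pipeline if the target is not allowed."""
    if target not in ALLOWED[level]:
        raise PermissionError(f"Level {level.value} data may not go to: {target.value}")
```

Failing closed with an exception, rather than logging a warning, means a misconfigured pipeline stops instead of quietly sending Level 4 data to a cloud API.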
Level 1 — Public or Anonymous Data
→ Any cloud API, no restrictions
Marketing content, open-source data, generic queries. No personal data = no GDPR constraint.
Level 2 — Pseudonymized Data
→ Cloud API with caution
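Replace names, emails, and IDs with placeholders before sending text to the LLM, then restore the original values in its response. Pseudonymized data still counts as personal data under GDPR because you hold the mapping, but the provider only ever sees placeholders.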
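```python
import re

# Deliberately simple email pattern; trailing punctuation is not captured.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def pseudonymize(text: str) -> tuple[str, dict[str, str]]:
    mapping: dict[str, str] = {}  # placeholder -> original email
    seen: dict[str, str] = {}     # original email -> placeholder (reused for repeats)

    def replace_email(m: re.Match) -> str:
        email = m.group(0)
        if email not in seen:
            # Brackets keep "[EMAIL_1]" from matching inside "[EMAIL_10]" on restore.
            key = f"[EMAIL_{len(seen)}]"
            seen[email] = key
            mapping[key] = email
        return seen[email]

    return EMAIL_RE.sub(replace_email, text), mapping

def restore(text: str, mapping: dict[str, str]) -> str:
    # Put the original values back into the LLM's response.
    for key, value in mapping.items():
        text = text.replace(key, value)
    return text
```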
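The snippet above only handles emails, while names and IDs usually matter just as much. One way to generalize it is a sketch that runs one regex per identifier type and uses bracketed, numbered placeholders. It assumes a hypothetical `CUST-1042` customer-ID format and a list of known names pulled from your CRM; `build_patterns`, `pseudonymize`, and `restore` here are illustrative helpers.

```python
import re

def build_patterns(known_names: list[str]) -> dict[str, re.Pattern]:
    """One regex per identifier type, applied in insertion order."""
    patterns = {
        "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
        # Hypothetical internal ID format, e.g. CUST-1042.
        "CUSTOMER_ID": re.compile(r"\bCUST-\d+\b"),
    }
    if known_names:
        # Longest names first, so "Anna Maria Rossi" wins over "Anna".
        alternatives = "|".join(map(re.escape, sorted(known_names, key=len, reverse=True)))
        patterns["NAME"] = re.compile(rf"\b(?:{alternatives})\b")
    return patterns

def pseudonymize(text: str, known_names: list[str]) -> tuple[str, dict[str, str]]:
    mapping: dict[str, str] = {}  # placeholder -> original value
    seen: dict[str, str] = {}     # original value -> placeholder (reused for repeats)
    counts: dict[str, int] = {}   # next index per identifier type
    for label, pattern in build_patterns(known_names).items():
        def replace(m: re.Match, label: str = label) -> str:
            value = m.group(0)
            if value not in seen:
                index = counts.get(label, 0)
                counts[label] = index + 1
                seen[value] = f"[{label}_{index}]"
                mapping[seen[value]] = value
            return seen[value]
        text = pattern.sub(replace, text)
    return text, mapping

def restore(text: str, mapping: dict[str, str]) -> str:
    # Bracketed keys: "[NAME_1]" is never a substring of "[NAME_10]".
    for key, value in mapping.items():
        text = text.replace(key, value)
    return text
```

For example, `pseudonymize("Contact Alice Martin (alice@example.com) about CUST-1042.", ["Alice Martin"])` sends `Contact [NAME_0] ([EMAIL_0]) about [CUSTOMER_ID_0].` to the LLM. Regexes miss names that are not on the list, so free-text documents usually need an NER-based detector (Microsoft Presidio is one option) on top.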
Level 3 — Non-Sensitive Personal Data
→ EU-hosted providers or signed DPA
Viable options in 2026:
- Mistral AI (French company, EU hosting, GDPR DPA available)
- Azure OpenAI deployed in an EU region (Microsoft DPA)
- Google Vertex AI in an EU region such as europe-west1 (Belgium); not every europe-west region is inside the EU
Always verify the DPA (Data Processing Agreement) is in place before processing.
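To make the DPA check enforceable rather than a checklist item, the pipeline can refuse to call any provider whose record is incomplete. A minimal sketch of a strict reading of Level 3 (signed DPA *and* an approved EU region); the `ProviderRecord` fields and the region allowlist are hypothetical placeholders for whatever your vendor register holds, so confirm each region's location in the provider's own documentation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Example allowlist; verify every entry against the provider's region docs.
APPROVED_EU_REGIONS = {"francecentral", "westeurope", "europe-west1"}

@dataclass(frozen=True)
class ProviderRecord:
    name: str
    region: str                    # region your deployment is pinned to
    dpa_signed_on: Optional[date]  # None until the DPA is signed

def assert_level3_ready(p: ProviderRecord) -> None:
    """Raise if the provider lacks a signed DPA or an approved EU region."""
    problems = []
    if p.dpa_signed_on is None:
        problems.append("no signed DPA")
    if p.region not in APPROVED_EU_REGIONS:
        problems.append(f"region {p.region!r} not on the EU allowlist")
    if problems:
        raise PermissionError(f"{p.name}: " + "; ".join(problems))
```

Collecting every problem before raising gives the reviewer the full list in one error instead of fixing issues one at a time.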
Level 4 — Highly Sensitive Data (health, legal, HR)
→ On-premise or private network only
Ollama + Llama/Mistral locally, or deploy in your VPC with no internet route.
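For Level 4, a cheap safeguard is to refuse any model endpoint that does not resolve to a loopback or private address, so a misconfigured URL cannot silently route sensitive data to a public API. A sketch assuming Ollama's local REST API (`POST /api/generate` on its default port 11434) and a model named `llama3` already pulled; `is_private_endpoint` and `generate_local` are hypothetical helpers. An address check complements, but does not replace, network-level controls such as a VPC with no internet route.

```python
import ipaddress
import json
import socket
import urllib.request
from urllib.parse import urlparse

def is_private_endpoint(url: str) -> bool:
    """True only if every address the host resolves to is loopback or private."""
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        addrs = {info[4][0] for info in socket.getaddrinfo(host, None)}
    except (socket.gaierror, UnicodeError):
        return False

    def internal(addr: str) -> bool:
        ip = ipaddress.ip_address(addr.split("%")[0])  # drop IPv6 scope id
        return ip.is_private or ip.is_loopback

    return bool(addrs) and all(internal(a) for a in addrs)

def generate_local(prompt: str, model: str = "llama3",
                   base_url: str = "http://localhost:11434") -> str:
    """Call a local Ollama server, refusing anything that is not private."""
    if not is_private_endpoint(base_url):
        raise PermissionError(f"refusing non-private endpoint: {base_url}")
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(f"{base_url}/api/generate", data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

The check runs before the payload is built, so a public URL never receives even the request body.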
DPIA: When Is It Required?
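Under GDPR Article 35, a DPIA (Data Protection Impact Assessment) is mandatory when processing is likely to result in a high risk to people’s rights. For AI processing, that typically covers:
- Large-scale processing of health, biometric, or criminal data
- Automated profiling with legal or similarly significant effects
- Large-scale processing of personal data combined with other risk factors, such as innovative technology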
In practice, I recommend doing one whenever an LLM sees personal data from end users, even if it’s not strictly required. It forces you to document your choices and have them validated by the DPO.
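The screening logic, plus the stricter in-practice rule, can be encoded as a first-pass check in a project intake form. A sketch only: the `AIProcessing` fields are hypothetical, and the check is deliberately conservative (it flags each criterion on its own), so its output is a prompt to involve the DPO, not a legal determination.

```python
from dataclasses import dataclass

@dataclass
class AIProcessing:
    # Hypothetical screening answers for one AI use case.
    sensitive_data: bool           # health, biometric, or criminal data
    profiling_legal_effects: bool  # profiling with legal or similarly significant effects
    large_scale: bool              # personal data processed at large scale
    end_user_personal_data: bool   # an LLM sees personal data from end users

def dpia_status(p: AIProcessing) -> str:
    triggers = [
        (p.sensitive_data, "health/biometric/criminal data"),
        (p.profiling_legal_effects, "profiling with legal effects"),
        (p.large_scale, "large-scale processing"),
    ]
    reasons = [label for hit, label in triggers if hit]
    if reasons:
        return "required (screening): " + ", ".join(reasons)
    if p.end_user_personal_data:
        return "recommended: an LLM sees end-user personal data"
    return "not triggered by these criteria"
```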
What I Set Up by Default
On every B2B AI project:
- Signed processing agreement with each LLM provider used
- Log of the data types handled by each pipeline component
- Explicit retention period in system prompts (for LLMs with memory)
- Encryption in transit and at rest for vector stores
- Quarterly review of the AI sub-processor list
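The data-type log and the quarterly sub-processor review can feed each other: if every pipeline component records its provider and the data types it handles, the review list falls out automatically. A minimal sketch; the `Component` structure, the `PERSONAL_TYPES` labels, and the `"self-hosted"` convention are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical labels for data types that count as personal data.
PERSONAL_TYPES = {"name", "email", "customer_id", "health", "hr"}

@dataclass
class Component:
    name: str
    provider: str  # LLM API, vector-store host, or "self-hosted"
    data_types: set[str] = field(default_factory=set)

def subprocessor_review(components: list[Component],
                        dpas_on_file: set[str]) -> tuple[dict[str, list[str]], list[str]]:
    """Map each external provider that sees personal data to its components,
    and list the providers with no DPA on file."""
    subprocessors: dict[str, list[str]] = {}
    for c in components:
        if c.provider != "self-hosted" and c.data_types & PERSONAL_TYPES:
            subprocessors.setdefault(c.provider, []).append(c.name)
    missing = sorted(p for p in subprocessors if p not in dpas_on_file)
    return subprocessors, missing
```

Running this each quarter against the current pipeline config surfaces, for instance, a newly added embeddings service that handles emails but never had its DPA signed.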
Not perfect — the AI Act will keep evolving — but defensible today.
Stéphanie Caumont
AI Product Owner