AI & GDPR: Handling Sensitive Data in Production

Jul 2, 2026 · 6 min

The question comes up in every B2B AI project: “Can we send this data to Claude / GPT / Gemini?” The answer depends on the data type, the provider, and the contracts in place. Here’s how I structure this in practice.

The problem is not that “LLMs memorize everything”: serious providers don’t retrain on your production data. The real issue is data transfer outside the EU.

Anthropic, OpenAI, and Google are US companies. Their APIs process data on non-EU servers by default. If your data contains personal information about people in the EU (GDPR protects them regardless of citizenship), you have a classic GDPR cross-border transfer issue.

4 Patterns by Sensitivity Level

Level 1 — Public or Anonymous Data

→ Any cloud API, no restrictions

Marketing content, open-source data, generic queries. No personal data = no GDPR constraint.

Level 2 — Pseudonymized Data

→ Cloud API with caution

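Replace names, emails, and IDs with placeholders before sending to the LLM, then rehydrate the response afterwards. Pseudonymized data still counts as personal data under GDPR as long as the mapping exists, hence the caution. The example below handles emails only.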

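import re

# Simple email pattern; it stops before trailing punctuation such as "," or ".".
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PLACEHOLDER_RE = re.compile(r"<EMAIL_\d+>")

def pseudonymize(text: str) -> tuple[str, dict[str, str]]:
    """Replace each email with a placeholder; return the text and the mapping."""
    mapping: dict[str, str] = {}  # placeholder -> original email
    seen: dict[str, str] = {}     # original email -> placeholder

    def replace_email(m: re.Match) -> str:
        email = m.group(0)
        if email not in seen:  # the same email always gets the same placeholder
            key = f"<EMAIL_{len(seen)}>"
            seen[email] = key
            mapping[key] = email
        return seen[email]

    return EMAIL_RE.sub(replace_email, text), mapping

def restore(text: str, mapping: dict[str, str]) -> str:
    # One regex pass over delimited placeholders, so <EMAIL_1> can never
    # be replaced inside <EMAIL_10> (a bug with chained str.replace calls).
    return PLACEHOLDER_RE.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)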
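
The email example above extends naturally to several identifier types. The sketch below is one possible way to do it, not a complete PII detector: the `CUST-` ID format, the `PATTERNS` table, and the `fake_llm` stub are all assumptions made for illustration, and names in free text would need an NER step that regexes cannot provide.

```python
import re

# Hypothetical patterns for illustration; real PII detection needs more than regexes.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "CUSTOMER_ID": re.compile(r"\bCUST-\d{6}\b"),  # assumed internal ID format
}
PLACEHOLDER_RE = re.compile(r"<(?:EMAIL|CUSTOMER_ID)_\d+>")


def pseudonymize_all(text: str) -> tuple[str, dict[str, str]]:
    """Replace every match of every pattern with a typed placeholder."""
    mapping: dict[str, str] = {}  # placeholder -> original value
    seen: dict[str, str] = {}     # original value -> placeholder

    for kind, pattern in PATTERNS.items():
        def repl(m: re.Match, kind: str = kind) -> str:
            value = m.group(0)
            if value not in seen:
                key = f"<{kind}_{len(mapping)}>"
                seen[value] = key
                mapping[key] = value
            return seen[value]

        text = pattern.sub(repl, text)
    return text, mapping


def rehydrate(text: str, mapping: dict[str, str]) -> str:
    """Swap placeholders back in one pass; unknown placeholders are left as-is."""
    return PLACEHOLDER_RE.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real API call; it only ever sees placeholders."""
    return f"Summary: {prompt}"


raw = "Customer CUST-004217 (alice@example.com) asked for a refund."
safe, mapping = pseudonymize_all(raw)
answer = rehydrate(fake_llm(safe), mapping)
```

The mapping never leaves your infrastructure; only `safe` is sent to the provider, and the response is rehydrated locally.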

Level 3 — Non-Sensitive Personal Data

→ EU-hosted providers or signed DPA

Viable options in 2026:

Mistral AI (French company, EU hosting, GDPR DPA available)
Azure OpenAI with European region (Microsoft DPA)
Google Vertex AI with europe-west region

Always verify the DPA (Data Processing Agreement) is in place before processing.
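
The DPA check can also be enforced in code rather than left to memory. A minimal sketch, assuming a hand-maintained provider registry: the `Provider` fields, the registry keys, and the `require_eu_region` flag are hypothetical and should be filled in from your actual contracts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    region: str       # where the API processes data, e.g. "eu" or "us"
    dpa_signed: bool  # True only once the signed DPA is on file


# Hypothetical registry entries; replace with your own providers and contracts.
REGISTRY = {
    "eu-provider": Provider("eu-provider", region="eu", dpa_signed=True),
    "us-provider": Provider("us-provider", region="us", dpa_signed=True),
    "new-provider": Provider("new-provider", region="eu", dpa_signed=False),
}


def can_process_personal_data(provider_key: str, require_eu_region: bool = True) -> bool:
    """Gate for Level 3 data: unknown providers and missing DPAs are rejected."""
    provider = REGISTRY.get(provider_key)
    if provider is None or not provider.dpa_signed:
        return False
    # Assumption: EU processing is required by default; relax it explicitly if
    # another transfer mechanism covers the provider.
    return provider.region == "eu" or not require_eu_region
```

Calling this gate before every LLM request turns "verify the DPA" into a failing check instead of a checklist item.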

Level 4 — Highly Sensitive Data (health, legal, HR)

→ On-premise or private network only

Run Llama or Mistral models locally with Ollama, or deploy them in your own VPC with no internet route.
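
The four levels can be encoded as a routing table so that a pipeline refuses a backend that is too permissive for the data it carries. This is a sketch under assumptions: the backend labels (`cloud_api`, `eu_hosted`, `on_premise`) are hypothetical, and I assume a stricter backend is always acceptable for a lower sensitivity level.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    PUBLIC = 1            # public or anonymous data
    PSEUDONYMIZED = 2     # identifiers replaced by placeholders
    PERSONAL = 3          # non-sensitive personal data
    HIGHLY_SENSITIVE = 4  # health, legal, HR


# Hypothetical backend labels; Level 3 is simplified to "eu_hosted" here,
# which stands for an EU-hosted provider with a signed DPA.
ALLOWED_BACKENDS = {
    Sensitivity.PUBLIC: {"cloud_api", "eu_hosted", "on_premise"},
    Sensitivity.PSEUDONYMIZED: {"cloud_api", "eu_hosted", "on_premise"},
    Sensitivity.PERSONAL: {"eu_hosted", "on_premise"},
    Sensitivity.HIGHLY_SENSITIVE: {"on_premise"},
}


def is_allowed(level: Sensitivity, backend: str) -> bool:
    """Return True if this backend may receive data of this sensitivity level."""
    return backend in ALLOWED_BACKENDS[level]


def check_route(level: Sensitivity, backend: str) -> None:
    """Raise before any data leaves the pipeline through a forbidden backend."""
    if not is_allowed(level, backend):
        raise PermissionError(f"{backend} is not allowed for {level.name} data")
```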

DPIA: When Is It Required?

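A DPIA (Data Protection Impact Assessment) is mandatory under GDPR Article 35 when processing is likely to result in a high risk to individuals. For AI processing, that typically applies if it: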

Handles health, biometric, or criminal data
Performs automated profiling with legal effects
Processes personal data at large scale

In practice, I recommend doing one whenever an LLM sees personal data from end users, even if it’s not technically required. It forces you to document the choices and validate them with the DPO.
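
The triggers and the recommendation above fit in a small screening helper that can run during project intake. It is a rough sketch that mirrors this article's list, not the full legal test; the `ProcessingProfile` fields and the three return values are names I made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProcessingProfile:
    sensitive_categories: bool = False          # health, biometric, or criminal data
    profiling_with_legal_effects: bool = False  # automated profiling with legal effects
    large_scale: bool = False                   # personal data processed at large scale
    llm_sees_end_user_personal_data: bool = False


def dpia_decision(profile: ProcessingProfile) -> str:
    """Return 'mandatory', 'recommended', or 'not required' for a pipeline."""
    if (
        profile.sensitive_categories
        or profile.profiling_with_legal_effects
        or profile.large_scale
    ):
        return "mandatory"
    if profile.llm_sees_end_user_personal_data:
        # The article's rule of thumb, not a legal requirement.
        return "recommended"
    return "not required"
```

The result is a starting point for the conversation with the DPO, who makes the final call.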

What I Set Up by Default

On every B2B AI project:

Signed processing agreement with each LLM provider used
Data type log per pipeline component
Explicit retention period in system prompts (for LLMs with memory)
Encryption in transit and at rest for vector stores
Quarterly review of the AI sub-processor list

Not perfect — the AI Act will keep evolving — but defensible today.
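
Two of these defaults, the data type log and the quarterly sub-processor review, are easy to keep as code next to the pipeline. A minimal sketch with hypothetical record shapes: `ComponentRecord`, `SubProcessor`, the 92-day interval, and the `"internal"` label are assumptions, not a standard format.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ComponentRecord:
    component: str               # pipeline step, e.g. "embedder"
    data_types: tuple[str, ...]  # kinds of data this step sees
    provider: str                # sub-processor that receives the data, or "internal"


@dataclass(frozen=True)
class SubProcessor:
    name: str
    last_reviewed: date


REVIEW_INTERVAL = timedelta(days=92)  # roughly one quarter; an assumption


def overdue_reviews(subs: list[SubProcessor], today: date) -> list[str]:
    """Names of sub-processors whose last review is older than one quarter."""
    return [s.name for s in subs if today - s.last_reviewed > REVIEW_INTERVAL]


def unlisted_providers(log: list[ComponentRecord], subs: list[SubProcessor]) -> set[str]:
    """Providers used in the pipeline but missing from the sub-processor list."""
    known = {s.name for s in subs} | {"internal"}
    return {r.provider for r in log} - known


log = [
    ComponentRecord("embedder", ("email", "free_text"), "vendor-a"),
    ComponentRecord("generator", ("free_text",), "vendor-b"),
    ComponentRecord("reranker", ("free_text",), "internal"),
]
subs = [SubProcessor("vendor-a", last_reviewed=date(2026, 1, 10))]

print(overdue_reviews(subs, today=date(2026, 7, 2)))
print(unlisted_providers(log, subs))
```

Running both checks in CI or a scheduled job flags a stale review or an undeclared provider before an audit does.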

Stéphanie Caumont

AI Product Owner
