Skip to content

Data anonymization

The Swiss AI Hub integrates Presidio for detecting and anonymizing personally identifiable information (PII) in user questions before they reach external LLM providers.

What Presidio protects

Presidio catches PII that users accidentally include in their questions:

  • "Can you review this contract for John Smith (john.smith@company.com)?"
  • "What's the policy for credit card 4532-1234-5678-9012?"
  • "How do I process the invoice for phone number +41 79 123 45 67?"

Without Presidio, this PII gets sent to external LLM providers (OpenAI, Google) as part of the API request.

How it works

Presidio runs as a guardrail in the LiteLLM proxy layer. It scans user questions for PII patterns before sending requests to language models.

Two anonymization modes:

Mask mode replaces detected PII with placeholders like [PERSON] or [EMAIL_ADDRESS]. The question "Can you review this contract for John Smith?" becomes "Can you review this contract for [PERSON]?" before reaching the LLM. The model can still understand context and generate a useful response.

Block mode rejects the entire request when certain PII types appear. Use this for highly sensitive data like credit card numbers. The user receives an error message instead of a response.

Supported PII types

Presidio detects person names, email addresses, credit card numbers, phone numbers, social security numbers, IP addresses, and dates or locations that could identify individuals. Detection uses pattern matching, regular expressions, and named entity recognition models. The system supports multiple languages including English, German, French, and Italian.

Configuration

Presidio guardrails are configured in LiteLLM but disabled by default. Administrators enable them per deployment based on data sensitivity requirements.

Example guardrail configuration:
yaml
guardrails:
  - guardrail_name: "presidio-mask-guard"
    litellm_params:
      guardrail: presidio
      default_on: false
      mode: "pre_call"
      presidio_language: "de"
      output_parse_pii: true
      pii_entities_config:
        PERSON: "MASK"
        EMAIL_ADDRESS: "MASK"

  - guardrail_name: "presidio-block-guard"
    litellm_params:
      guardrail: presidio
      default_on: false
      mode: "pre_call"
      pii_entities_config:
        CREDIT_CARD: "BLOCK"

Configure which PII types to mask or block through the pii_entities_config section. Set default_on: true to enable the guardrail for all requests.

When to use anonymization

ScenarioRecommendation
External LLM providers (OpenAI, Google, etc.)Enable Presidio to protect data before it leaves your infrastructure
Self-hosted models on-premisesOptional - your data never leaves your control
User-generated contentEnable if users might accidentally include PII
Pre-sanitized dataSkip Presidio to avoid unnecessary overhead
Regulatory requirements (GDPR, HIPAA)Enable to demonstrate PII protection controls

Presidio applies to all requests when enabled. Use block mode sparingly - only for highly sensitive PII types like credit cards where you need hard rejection rather than masking.

Built with ❤️ in Switzerland 🇨🇭