Education · Last updated April 6, 2026

Build CCPA-Compliant Data Pipelines for SaaS Platforms with GlobalShield API

Learn how to build CCPA-compliant data pipelines that detect, redact, and handle California consumer PII using GlobalShield API in Python. Covers opt-out flows and data deletion.


California's Consumer Privacy Act (CCPA) and its amendment, CPRA, give California residents four core rights: the right to know what personal data you collect, the right to delete it, the right to opt out of its sale, and the right to non-discrimination for exercising those rights.

For SaaS companies, these aren't just legal obligations — they're engineering requirements. Handling a deletion request means knowing exactly where that user's PII lives across your databases, logs, and data pipelines. Most teams don't have that visibility until a request arrives.

This guide shows how to build CCPA-compliant data pipelines using the GlobalShield API — automating PII detection, tagging, redaction, and deletion tracking across your entire data stack.

What CCPA Requires in Practice

The California Privacy Protection Agency (CPPA) reached a $2.75 million settlement in early 2026 — the largest to date — against a streaming company for failing to honor opt-out requests within the required 15-day window.

The technical requirements that trip up most SaaS teams:

  1. Data inventory: Know every place California resident PII is stored
  2. Deletion within 45 days: Fulfill verified deletion requests across all systems
  3. Opt-out propagation: Stop selling/sharing data within 15 business days
  4. Audit trail: Prove compliance with timestamped records

GlobalShield automates step one (finding PII) and integrates with your existing deletion workflows.

Step 1: Setup

pip install requests pandas python-dotenv

import os
import json
import hashlib
import requests
from datetime import datetime
from dotenv import load_dotenv
 
load_dotenv()
 
GLOBALSHIELD_KEY = os.getenv("GLOBALSHIELD_API_KEY")
GLOBALSHIELD_BASE = "https://apivult.com/api/globalshield"
 
HEADERS = {
    "X-RapidAPI-Key": GLOBALSHIELD_KEY,
    "Content-Type": "application/json"
}

Step 2: PII Detection Across Data Sources

def scan_text_for_pii(text: str, source_label: str) -> dict:
    """
    Scan arbitrary text for California-regulated PII categories.
    CCPA covers: name, email, phone, SSN, driver's license,
    financial account numbers, IP address, biometrics, geolocation.
    """
    payload = {
        "text": text,
        "regulations": ["CCPA", "CPRA"],
        "detection": {
            "categories": [
                "name", "email", "phone", "ssn", "drivers_license",
                "financial_account", "ip_address", "geolocation",
                "precise_geolocation", "biometric_identifier"
            ],
            "confidence_threshold": 0.85,
            "include_context": True
        },
        "metadata": {
            "source": source_label,
            "scanned_at": datetime.utcnow().isoformat()
        }
    }
 
    resp = requests.post(
        f"{GLOBALSHIELD_BASE}/detect",
        json=payload,
        headers=HEADERS,
        timeout=30  # avoid hanging the pipeline on a slow response
    )
    resp.raise_for_status()
    return resp.json()["data"]
 
 
def scan_database_export(records: list[dict], source_label: str) -> list[dict]:
    """Scan a batch of database records for PII."""
    flagged = []
 
    for record in records:
        # Combine all text fields into one blob for scanning
        text_blob = " | ".join(
            str(v) for v in record.values() if v is not None
        )
        scan = scan_text_for_pii(text_blob, source_label)
 
        if scan["pii_detected"]:
            flagged.append({
                "record_id": record.get("id", "unknown"),
                "source": source_label,
                "pii_categories": scan["detected_categories"],
                "risk_level": scan["risk_level"],
                "entities": scan["entities"]
            })
 
    return flagged
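Scanning every row of a large table against the API can be slow and costly. One pragmatic pattern, sketched here under the assumption that a representative sample is acceptable for periodic inventory scans (the `sample_records` helper and its `max_per_source` parameter are illustrative, not part of GlobalShield):

```python
import random

def sample_records(records: list[dict], max_per_source: int = 500,
                   seed: int = 42) -> list[dict]:
    """Return a reproducible sample of records to bound API usage per source."""
    if len(records) <= max_per_source:
        return list(records)
    rng = random.Random(seed)  # fixed seed so repeated scans hit the same rows
    return rng.sample(records, max_per_source)
```

Pass the sampled list into `scan_database_export` instead of the full export; for a formal deletion request you would still scan everything, but sampling keeps recurring inventory runs cheap.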

Step 3: Redaction for Analytics and Logging

Logs and analytics pipelines often contain PII that should never be stored. Use GlobalShield to redact before persisting.

def redact_for_analytics(text: str, strategy: str = "mask") -> str:
    """
    Redact PII from text before storing in analytics or logs.
    strategy: 'mask' (replace with ***), 'pseudonymize' (replace with token),
              'generalize' (replace with category label)
    """
    payload = {
        "text": text,
        "redaction": {
            "strategy": strategy,
            "regulations": ["CCPA"],
            "preserve_format": True  # Keeps structure, redacts values
        }
    }
 
    resp = requests.post(
        f"{GLOBALSHIELD_BASE}/redact",
        json=payload,
        headers=HEADERS,
        timeout=30  # avoid hanging the pipeline on a slow response
    )
    resp.raise_for_status()
    return resp.json()["data"]["redacted_text"]
 
 
# Example: Redact user activity log before writing to data warehouse
def process_event_log(raw_event: dict) -> dict:
    """Strip PII from event log before storing in analytics."""
    event = raw_event.copy()
 
    # Redact user-agent string, IP, and any user-supplied text
    fields_to_redact = ["user_agent", "ip_address", "search_query", "feedback_text"]
 
    for field in fields_to_redact:
        if field in event and event[field]:
            event[field] = redact_for_analytics(str(event[field]))
 
    event["pii_processed"] = True
    event["processed_at"] = datetime.utcnow().isoformat()
    return event
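As defense in depth, you may not want raw PII leaving your network at all, even bound for a redaction API. A minimal local fallback, assuming simple regex patterns are good enough for the most common identifiers (emails and US-formatted SSNs shown here; this is a pre-filter, not a substitute for the API's full detection):

```python
import re

# Deliberately narrow patterns: emails and US-formatted SSNs only
_EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def local_mask(text: str) -> str:
    """Mask obvious PII locally before the text is sent anywhere."""
    text = _EMAIL.sub("[EMAIL]", text)
    return _SSN.sub("[SSN]", text)
```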

Step 4: Handle Deletion Requests (Right to Erasure)

# Pseudonymization map: maps real user IDs to anonymous tokens
# Stored separately from the data — allows reversal for deletion
PSEUDONYM_STORE = {}  # In production: store in encrypted Redis/DB
 
 
def pseudonymize_user_id(real_id: str) -> str:
    """Create a consistent anonymous token for a user ID."""
    # In production, load the salt from secret storage rather than hardcoding it
    token = hashlib.sha256(f"ccpa-salt-{real_id}".encode()).hexdigest()[:16]
    PSEUDONYM_STORE[token] = real_id
    return token
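The salted-hash approach above is simple, but anyone who learns the salt can recompute tokens offline. A keyed HMAC, with the key held in secret storage, is a commonly recommended stronger option; this is a sketch, and the key handling shown is illustrative:

```python
import hmac
import hashlib

def pseudonymize_hmac(real_id: str, secret_key: bytes) -> str:
    """Keyed pseudonymization: tokens cannot be recomputed without the key."""
    digest = hmac.new(secret_key, real_id.encode(), hashlib.sha256)
    return digest.hexdigest()[:16]
```

Rotating the key invalidates every token at once, which is useful if the pseudonym store is ever suspected of being compromised.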
 
 
def process_deletion_request(
    user_id: str,
    data_stores: list[str],
    request_date: str
) -> dict:
    """
    Orchestrate a CCPA deletion request across data stores.
    Returns a compliance audit record.
    """
    deletion_log = {
        "user_id_hash": hashlib.sha256(user_id.encode()).hexdigest(),
        "request_received": request_date,
        "deadline": "45 days from request date",
        "stores_processed": [],
        "status": "IN_PROGRESS"
    }
 
    for store in data_stores:
        # In a real system, each store handler would delete/anonymize records
        # Here we log the action for the audit trail
        deletion_log["stores_processed"].append({
            "store": store,
            "action": "DELETION_QUEUED",
            "queued_at": datetime.utcnow().isoformat()
        })
        print(f"  Queued deletion for {user_id} in {store}")
 
    deletion_log["status"] = "DELETION_QUEUED"
    deletion_log["audit_id"] = hashlib.sha256(
        f"{user_id}-{request_date}".encode()
    ).hexdigest()[:12].upper()
 
    return deletion_log
 
 
def verify_deletion_completion(deletion_log: dict, completed_stores: list[str]) -> dict:
    """Mark deletion complete and generate compliance certificate."""
    deletion_log["completed_stores"] = completed_stores
    deletion_log["completed_at"] = datetime.utcnow().isoformat()
 
    all_stores = {s["store"] for s in deletion_log["stores_processed"]}
    if set(completed_stores) >= all_stores:
        deletion_log["status"] = "DELETION_COMPLETE"
        deletion_log["compliance_status"] = "CCPA_COMPLIANT"
    else:
        pending = all_stores - set(completed_stores)
        deletion_log["status"] = "PARTIAL"
        deletion_log["pending_stores"] = list(pending)
 
    return deletion_log
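The deletion log above records the 45-day deadline as a descriptive string; for SLA monitoring you will want an actual timestamp. A small helper, assuming request dates arrive as ISO-8601 strings (CCPA also permits a 45-day extension with notice to the consumer, which this sketch ignores):

```python
from datetime import datetime, timedelta

def deletion_deadline(request_date_iso: str) -> str:
    """Return the ISO timestamp 45 calendar days after the request date."""
    received = datetime.fromisoformat(request_date_iso)
    return (received + timedelta(days=45)).isoformat()

def is_overdue(request_date_iso: str, now: datetime) -> bool:
    """True once the 45-day window has elapsed."""
    return now > datetime.fromisoformat(deletion_deadline(request_date_iso))
```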

Step 5: Opt-Out Signal Propagation

When a California resident opts out of data sale/sharing, you have 15 business days to stop. Automate the propagation:

def process_opt_out(user_id: str, opt_out_timestamp: str) -> dict:
    """
    Record and propagate a CCPA opt-out signal.
    Must reach all downstream sharing partners within 15 business days.
    """
    opt_out_record = {
        "user_id_hash": hashlib.sha256(user_id.encode()).hexdigest(),
        "opt_out_type": "DO_NOT_SELL_OR_SHARE",
        "received_at": opt_out_timestamp,
        "propagated_to": [],
        "compliance_deadline": "15 business days"
    }
 
    # Downstream partners to notify (anonymized IDs, not real names)
    downstream_partners = ["analytics_platform", "advertising_network", "data_enrichment"]
 
    for partner in downstream_partners:
        # In production: send API call to each partner's opt-out endpoint
        opt_out_record["propagated_to"].append({
            "partner": partner,
            "notified_at": datetime.utcnow().isoformat(),
            "status": "SENT"
        })
 
    opt_out_record["propagation_status"] = "COMPLETE"
    return opt_out_record
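The 15-business-day clock excludes weekends, so a naive `timedelta(days=15)` undercounts. A sketch that walks the calendar forward day by day (public holidays are ignored here; plug in your own holiday calendar if you need one):

```python
from datetime import date, timedelta

def opt_out_deadline(received: date, business_days: int = 15) -> date:
    """Date by which opt-out propagation must finish, skipping weekends."""
    remaining = business_days
    current = received
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 .. Friday=4
            remaining -= 1
    return current
```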

Step 6: CCPA Data Inventory Scanner

Run this periodically to maintain your data inventory:

def run_ccpa_inventory_scan(
    data_sources: dict[str, list[dict]]
) -> dict:
    """
    Scan all data sources and produce a CCPA data inventory report.
    data_sources: { "source_name": [records], ... }
    """
    inventory = {
        "scan_date": datetime.utcnow().isoformat(),
        "sources_scanned": len(data_sources),
        "pii_findings": [],
        "summary_by_category": {}
    }
 
    for source_name, records in data_sources.items():
        print(f"Scanning {source_name} ({len(records)} records)...")
        findings = scan_database_export(records, source_name)
 
        for finding in findings:
            for category in finding["pii_categories"]:
                inventory["summary_by_category"][category] = (
                    inventory["summary_by_category"].get(category, 0) + 1
                )
 
        inventory["pii_findings"].extend(findings)
 
    inventory["total_records_with_pii"] = len(inventory["pii_findings"])
    inventory["risk_assessment"] = (
        "HIGH" if inventory["total_records_with_pii"] > 1000
        else "MEDIUM" if inventory["total_records_with_pii"] > 100
        else "LOW"
    )
 
    return inventory

CCPA Compliance Checklist for SaaS

Requirement → technical implementation:

  - Privacy notice: disclose the PII categories collected, at signup
  - Right to know: an API endpoint that returns all data for a user_id
  - Right to delete: a deletion pipeline covering all data stores
  - Right to opt out: GPC signal support plus an opt-out form wired to propagation
  - Data minimization: PII redaction in analytics/logs before storage
  - 15-day opt-out deadline: automated propagation on the opt-out event
  - 45-day deletion deadline: queued deletion with SLA monitoring
  - Audit trail: immutable logs for all privacy operations
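On the opt-out row: the Global Privacy Control signal arrives as the HTTP header `Sec-GPC: 1`, and California regulators treat it as a valid opt-out request. A minimal, framework-agnostic check, assuming request headers are available as a plain dict:

```python
def has_gpc_opt_out(headers: dict[str, str]) -> bool:
    """True if the request carries a Global Privacy Control opt-out signal."""
    normalized = {k.lower(): v for k, v in headers.items()}
    return normalized.get("sec-gpc", "").strip() == "1"
```

When this returns True for a known user, feed that user straight into `process_opt_out` rather than waiting for a form submission.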

The Cost of Non-Compliance

The CPPA's $2.75 million settlement in February 2026 for a delayed opt-out response signals that California regulators are moving from warnings to enforcement. The penalty cap is $7,500 per intentional violation; for companies processing millions of California records, a single incident can reach eight figures.


Automate your CCPA compliance scanning today. Get started with GlobalShield API and run your first PII inventory scan free.