Pseudonymisation
Definition
Pseudonymisation (often spelled pseudonymization) is a technique defined in the GDPR (Article 4(5)) for reducing privacy risk: personal data is processed so that it can no longer be attributed to a specific individual without additional information. Under the GDPR, this “additional information” (such as a mapping table, re-identification key, or token vault) must be kept separately and protected with appropriate technical and organisational measures, so that the pseudonymised dataset on its own does not directly identify anyone. Pseudonymisation is not the same as anonymisation: pseudonymised data can still be personal data, because re-identification remains possible for anyone who has access to the additional information.
In practice, organisations use pseudonymisation to support data protection by design and by default, improve the security of processing, and enable controlled analytics, testing, or sharing while limiting exposure of direct identifiers. Common approaches include replacing identifiers with tokens, using keyed hashing, or encrypting identifiers while managing keys and access carefully. Comparable concepts appear in other privacy and security programs as de-identification, tokenisation, data masking, and controlled re-identification, but the GDPR definition specifically depends on the additional information being held separately under strict safeguards, so that only those with access to it can re-identify individuals.
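As an illustration of the keyed hashing approach mentioned above, the following is a minimal sketch using HMAC-SHA256 from the Python standard library. The function name `pseudonymise`, the normalisation step, and the way the key is generated are assumptions made for this example; a real deployment would load the key from a separately managed secrets store rather than generate it inline.

```python
import hashlib
import hmac
import secrets


def pseudonymise(identifier: str, key: bytes) -> str:
    """Return a deterministic pseudonym for an identifier using HMAC-SHA256."""
    # Normalise first so trivially different spellings map to the same pseudonym.
    normalised = identifier.strip().lower().encode("utf-8")
    return hmac.new(key, normalised, hashlib.sha256).hexdigest()


# Hypothetical key for the example; in practice this is the "additional
# information" and would be kept apart from the pseudonymised dataset.
key = secrets.token_bytes(32)

p1 = pseudonymise("Alice@example.com", key)
p2 = pseudonymise("alice@example.com ", key)

# A different key yields an unrelated pseudonym for the same identifier.
other = pseudonymise("alice@example.com", secrets.token_bytes(32))
```

Because the output is deterministic for a given key, records belonging to the same person remain linkable for analytics, while anyone without the key cannot recompute the mapping from guessed identifiers.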
Real-World Examples
Analytics on customer activity without direct identifiers
A startup replaces emails and phone numbers with random tokens before running product analytics, storing the token map in a separate, access-controlled system.
Safer testing with production-like data
An SMB pseudonymises user IDs in exported datasets for QA, so developers can reproduce issues without seeing direct personal identifiers.
Controlled sharing for fraud investigations
An enterprise shares pseudonymised transaction records with an internal investigation team, granting re-identification access only to a small, audited group.
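The token-based examples above can be sketched in code. This is a simplified illustration, not a production design: `TokenVault`, its method names, the `tok_` prefix, and the requester labels are all hypothetical, and the in-memory dictionaries stand in for what would be a separate, access-controlled system.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class TokenVault:
    """Hypothetical in-memory stand-in for a separately hosted token map."""

    authorised: set[str]
    _forward: dict[str, str] = field(default_factory=dict)
    _reverse: dict[str, str] = field(default_factory=dict)
    audit_log: list[tuple[str, str, bool]] = field(default_factory=list)

    def tokenise(self, identifier: str) -> str:
        # Reuse an existing token so the same person stays linkable across records.
        if identifier not in self._forward:
            token = "tok_" + secrets.token_hex(16)
            self._forward[identifier] = token
            self._reverse[token] = identifier
        return self._forward[identifier]

    def reidentify(self, token: str, requester: str) -> str:
        # Every attempt is logged, whether or not it is allowed.
        allowed = requester in self.authorised
        self.audit_log.append((requester, token, allowed))
        if not allowed:
            raise PermissionError(f"{requester} may not re-identify data")
        return self._reverse[token]


vault = TokenVault(authorised={"fraud-investigator"})

events = [
    {"email": "alice@example.com", "action": "login"},
    {"email": "bob@example.com", "action": "purchase"},
    {"email": "alice@example.com", "action": "logout"},
]

# The analytics dataset carries tokens only; the vault stays elsewhere.
pseudonymised = [
    {"user": vault.tokenise(e["email"]), "action": e["action"]} for e in events
]
```

Analysts and developers would work with `pseudonymised` alone, while a call such as `vault.reidentify(token, "fraud-investigator")` would be reserved for the small audited group and leaves an entry in `audit_log` each time.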
Pseudonymisation is a way to transform personal data so that individuals are not directly identifiable without separately held additional information, reducing exposure while keeping controlled re-identification possible.
Pseudonymisation can be reversed with the additional information, while anonymisation aims to make re-identification no longer reasonably possible, meaning anonymised data should no longer be personal data.
Pseudonymised data often remains personal data, because re-identification is still possible for as long as the additional information exists, so GDPR obligations can still apply to pseudonymised datasets.
Common techniques include tokenisation of identifiers, keyed hashing (e.g., HMAC), and encryption of identifiers, combined with strict separation and protection of keys or mapping tables.
Tokenisation is a specific pseudonymisation method that replaces identifiers with random tokens and relies on a secure mapping system, whereas pseudonymisation is the broader concept covering multiple techniques.
Hashing can count as pseudonymisation when it is keyed and the key is protected; plain unkeyed hashing may be vulnerable to guessing or lookup attacks, because hashes of predictable identifiers can be reversed by brute force or precomputed reference tables.
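To make the lookup-attack concern concrete, here is a small sketch that assumes the attacker knows the identifiers are phone numbers from a narrow range. The helper names, the fictional number range, and the sample data are invented for illustration. The attacker precomputes unkeyed SHA-256 hashes for every candidate and matches them against the pseudonymised values.

```python
import hashlib
import hmac
import secrets


def plain_hash(value: str) -> str:
    """Unkeyed SHA-256: anyone can recompute it for a guessed value."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def keyed_hash(value: str, key: bytes) -> str:
    """HMAC-SHA256: recomputing it requires the secret key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def lookup_attack(pseudonyms, candidates):
    """Build a reference table of unkeyed hashes and match pseudonyms against it."""
    table = {plain_hash(c): c for c in candidates}
    return {p: table[p] for p in pseudonyms if p in table}


# Illustrative data: three "real" numbers inside a small, guessable range.
phone_numbers = ["+15550100", "+15550142", "+15550199"]
candidates = [f"+1555{n:04d}" for n in range(10_000)]

key = secrets.token_bytes(32)  # hypothetical key, unknown to the attacker
plain = [plain_hash(p) for p in phone_numbers]
keyed = [keyed_hash(p, key) for p in phone_numbers]

recovered_plain = lookup_attack(plain, candidates)  # all three numbers recovered
recovered_keyed = lookup_attack(keyed, candidates)  # nothing recovered without the key
```

The keyed variant only resists this attack for as long as the key stays secret, which is why the key is treated as part of the separately protected additional information.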
Store the additional information (the mapping table or key) separately from the pseudonymised data, with strong access controls, encryption at rest and in transit, strict logging, least-privilege permissions, and robust key management, so that only authorised workflows can re-identify data.
Risks include unauthorised access to the mapping data, linkage attacks using auxiliary datasets, weak token/key controls, and inference from rare attributes or small populations in the dataset.
Use pseudonymisation when teams need realistic data patterns but not direct identifiers, for example in product analytics, QA testing, model training, or controlled internal sharing, to limit both privacy risk and the impact of a breach.
To verify that pseudonymisation is effective, validate the separation of mapping data, review access logs and permissions, test resistance to linkage and guessing attacks, confirm key management controls, and document the method, scope, and residual re-identification risk.
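One way to probe the residual risk from rare attributes or small populations is to count how many records share each combination of indirectly identifying fields. The sketch below is a simple heuristic in the style of a k-anonymity check, offered as an assumption rather than a prescribed test; the function name, field names, threshold, and sample records are hypothetical.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, k):
    """Return attribute combinations shared by fewer than k records."""
    counts = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Illustrative pseudonymised records: direct identifiers are already tokens,
# but postcode and age band can still single someone out.
records = [
    {"user": "tok_a1", "postcode": "AB1", "age_band": "30-39"},
    {"user": "tok_b2", "postcode": "AB1", "age_band": "30-39"},
    {"user": "tok_c3", "postcode": "AB1", "age_band": "30-39"},
    {"user": "tok_d4", "postcode": "ZZ9", "age_band": "90+"},
]

risky = small_groups(records, ["postcode", "age_band"], k=3)
# -> {('ZZ9', '90+'): 1}
```

Combinations flagged this way are candidates for generalisation or suppression before the dataset is shared, and the result can be recorded as part of the documented residual risk.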
| Version | Date | Author | Description |
|---|---|---|---|
| 1.0.0 | 2026-02-26 | WatchDog Security GRC Wiki Team | Initial publication |