Pseudonymization and anonymization of personal data

The GDPR places heavy emphasis on Privacy by Design: mechanisms that protect personal privacy should be built into IT systems and services from the start.

One of the core principles is data minimization. This means that all products and services should be designed so that as little personal data as possible is processed. According to the Swedish Data Protection Authority, you can do this in the following ways:

  • Limit data processing to information that only identifies an individual indirectly
  • Limit data gathering to data that is less sensitive
  • Replace names, e.g., with pseudonyms
  • Do not routinely have personal identity numbers as fields in databases

Two terms that come up a lot when discussing Privacy by Design and data minimization are anonymization and pseudonymization.

    So what do anonymization and pseudonymization mean? How are they different?

    Both anonymization and pseudonymization refer to hiding identities and personal data – but in different ways.

    Pseudonymization:

    The word pseudonym comes from the Greek for ‘false name’, and a famous example is the fictional character Bruce Wayne, who sometimes goes by the name Batman. In the same way, pseudonymization in IT systems means that you mask the identity of data subjects and their personal data.

    In the GDPR, pseudonymisation is defined as “the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information” (Article 4(5)). Personal data is thus replaced with non-identifying data, and additional information is needed to recreate the original data. That additional information must be kept separately and protected by technical and organisational measures.

    Pseudonymization makes information such as personal identification numbers and other personal data less accessible to unauthorized users, and it is one of the safeguards that can help you meet GDPR requirements.

    Anonymization:

    Anonymized data is data that has been altered in such a way that the data subject can no longer be identified. You remove every possibility of identifying a person, and no additional information can restore the original information. Anonymization is difficult, and you completely lose the connection between the data and the individual. Nevertheless, it can be a beneficial technique when the data is used for statistical or research purposes.

    As you may gather, there is a clear distinction between the two concepts. Pseudonymization means that an individual can still be identified through indirect or additional information, so pseudonymized personal data is still in scope of the GDPR. Anonymization means that you cannot restore the original information, and such data is out of scope of the GDPR.
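As a rough illustration of the statistical use case, the sketch below drops direct identifiers, generalizes age into bands, and publishes only group counts. The record fields (`name`, `age`, `city`), the helper names, and the `min_group_size` threshold are assumptions made for this example. Whether the output is truly anonymous depends on the data and its context, so treat it as a starting point rather than a guarantee.

```python
from collections import Counter


def age_band(age: int, width: int = 10) -> str:
    """Generalize an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def aggregate(records, min_group_size: int = 2):
    """Count people per (age band, city) and suppress small groups.

    Names and other direct identifiers are never copied to the output.
    Groups smaller than min_group_size (a hypothetical threshold) are
    dropped, because very small groups can point to a single person.
    """
    counts = Counter((age_band(r["age"]), r["city"]) for r in records)
    return {group: n for group, n in counts.items() if n >= min_group_size}


# Made-up example records.
records = [
    {"name": "Person A", "age": 34, "city": "Lund"},
    {"name": "Person B", "age": 38, "city": "Lund"},
    {"name": "Person C", "age": 52, "city": "Umeå"},
]
print(aggregate(records))
```

Only the aggregated counts leave the function; the single-person group is suppressed.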

    How do pseudonymization and anonymization work in practice?

    There are different techniques you can use to pseudonymize, and in some cases anonymize, personal data.

    Directory Replacement

    Directory replacement means that you replace data about the data subject with other values, while keeping a link between the replacement and the original. For example, you can use a customer number to identify an individual, and store information that directly identifies the individual, such as a personal identification number, separately. In this way you pseudonymize the sensitive data. To anonymize, you delete the separately stored information that directly identifies the data subject.
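Directory replacement can be sketched as a lookup table that lives apart from the working data. The `Directory` class, its method names, the `CUST-` prefix, and the made-up identity number below are all assumptions for illustration; a real system would keep the directory in a separate, access-controlled store.

```python
import secrets


class Directory:
    """Hypothetical lookup table, kept separately from the working data."""

    def __init__(self):
        self._by_identity = {}   # personal ID -> customer number
        self._by_pseudonym = {}  # customer number -> personal ID

    def pseudonym_for(self, personal_id: str) -> str:
        """Return a stable random customer number for a personal ID."""
        if personal_id not in self._by_identity:
            pseudonym = "CUST-" + secrets.token_hex(4)
            while pseudonym in self._by_pseudonym:  # avoid rare collisions
                pseudonym = "CUST-" + secrets.token_hex(4)
            self._by_identity[personal_id] = pseudonym
            self._by_pseudonym[pseudonym] = personal_id
        return self._by_identity[personal_id]

    def reidentify(self, pseudonym: str) -> str:
        """Look up the original ID; only possible while the directory exists."""
        return self._by_pseudonym[pseudonym]

    def forget_all(self) -> None:
        """Delete the link between customer numbers and identities."""
        self._by_identity.clear()
        self._by_pseudonym.clear()


directory = Directory()
# The working record carries only the customer number, not the identity.
order = {"customer": directory.pseudonym_for("19000101-0000"), "item": "book"}
print(order)
```

Calling `forget_all()` corresponds to deleting the separately stored information: afterwards the customer number can no longer be mapped back through the directory.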

    Scrambling

    Simply put, scrambling means transforming the characters of a value so that the original is no longer readable. Examples of scrambling techniques are encryption and hashing.
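One way to scramble an identifier is a keyed hash. The sketch below uses HMAC-SHA256 from the Python standard library; the function name `scramble` and the example key and identifier are assumptions for illustration. A key is used because identifiers with few possible values, such as personal identity numbers, can be recovered from a plain unkeyed hash by trying every candidate. The key then plays the role of the additional information that has to be kept separately.

```python
import hashlib
import hmac


def scramble(value: str, secret_key: bytes) -> str:
    """Return a keyed hash (HMAC-SHA256) of the value as a hex string.

    The same value and key always give the same output, so records can
    still be linked, but the original value is not readable.
    """
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Example key only; a real key would come from a secure key store.
key = b"example-key-kept-separately"
print(scramble("19000101-0000", key))
```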

    Masking

    Masking means that part of the information is hidden by replacing it with random characters or other data. Masking techniques are widely used in the payment industry and in card data processing, where parts of the card number are masked, not least to comply with PCI DSS.
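A minimal masking sketch, assuming the goal is to display only the last four digits of a card number. The function name `mask_card_number` and its parameters are made up for this example, and the number used is a well-known test number, not a real card. Check the current PCI DSS text for the exact display rules that apply to you.

```python
def mask_card_number(card_number: str, visible: int = 4, mask_char: str = "*") -> str:
    """Replace all but the last `visible` digits with mask_char.

    Spaces and dashes in the input are ignored.
    """
    digits = [c for c in card_number if c.isdigit()]
    if len(digits) <= visible:
        # Too short to show anything safely: mask everything.
        return mask_char * len(digits)
    hidden = len(digits) - visible
    return mask_char * hidden + "".join(digits[hidden:])


print(mask_card_number("4111 1111 1111 1111"))
```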
