WordPress.org

Make WordPress Core

Opened 20 months ago

Closed 13 months ago

Last modified 13 months ago

#43175 closed enhancement (wontfix)

Discussion - Pseudonymisation

Reported by: xkon
Owned by:
Milestone:
Priority: normal
Severity: normal
Version:
Component: Privacy
Keywords:
Focuses:
Cc:

Description

I'm opening this ticket as a place to discuss whether anything is needed, or will be done in the future, regarding one specific area of the GDPR: pseudonymisation. In my eyes, the paragraph below seems to need more attention than explaining to users what data we collect on any given site.

As it is stated at the moment on
https://en.wikipedia.org/wiki/General_Data_Protection_Regulation#Pseudonymisation

Pseudonymisation
The GDPR refers to pseudonymisation as a process that transforms personal data in such a way that the resulting data cannot be attributed to a specific data subject without the use of additional information. An example of pseudonymisation is encryption, which renders the original data unintelligible and the process cannot be reversed without access to the correct decryption key. The GDPR requires that this additional information (such as the decryption key) be kept separately from the pseudonymised data. Pseudonymisation is recommended to reduce the risks to the concerned data subjects and also help controllers and processors to meet their data-protection obligations (Recital 28).
Although the GDPR encourages the use of pseudonymisation to "reduce risks to the data subjects," (Recital 28) pseudonymised data is still considered personal data (Recital 26) and therefore remains covered by the GDPR.

After reading that, we held discussions with several law firms (specializing in internet matters). The answers we got were pretty much the same, and are as follows:

The idea is to either encrypt pretty much all of the data (for example phone numbers, addresses, etc.) or have the website connect to two databases instead of one, so that you would need access to both databases to identify a person. For example, DB 1 keeps the name and password, and DB 2 keeps the address and phone number. If there is a breach, an attacker would need both databases to fully match a person; otherwise the data is incomplete.
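A minimal sketch of the two-database idea, in Python purely for illustration (it is not WordPress code). The function names and the random `link` token are assumptions: one record is split into two rows that can only be recombined by someone holding both stores.

```python
import secrets


def split_record(user):
    """Split one user record into two rows meant for two separate databases."""
    # Random token linking the two rows; it carries no meaning on its own.
    link = secrets.token_hex(16)
    identity_row = {
        "link": link,
        "name": user["name"],
        "password_hash": user["password_hash"],
    }
    contact_row = {
        "link": link,
        "address": user["address"],
        "phone": user["phone"],
    }
    return identity_row, contact_row


def join_records(identity_rows, contact_rows):
    """Recombine rows; only possible with access to both stores."""
    contacts = {row["link"]: row for row in contact_rows}
    merged = []
    for row in identity_rows:
        contact = contacts.get(row["link"])
        if contact is not None:
            merged.append({**row, **contact})
    return merged
```

Note that in this sketch each row still holds some personal data by itself (a name in one, an address in the other), so the split only limits how much a single breach exposes.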

--

Now, since I'm not a lawyer, maybe there's somebody around with a clearer view of what needs to be done (if anything). If something is needed, it should become a matter of discussion, since May isn't that far off.

I've seen that plugins and developers in general are already changing things to become GDPR compliant, but my question is whether something needs to be done within WordPress core itself so it can be 'shipped' GDPR-ready (if it isn't already compliant).

--

We could also try to gather all the information regarding WordPress and the GDPR in one place, to get a more rounded view of the matter and of what is changing or needs to be adjusted (policies, etc.).

Change History (22)

#1 @xkon
17 months ago

  • Keywords gdpr added

#2 @xkon
17 months ago

  • Summary changed from GDPR - Discussion - Pseudonymisation to Discussion - Pseudonymisation

This ticket was mentioned in Slack in #gdpr-compliance by xkon.


17 months ago

#4 @David 279
17 months ago

Pseudonymisation is important but not in the way you describe.

Under the GDPR, any personal data is still personal data, even just an IP address! So if either DB is not encrypted, then gaining access to just DB1 or DB2 gives the attacker access to some personal data.

If you are going to use two databases, then the logical option is to put all personal data in DB2 and encrypt it, with the key stored somewhere else.

Now, I'm not a programmer, but the issue will be allowing people to still log in, post under their own name, and check their own details, so you need to be able to encrypt and decrypt on the site. You also need to be able to export the unencrypted personal data when requested. Further, you need to be able to import and export users from a file for admin purposes, as is possible currently.

#5 follow-up: @David 279
17 months ago

The biggest issue I see at the moment with encryption of user data is that the key needs to be on a different server.

#6 follow-up: @David 279
17 months ago

Reading the actual regulations (article 32 of the GDPR) you get the following:

"Security of processing"

  1. Taking into account the state of the art, the costs of implementation and the nature, scope, context and purposes of processing as well as the risk of varying likelihood and severity for the rights and freedoms of natural persons, the controller and the processor shall implement appropriate technical and organisational measures to ensure a level of security appropriate to the risk, including inter alia as appropriate:

(a) the pseudonymisation and encryption of personal data;

(b) the ability to ensure the ongoing confidentiality, integrity, availability and resilience of processing systems and services;

(c) the ability to restore the availability and access to personal data in a timely manner in the event of a physical or technical incident;

(d) a process for regularly testing, assessing and evaluating the effectiveness of technical and organisational measures for ensuring the security of the processing.

  2. In assessing the appropriate level of security account shall be taken in particular of the risks that are presented by processing, in particular from accidental or unlawful destruction, loss, alteration, unauthorised disclosure of, or access to personal data transmitted, stored or otherwise processed.
  3. Adherence to an approved code of conduct as referred to in Article 40 or an approved certification mechanism as referred to in Article 42 may be used as an element by which to demonstrate compliance with the requirements set out in paragraph 1 of this Article.
  4. The controller and processor shall take steps to ensure that any natural person acting under the authority of the controller or the processor who has access to personal data does not process them except on instructions from the controller, unless he or she is required to do so by Union or Member State law.

Initially, when reading this, you may see the bit about "costs of implementation" and think: it's going to cost far too much to implement pseudonymisation and encryption of personal data within WordPress, so we can ignore this. However, in the latest (11th April 2018) PDF on this subject from the Article 29 Working Party (the people who basically decide what the GDPR will implement), one very specific paragraph caught my attention:

There is also a public interest in the implementation of encryption. Securing personal data in transit and at rest is a cornerstone of the trust we all need for digital services, so as to enable innovation and growth for our digital economy.

The whole document is here http://ec.europa.eu/newsroom/article29/document.cfm?action=display&doc_id=51026


Note that CURRENTLY the GDPR does not require people to store personal data in an encrypted form, but when you take the time to read the documents, especially the one linked above, you can see that it is only a matter of time before this becomes a requirement.

Further, there are indications that a loss of data may be regarded as less serious if the data itself is encrypted. So encryption of personal data, whilst not mandatory at this time, is highly advisable.


There are a few issues with encryption of personal data:

  1. The decryption key should not be stored in the same location as the encrypted data. There's no point in installing a great big safe to protect your valuables and then sticking a Post-it note with the combination to the side of the safe. However, many people host their WordPress sites on a shared server, so I'm not sure just how one would set this up.
  2. Individual users may need to check their own data.
  3. Access to user data needs to be strictly controlled. There may be a need for two levels of access to user data: at the top level one can modify user data, whilst at the secondary level one can read but not modify it. This allows employees to, for example, copy a name and address from an eCommerce system into a courier's system to fulfil an order. Users should possibly be able to modify their own data. I say possibly because in an eCommerce system a user should not be able to modify their name and address, as this would alter transactional data. A checkbox in the back end might be provided to control whether users can edit their own account.
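The two access levels described in the third item could be sketched roughly as follows. This is only an illustration in Python, not WordPress code: the `Access` enum, `can_read`, `can_modify`, and the `allow_self_edit` flag (standing in for the suggested back-end checkbox) are all hypothetical names.

```python
from enum import Enum


class Access(Enum):
    """Hypothetical access levels for accounts that handle user data."""
    NONE = 0
    READ_ONLY = 1  # secondary level: may read user data but not change it
    MODIFY = 2     # top level: may read and change user data


def can_read(access, is_own_record=False):
    # Individual users may always check their own data.
    return is_own_record or access in (Access.READ_ONLY, Access.MODIFY)


def can_modify(access, is_own_record=False, allow_self_edit=False):
    # Top-level accounts may always modify. Users may edit their own
    # record only when the site owner has enabled it (the "checkbox").
    if access is Access.MODIFY:
        return True
    return is_own_record and allow_self_edit
```

With `allow_self_edit` left off, a shop could keep names and addresses tied to transactions read-only for the customer while still letting order-fulfilment staff read them.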

Make of the above what you will

#7 in reply to: ↑ 6 ; follow-up: @ericdaams
17 months ago

Replying to David 279:

  1. The Decryption Key should not be stored in the same location as the Encrypted Data, there's no point in installing a great big safe to protect your valuables then sticking a post it note to the side of the safe with the combination, however many people will be hosting their WordPress sites on a Shared Server so just how one sets this up I'm not sure

Hard to see how WordPress core can do this out of the box. Perhaps something like Jetpack/Akismet could fill this gap as a 3rd party service securely hosting the Decryption Key, but that seems contradictory to the open source, "own your data" nature of WordPress as it exists today.

#8 in reply to: ↑ 7 ; follow-up: @David 279
17 months ago

Replying to ericdaams:

Hard to see how WordPress core can do this out of the box. Perhaps something like Jetpack/Akismet could fill this gap as a 3rd party service securely hosting the Decryption Key, but that seems contradictory to the open source, "own your data" nature of WordPress as it exists today.

Oh, I agree, but this is what the working party on the GDPR is saying, so the question is whether or not it's possible to achieve in WordPress if you are not hosting on your own server.

I was discussing this last night with someone who said that they host their application on one server that is internet accessible, store the encrypted user data on a second server that is only accessible from the first, and store the encryption key for the user data on a third server.

I don't believe that one needs to go to the length of having user data on a separate server (possibly a separate database), but the requirement to have the decryption key stored elsewhere is very clear.

#9 in reply to: ↑ 5 ; follow-up: @iandunn
17 months ago

Replying to David 279:

The biggest issue I see at the moment with Encryption of user Data is that the key needs to be on a different server

Can you cite the section of GDPR that says it needs to be on a separate server? I couldn't find it, and am curious to read the details.

In addition to separate servers not being practical from Core's perspective (comment:7), I'm also skeptical of how much security would be gained. If an attacker finds a vulnerability that allows them to modify the database, but not the filesystem, then in most cases they can just change the password of an existing admin, log in, and upload a malicious plugin.

If they find a vulnerability where they gain access to the file system but not the database, then they can easily grab the database credentials from wp-config.php and make queries through PHP.

#10 in reply to: ↑ 8 ; follow-up: @iandunn
17 months ago

Replying to David 279:

they host their application on one server that was internet accessible, the encrypted user data was stored on a second server only accessible from the first and the encryption key for the user data was stored on a third server

Firewalling the database server behind a DMZ is a good practice, and seems fairly common, but I'm curious to hear about the third server that stores the encryption key.

I'm assuming that the web server makes some kind of request to the key server, and uses some credentials for authentication/authorization. It seems like if the web server was compromised, then the attacker would gain access to those credentials, and therefore have access to the encryption key as well. If that's true, then it doesn't seem like the 3rd server offers any meaningful protection.

Is there something I'm missing?

#11 in reply to: ↑ 10 @David 279
17 months ago

Replying to iandunn:

Is there something I'm missing?

I have no idea. I'm not a software expert and, as I said, the lengths they had gone to seemed excessive.

#12 in reply to: ↑ 9 @David 279
17 months ago

Replying to iandunn:

Can you cite the section of GDPR that says it needs to be on a separate server? I couldn't find it, and am curious to read the details.

In addition to separate servers not being practical from Core's perspective (comment:7), I'm also skeptical of how much security would be gained. If an attacker finds a vulnerability that allows them to modify the database, but not the filesystem, then in most cases they can just change the password of an existing admin, log in, and upload a malicious plugin.

If they find a vulnerability where they gain access to the file system but not the database, then they can easily grab the database credentials from wp-config.php and make queries through PHP.

Annoyingly, I can't find the document I had last week stating that the encryption key should not be kept in the same location as the encrypted data.

This is not in the GDPR YET, but it is being looked at by the group that maintains the GDPR on an ongoing basis. So it does not need to be sorted by May 25th, but given the numerous data breaches over the last few years, I think you can be confident that this will filter through fairly quickly.

#13 @horninc
17 months ago

I think what WP29 is referring to is more about KMS integration possibilities. Key Management Services are quite common and quite easy to implement.

It is well understood that compromising a whole "server chain" will indeed result in a data breach. This is, however, irrelevant to the given discussion. The bottom line is that the encryption key should be stored outside the scope of the web server (for instance), so that compromising apache/nginx would result in a leak of data, but that data would be encrypted and the key would be safe. Likewise, a breach of the database server would result in a data leak, but not of the encryption key, which is protected outside the scope of an .sql dump and/or simple query access.

It naturally follows that this separation would not fully protect the data against a complete root-level compromise, but that is also not the core aim of the Regulation.
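As a rough illustration of keeping the key outside the scope of the database, the sketch below reads a secret key from an environment variable and stores only keyed tokens in the table, so an .sql dump alone would not contain the key. Assumptions: it uses a keyed hash (HMAC-SHA256 from Python's standard library) as a stand-in for real encryption, so it is one-way and is not a KMS integration; the variable name `PSEUDONYM_KEY` and the function names are hypothetical.

```python
import hashlib
import hmac
import os

KEY_ENV_VAR = "PSEUDONYM_KEY"  # hypothetical name; holds a hex-encoded key


def load_key(environ=os.environ):
    """Fetch the secret key from the process environment, not the database."""
    value = environ.get(KEY_ENV_VAR)
    if not value:
        raise RuntimeError(KEY_ENV_VAR + " is not set")
    return bytes.fromhex(value)


def pseudonymise(value, key):
    """Return a keyed token for `value`; without the key it cannot be recomputed."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def store_row(table, email, key):
    """Append a row that contains the token only, never the raw value."""
    table.append({"email_token": pseudonymise(email, key)})
```

As noted in the comment above, this does nothing against an attacker who controls the process that holds the key; it only keeps the key out of a database dump or a query-level leak.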

This ticket was mentioned in Slack in #gdpr-compliance by xkon.


16 months ago

#15 @allendav
16 months ago

I advocate for keeping this broad ticket open and using it to organize some specific use cases - e.g. this pseudonymization use case for email addresses: #44078

#16 @desrosj
16 months ago

  • Component changed from General to Privacy

Moving to the new Privacy component.

#17 @summoner
16 months ago

It's not so easy to find a proper solution:

On one hand, I would delete personal data only as a last resort, and only in cases where it is compulsory to do so (see Article 17 (1) a)-f), also considering the exceptions noted in (3) b) and e)).

In any other case I would suggest just pseudonymisation or encryption of the data, mainly because the subject should be able to make themselves re-identifiable, as stated in Article 11 (2). So if the controller no longer stores the encryption key but the subject provides additional information to identify themselves, they should again be able to exercise their rights to access or rectify their data, the right to erasure, the right to restriction of processing and the right to data portability.

On the other hand, the data controller must also ensure a proper level of security of processing (Article 32 (1) a)), and as storing hash keys separately from the hashed data might be too complicated, maybe deleting personal data is preferable in most cases, at least if there are no obligations to keep billing data for X years, as in the case of online shops for example.

However, if data deletion is preferred, then someone who has been banned by an admin can simply request the deletion of their data and re-register with the same email address as before the ban, because in that case not even a hashed version of the email address is kept, so there is no means of comparing it on re-registration. That will challenge some admins later for sure...
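One possible middle ground, sketched under assumptions: on erasure, keep only a keyed hash of the email address so that a later re-registration can still be compared against a ban list. The `BanList` class and `email_fingerprint` function are hypothetical, the key is assumed to be stored apart from the database, and whether retaining such a hash is acceptable after an erasure request is a legal question this sketch does not settle.

```python
import hashlib
import hmac


def email_fingerprint(email, key):
    """Keyed hash of a normalised email address (HMAC-SHA256)."""
    normalised = email.strip().lower()
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


class BanList:
    """Remembers banned addresses as fingerprints only."""

    def __init__(self, key):
        self._key = key
        self._banned = set()

    def ban(self, email):
        # Only the fingerprint is kept; the address itself can be deleted.
        self._banned.add(email_fingerprint(email, self._key))

    def is_banned(self, email):
        # Checked at registration time against the submitted address.
        return email_fingerprint(email, self._key) in self._banned

    def fingerprints(self):
        return frozenset(self._banned)
```

A registration handler would call `is_banned()` with the submitted address before creating the account. Without the key, the stored fingerprints cannot be recomputed from a list of candidate addresses.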

Last edited 16 months ago by summoner

This ticket was mentioned in Slack in #gdpr-compliance by desrosj.


16 months ago

#19 @desrosj
15 months ago

  • Keywords gdpr removed

Removing the GDPR keyword. This has been replaced by the new Privacy component and privacy focuses in Trac.

This ticket was mentioned in Slack in #core-privacy by desrosj.


13 months ago

#21 @idea15
13 months ago

  • Resolution set to fixed
  • Status changed from new to closed

Great discussion Konstantinos.

The thing about pseudonymisation is that it's not a one-size-fits-all solution or requirement. It depends on the situation it's being used for, and the nature of the data. The guidance on when to use pseudonymisation is very useful for situations like academic research, big data, and situations where you might be aggregating information from thousands of data points. For your average WP database, not so much.

What I have been advising people is that the greater the chance that pseudonymised data can be matched with other data to be identified, the greater the risk of using it. Making it too easy to pseudonymise data, through an automated tool for example, would play into that.

I'm also not sure of an example where an average WP admin would need to use a pseudonymisation tool. If you are going to be doing that, it's going to be on a raw database export, not within the WP back end. (Would love to hear differing opinions or imaginings of how it could work.)

So there's certainly scope to create pseudonymisation tools and functions, but IMHO it would be overkill to put them into core.

#22 @SergeyBiryukov
13 months ago

  • Milestone Awaiting Review deleted
  • Resolution changed from fixed to wontfix

Changing the resolution, as there were no commits here.
