Make WordPress Core

Opened 6 years ago

Last modified 5 years ago

#43492 new enhancement

Core Telemetry and Updates

Reported by: xkon
Owned by:
Milestone: Awaiting Review
Priority: normal
Severity: normal
Version:
Component: Upgrade/Install
Keywords: 2nd-opinion
Focuses: privacy
Cc:

Description

This has been discussed in #gdpr-compliance, and the question has made its way around some other rooms with various replies as well, but it's time to make a final decision.

When WordPress requests updates it sends more data than is needed to actually perform the update.

We searched a bit at some point to get a glimpse of the past and see how and why all of those fields got added, but couldn't find anything specific.

There's already a way to modify the call through https://developer.wordpress.org/reference/hooks/core_version_check_query_args/ (and there may be others as well), but the concept here is to have everything off by default and add a proper UI with an opt-in so that Admins can select exactly what they want to send.
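
For reference, a minimal sketch of how that filter could be used today to trim the request. The key names are assumptions based on the query array that wp_version_check() built around WordPress 4.9, and the function name example_trim_version_check_args is made up for illustration, so check both against the core version you run.

```php
<?php
/**
 * Sketch only: drop fields from the core update check that are not
 * strictly needed to pick an update offer.
 *
 * Assumption: the keys below match the $query array built by
 * wp_version_check(); verify them against your WordPress version.
 *
 * @param array $query Query arguments about to be sent with the update check.
 * @return array Query arguments with the optional fields removed.
 */
function example_trim_version_check_args( $query ) {
	$optional = array( 'blogs', 'users', 'multisite_enabled', 'initial_db_version' );

	foreach ( $optional as $key ) {
		unset( $query[ $key ] );
	}

	return $query;
}

// Register the callback only when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'core_version_check_query_args', 'example_trim_version_check_args' );
}
```

Dropped into a small plugin, this would keep the version, PHP, MySQL and locale fields and omit the counts. The opt-in UI proposed here would effectively replace the hard-coded $optional list with whatever the Admin has chosen.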

Change History (72)

#1 @lakenh
6 years ago

Similar to this plugin perhaps, that was shared in the #gdpr-compliance channel on Slack:

https://en-gb.wordpress.org/plugins/update-privacy/

#2 @xkon
6 years ago

  • Keywords gdpr added

#3 @mattrad
6 years ago

See #16778 for history and discussion around this issue.

I'd be more than happy if I had to retire my Update Privacy plugin :)

#5 @kevinwhoffman
6 years ago

It would be helpful to clarify what, if any, information that is currently available in the Advanced View of a Plugin page would be affected by requiring an opt in.

https://wordpress.org/plugins/jetpack/advanced/

Active installations
Active versions
Downloads per day
Active install growth
Downloads history

This ticket was mentioned in Slack in #core by xkon.


6 years ago

#7 @danieltj
6 years ago

  • Component changed from General to Upgrade/Install
  • Keywords 2nd-opinion added

I'm not a GDPR expert here by any means, but the data sent in the request wouldn't class as personal information and as such, doesn't matter if it's sent to WordPress.org, does it? Either way, this is integral in ensuring that the site gets security updates (and other feature updates).

This can be somewhat remedied by disabling updates for your site, in which case the request data doesn't need filtering because it's never sent? Or is it sent first but the update is blocked? I haven't checked how Core fully does updates from start to finish.

The biggest problem I see with this is a big rewrite on how updates are performed because ideally you want to prevent an update site-side before the data is compiled and sent to WordPress.org's API to check the request data to see if the site is eligible for the update. This is a very tricky issue for sure.

#8 follow-up: @xkon
6 years ago

@danieltj I can easily argue about 'personal data' in the way of: it's my server, my localhost/pc so yes PHP version is my personal data basically as it's on my personal computer, you have to inform me that you want it.

As for the updates themselves, if these data are actually needed for the update to happen for some reason that is unclear to me at the moment, the idea of the Regulation and the whole Privacy is to inform me first so I can decide on my own to at least just cancel the update process if I like. If you tell me that I can already define that on my wp-config, we have to make it possible for the Dumb & Dumber pretty much, so wp-config is out of the question :D and it should be given by a simple on/off switch within the admin area.

The point & difference here is that all of these are already happening with either hooks or plugins or anything else you can think of. Why not happen within Core itself to be 100% perfect and give the end user the functionality / reassurance needed?

#9 @dd32
6 years ago

Strictly speaking, of the values reported to WordPress.org as part of the core update check (other than the WordPress version), these are required for a proper response:

  • PHP Version - So as to not offer WordPress 9.0 which requires PHP 7.0 to WordPress 7.5 users who are only running PHP 5.6.
  • MySQL Version - Same as above. Last time we increased the PHP minimum (4.3 -> 5.2), we also bumped MySQL from 4.1 to 5.0.
  • local_package - Used to offer locale-based installs updates which include their locale's customisations.

I note that https://wordpress.org/plugins/update-privacy removes these fields and core continues to work: it gracefully downgrades and just returns all updates and ignores locale updates.
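
To make the reasoning above concrete, here is a rough sketch of the kind of eligibility check that the PHP and MySQL versions make possible, including the fallback when they are missing. This is not the actual WordPress.org API code: the function name, the offer format and the version numbers are all hypothetical.

```php
<?php
/**
 * Hypothetical sketch, not the real api.wordpress.org logic.
 *
 * Returns the offers a site can run. When the site did not report its
 * PHP or MySQL version, every offer is returned (the graceful downgrade).
 *
 * @param array       $offers Each offer: array( 'version', 'min_php', 'min_mysql' ).
 * @param string|null $php    PHP version reported by the site, or null if withheld.
 * @param string|null $mysql  MySQL version reported by the site, or null if withheld.
 * @return array List of offered WordPress version strings.
 */
function example_eligible_offers( array $offers, $php = null, $mysql = null ) {
	$eligible = array();

	foreach ( $offers as $offer ) {
		// A withheld version cannot rule an offer out.
		$php_ok   = null === $php || version_compare( $php, $offer['min_php'], '>=' );
		$mysql_ok = null === $mysql || version_compare( $mysql, $offer['min_mysql'], '>=' );

		if ( $php_ok && $mysql_ok ) {
			$eligible[] = $offer['version'];
		}
	}

	return $eligible;
}

// Made-up offers for illustration only.
$offers = array(
	array( 'version' => '7.5.1', 'min_php' => '5.2.4', 'min_mysql' => '5.0' ),
	array( 'version' => '9.0', 'min_php' => '7.0', 'min_mysql' => '5.5' ),
);

print_r( example_eligible_offers( $offers, '5.6.30', '5.6' ) ); // Only 7.5.1.
print_r( example_eligible_offers( $offers ) );                  // Both offers.
```

The second call shows the trade-off: with the versions withheld the site still gets an answer, but it may be offered a release it cannot run.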

As a question was asked about the Active Installs numbers, the main thing relevant to the discussion there is that the Site URL from the WordPress user-agent is used (along with other static information) to determine a unique ID for stats purposes. That ID is used consistently across the core/plugins/themes APIs to generate the active_installs numbers for plugins/themes, and the other details sent with core update requests are ultimately used to generate the stats visible on https://wordpress.org/about/stats/
The aggregated statistics part is covered by the WordPress.org Privacy Policy.

#10 @xkon
6 years ago

@dd32 thank you for all that information.

So I'll step back a bit and put this question to everybody in the community, as we have a concept about the data:

I'm introducing you to the non-tech-savvy everyday user who has concerns about his privacy or telemetry etc. and has no idea what apply_filters() means when reading the Doc on core_version_check_query_args. He also does not want to rely on 3rd party ( plugins etc ) installations [ we've all seen these kinds of users/comments on forums ].

That filter has to be something offered within a UI along with the core, as it is already part of it, with a simple on/off switch so anyone can easily choose what he likes. Of course the user will be thoroughly informed of what those data are and what switching off means + affects, as this might break the updates and everything else that might be included with it. That is a choice that the user has to make for his website though, not us.

Our responsibility is to provide the tool and allow the option freely; the use of it is in their own hands.

Can we add this to our list and do it? On the one hand it's a thing that has to be done; on the other it's not one individual's decision to just flip a switch, as we're a community and collective thinking is the way to go.

--

Note: Sorry if I sometimes seem too blunt or strict. I'm not a lawyer either, not even close, but for the last few months, unfortunately for me ( :P ), I've been dealing with lawyers every day, mostly in the company I'm in, to get ready for the 25th of May, and some of the stuff needed is above-impossible as we can't just flip a switch and make everything perfect magically. I can't go through all the conversations of what's being said, but I'm quite familiar with what needs to be done.

#11 in reply to: ↑ 8 @danieltj
6 years ago

Replying to xkon:

@danieltj I can easily argue about 'personal data' in the way of: it's my server, my localhost/pc so yes PHP version is my personal data basically as it's on my personal computer, you have to inform me that you want it.

Just because the information exists on your computer/server doesn't mean it's personal data though. Through the eyes of GDPR, personal data is only data that personally identifies you as a person. So xkon can of course be thought of as personal data as it's your username on WordPress.org. However, an unknown site running WordPress 4.9.4 and PHP 5.6 is not personal data, irrespective of opinion, because you cannot use that data to personally identify someone. Anyone in the world ( 25% of the web ;-) ) could have that data; it's so broad and general that it doesn't really mean anything. That definition isn't from me, but from a trained professional that I listened to at a talk.

However, I do agree that perhaps there should be something to alert users that data is sent, but not necessarily a switch to turn it on or off. There are two reasons for this:

  • You can use the core_version_check_query_args to add/remove data that is sent, however removing enough data here will cause any potential upgrade check to fail if it doesn't have enough data to verify an update is needed.
  • You can also use the AUTOMATIC_UPDATER_DISABLED constant to disable all automatic updates and WP_AUTO_UPDATE_CORE to disable all site updates altogether.
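
As a sketch, the two constants would go in wp-config.php like this. One assumption to flag: as far as I understand them, both constants control automatic background updates, and neither is documented as stopping the update check request itself, so they may not change what data is sent.

```php
// In wp-config.php, above the "That's all, stop editing!" line.

// Turn off the automatic background updater entirely.
define( 'AUTOMATIC_UPDATER_DISABLED', true );

// Turn off automatic core updates only (other accepted values: true, 'minor').
define( 'WP_AUTO_UPDATE_CORE', false );
```

Setting both is redundant; they are shown together only so the difference in scope is visible.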

I agree that using a plugin for this may be a bit overkill for something so small, but on the other hand, you can put these functions in your theme's functions file or inside wp-config.php.

The next steps here should be about telling users what is collected and why and ensuring people know that personal data is left out of the information that is sent to Dot Org. That seems like the best way forward from here.

#12 @casiepa
6 years ago

Just for reference, this is what we have as current info on what is collected:
https://github.com/gdpr-compliance/info/blob/master/Synched-info.md#coreversion-check-all-stored-except-for-those-marked

To be reviewed if still valid for the current version of WP.

#13 @xkon
6 years ago

@danieltj I can't for the sake of my sanity ( :D ) go through what anyone thinks and understands as personal data.

I can say this: You are focusing on GDPR ( maybe we should go further than that and we will at some point ) because everything at the moment goes around it. There's also ePD and an extra ton of different stuff to consider that nobody has for years now as they are completely legal (laws, papers etc) and nobody had time to check or the idea that they even existed. So not everything is 'just' about GDPR.

GDPR came as a slap because there were other laws and regulations in place for years now that everybody ignored. So somebody decided 'enough', everybody has to get in line, for whatever reason that might be, whether we like it or not.

Point: In some countries it's a fact, everything is personal data, and governments per country will be able to adjust GDPR to an extent that they see fit for their own country's/people's interest. We have no idea yet what each country will bring to the table, of course, until that's done.

So basically the talk you heard has nothing to do with the talks that we hear or the talks in another country so on so forth if you get the idea :D, as they speak with different standards even if the major label is GDPR. We have to include 'everybody' unfortunately not what individuals per country / law system say. If 50% says green and 50% says red, in my opinion we have to bring Green + Red as options to help everyone if possible.

--

I won't force anybody into this mindset, I'm not even forcing myself to be honest :) and that's why all the questions. At the end of the day I could just upload a patch for proposal to save myself from keyboard overload and if it gets turned down then so be it, but I don't like that approach :D .

As for the wp-config.php you mentioned, I stated in my previous comment that the idea is to give everyday users a way out of techy things, so I don't know why you're pointing in that direction again.

#14 follow-up: @DavidAnderson
6 years ago

Through the eyes of GDPR, personal data is only data that personally identifies you as a person.

This is incorrect, AFAICT. Personal data under the GDPR is all data *associated with* a person.... not just the data which identifies you.

https://gdpr-info.eu/art-4-gdpr/

‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person

#15 in reply to: ↑ 14 @azaozz
6 years ago

Replying to DavidAnderson:

This is incorrect, AFAICT. Personal data under the GDPR is all data *associated with* a person.... not just the data which identifies you.

OK, how is the version of PHP running on GoDaddy's servers associated with a person? Who is that person anyway? :)

IMHO the information sent from WP when checking for upgrades is for "Essential security purposes". Plugins that "fiddle" with it should be considered harmful not only for the sites they are installed on but for all of the Internet, as they may make WordPress less secure. A hacked site on a shared server is a problem for all neighbor sites.

We already have a disclosure about what is sent in these checks. Can make it more prominent by showing it when installing (as discussed in Slack).

#16 follow-up: @DavidAnderson
6 years ago

@azaozz Don't shoot the messenger. GDPR says that all data which is linked to a directly, or indirectly, identifiable person, falls under its definition of personal data. This is the law in Europe from May, whether we like it or not. Sites which belong to individual persons (e.g. personal websites, sole traderships), and which are identifiable (e.g. send their URLs in the referer string, or via a reverse IP look up), arguably are included within the meaning of "can be identified ... indirectly" and as such all such collected data must be GDPR-compliant.

Disclosure is only relevant for data legally collected... if it's not legally collected according to the GDPR, then all the disclosure in the world makes no difference. The GDPR as I understand it also distinguishes different purposes for processing the same data. Data collected for security purposes cannot be used for aggregating statistics just because it happened to be hanging around.

I am not a lawyer and hence offer no legal position on whether the WP foundation would have a good case for arguing that 100% of the data being sent in updates checks is essential for site security. Prima facie that seems a very hard case to make given that some of it has (AFAIK) never been used in over a decade, but ultimately that's for the foundation and its lawyers to take a view on, I suppose.

Last edited 6 years ago by DavidAnderson

#17 in reply to: ↑ 16 @azaozz
6 years ago

Replying to DavidAnderson:

@azaozz Don't shoot the messenger.

Hehe, of course! I mean this as a joke. It would still be pretty interesting to see how one of the couple of hundred thousand people having websites on GoDaddy can be identified by the PHP version on GoDaddy's servers :)

GDPR says that all data which is linked to a directly, or indirectly, identifiable person, falls under its definition of personal data.

That would mean all data on the Internet will have to be anonymized or deleted right now as it is linked to at least one directly or indirectly identifiable person, possibly more :)

I'm not a lawyer either but GDPR also says that websites can collect, store and use any personal data for legitimate purposes like preventing fraud, complying with reporting requirements, security, etc.

#18 @DavidAnderson
6 years ago

The key issue with GDPR as I understand it is explicit and informed consent: that data processing can only legally be carried out for the purposes for which that explicit/informed consent was given. Under the GDPR, people don't consent to give their data.... they consent to give their data for explicitly defined purposes, and use of that data for another purpose is, in the general case, unlawful.

#19 @idea15
6 years ago

Can we come back to this ticket in the next office hours? It went a bit into the weeds but the issue still stands.

#20 @xkon
6 years ago

I was going to submit a small patch that adds a notification area during the installation about GPL + what WordPress sends to wp.org regarding updates etc, but sure, we could talk about it at the meeting as well. I'll wait for it before doing something final.

#21 @DavidAnderson
6 years ago

If adding a message to a notification area is something being suggested in order to satisfy the GDPR's requirements on processing of personal data, then there's not much overlap between that and the GDPR's requirements (see: https://medium.com/@sagarag/how-to-design-gdpr-compliant-consent-b5d6cf28d0c5 ). (I personally am not a lawyer and have no opinion on whether the core updates check is a GDPR personal data issue or not - I'm just saying...).

#22 follow-up: @idea15
6 years ago

We've already established that the dataset sent back by telemetry checks (https://github.com/gdpr-compliance/info/blob/master/Synched-info.md) constitutes personal data. There are any number of issues which could arise from interception and misuse of that data.

The question we need to work out for the purposes of this ticket is which telemetry data is specifically necessary for security updates under the definition of not requiring consent (per the draft ePD, not GDPR) and which telemetry should require active opt-in upon installation. Matt's plugin (above) and #16778 are both worth examining.

#23 in reply to: ↑ 22 @azaozz
6 years ago

Replying to idea15:

We've already established that the dataset sent back by telemetry checks (https://github.com/gdpr-compliance/info/blob/master/Synched-info.md) constitutes personal data.

This is generally data about WordPress (the software) and the hosting company (the server). For clarity, could you point out which data is considered "personal"? For example, what is personal in the "core update" check:

  • Site URL - freely available from the domain registry
  • Site IP address (this is the server's IP address, not a "personal" address of any of the site's visitors) - server/hosting company info
  • WordPress version
  • PHP version - server/hosting company info
  • MySQL version - server/hosting company info
  • Locale
  • Number of sites (i.e., on a multisite install) (not stored)
  • Number of users (not stored)
  • Whether multisite is enabled or not
  • On Multisite installs, the URL of the parent blog (i.e., the parent blog of pento.blog is wordpress.com)
  • Initial DB version (corresponds with the version of WordPress that was initially installed for this site) - this is the same as "WordPress version" above
  • Report data on whether a site updated successfully or not

Think we have touched an interesting question here: is the information about an internet site "personal data"? This is not information about who owns the site, who made it, who pays for it, who visits the site, etc. If we were sending the admin's email address to wp.org when checking for updates (which BTW would be nice in cases when an update fails), I'd agree that should be opt-in.

I think that the focus here should be about informing and educating the site owners about what data is used when checking for updates, not forcing them to choose something when most of them cannot make an "informed decision" about it.

#24 follow-up: @DavidAnderson
6 years ago

@azaozz I think you might be using a 'personal' definition of 'personal data'. Under the *GDPR's* definition, it is not the case that some data is 'personal' and other data isn't. *All* linked data is personal as soon as *any* of it can be _associated_ with an _identifiable_ person - see: https://core.trac.wordpress.org/ticket/43492#comment:14 . (And things like whether it can be obtained elsewhere or not is not relevant).

The key from a GDPR POV, as I understand it, is that the GDPR doesn't offer exceptions for things like "if the user decided to put their name in the URL or on the 'About' page of their website, that's their problem". It's reasonably foreseeable that many URLs sent in updates checks are "identifiable" with a person (in distinction to, e.g. if they decided to put it in the name of a customised plugin); generally, the GDPR's rules all apply *without any distinction based upon how you obtained that data* (or whether it was already public by some other means, etc.). These attributes are the deliberate intention of the GDPR; it's not loose wording or a mistake; that result is exactly what they are trying to achieve.

N.B. WHOIS may be killed by GDPR, according to many news sources.

It's not clear to me that, if the URL was omitted, there'd be anything to make it personal data, and if the data wouldn't then be covered by the GDPR's anonymization provisions.

Update: https://medium.com/@subsign/google-analytics-and-gdpr-compliance-3fad792babf5 says that the GDPR considers an IP address as PII (personally identifiable information). That makes sense if the GDPR's "indirectly" (for methods of identifying) can include non-public things (e.g. a hosting company provides assistance with turning IP address + plugin combinations back into a website and thus into a person (though probably with just the IP address and the site install info anyone could do that anyway via using the various public reverse "what is hosted on this IP?" lookup tools)).

Last edited 6 years ago by DavidAnderson

#25 in reply to: ↑ 24 @azaozz
6 years ago

Replying to DavidAnderson:

Under the *GDPR's* definition, it is not the case that some data is 'personal' and other data isn't. *All* linked data is personal as soon as *any* of it can be _associated_ with an _identifiable_ person - see: https://core.trac.wordpress.org/ticket/43492#comment:14 . (And things like whether it can be obtained elsewhere or not is not relevant).

Yep, I understand. That's why I asked which of the above data is considered "personal".

It's not clear to me that, if the URL was omitted, there'd be anything to make it personal data, and if the data wouldn't then be covered by the GDPR's anonymization provisions.

I'm not sure that's possible. Many sites are hosted on shared servers and share the same IP address.

However, if domain names are considered "personal data" under GDPR, that means everything on the Internet is "personal data" as everything is associated with a domain name :)

#26 follow-up: @DavidAnderson
6 years ago

@azaozz I'm not sure if your thinking is that the GDPR couldn't possibly require anything onerous, and that if it seems to imply anything onerous, then we must be reading it wrongly? On the contrary... (as I said, WHOIS is likely to soon end as far as EU citizens' data is concerned).... being onerous is not an unanticipated side-effect; it's *intended* to be a revolution, and to place heavy burdens on data processors, to force them both to not process anything that they really don't need to, and to make sure that what they do process, they do according to strict and onerous rules (which is why there's been 2 years for implementation since its requirements were published). N.B. I'm neither defending nor opposing the GDPR, or the work that compliance with it entails; just trying to be accurate about the reality of the situation.

So, yes, if 1) you are an entity subject to the GDPR (like the WP Foundation is), and if 2) you harvest and process data that you scrape from a website and connect it in your data store in such a way that it can, through direct or indirect means, be identified with an individual EU citizen (e.g. the website has an 'About' page that identifies a specific individual), then indeed the GDPR *does* bear upon that. "But that's a really onerous requirement" doesn't matter; the GDPR's POV on that - for better or for worse, whether I like it or not - is "yes, that's what we wanted to accomplish, it seems like things are going well."

N.B. The wording of the GDPR concerning being 'associated' isn't that something is associated in some or any general way. It's about what data you *store* and/or *process*. If you don't store the URL (or anything else that comes into the PII category), then I believe you can process the content of any website to your heart's content without any GDPR implications.

Last edited 6 years ago by DavidAnderson

#27 in reply to: ↑ 26 @azaozz
6 years ago

Replying to DavidAnderson:

So, yes, if 1) you are an entity subject to the GDPR (like the WP Foundation is), and if 2) you harvest and process data that you scrape from a website...

WordPress doesn't harvest or scrape any data from any website. WordPress provides a service to update the software people and companies use to power their websites.

Anyway, this ticket is about sending information about the site, WordPress (the software) and the hosting environment (the server) when checking for updates. As with everything else in WordPress, site owners can opt out of this service by using a plugin. I personally think this is a very bad idea as that will make their sites vulnerable to attacks. But people are free to make their own decisions.

I'll repeat myself in saying that IMHO the focus here should be on informing and educating the site owners about what data is used when checking for updates, not forcing them to choose something when most of them cannot make an "informed decision".

Last edited 6 years ago by azaozz

#28 follow-up: @DavidAnderson
6 years ago

WordPress doesn't harvest or scrape

In that section, I was responding to your hypothetical last sentence, not to the specific issue in this ticket.

As with everything else in WordPress, site owners can opt out of this service by using a plugin.

If the intention is to comply with the GDPR, then the GDPR requires explicit informed consent for all PII (which includes all URLs and IP addresses that can eventually be traced to an individual). Opt-in-by-default, and requiring explicit opt-out action, are specifically prohibited.

(Which, interestingly, makes it like the wordpress.org plugin directory's rules for remote HTTP requests by plugins).

I'll repeat myself in saying that IMHO the focus here should be about informing and educating the site owners

You speak as if the GDPR is modelled upon, or at least fundamentally similar in intention to, previous data protection regimes. It isn't. By design, it's radically and intentionally different, whether we like it or not. I do not work for the WP Foundation and it is not my business to say whether the WP Foundation should comply with EU laws or not. Though, if the final decision is to not comply, or to comply with what the WP Foundation wishes the GDPR had been (rather than what it actually is), this will certainly cause a lot of problems (hopefully fixable by plugins, but requiring the community to specifically research/learn and implement this site-by-site would be bad).

Last edited 6 years ago by DavidAnderson

#29 in reply to: ↑ 28 @azaozz
6 years ago

Replying to DavidAnderson:

As with everything else in WordPress, site owners can opt out of this service by using a plugin.

If the intention is to comply with the GDPR, then the GDPR requires explicit informed consent for all PII (which includes all URLs and IP addresses that can eventually be traced to an individual). Opt-in-by-default, and requiring explicit opt-out action, are specifically prohibited.

Think there may be some misunderstanding here. I'm not talking about giving or withholding consent of sending a site's URL to another site (this actually happens every time somebody follows a link on the internet). I'm talking about discontinuing a service that helps to keep a particular website secure.

As mentioned in previous comments, there are specific rules in the GDPR concerning information needed for security reasons. They seem to apply in this case.

I do not work for the WP Foundation and it is not my business...

I do not work for the WP Foundation either. WordPress is an open source project, and everybody here is contributing to it :)

I'll try to explain my point once more:

  • It will be really foolish to force site owners into making a decision about keeping their websites secure without giving them enough information so they can make an informed decision.
  • For that reason think this ticket should focus on providing that information, including what data is sent on update checks, how it is used, and what it would mean for their site if these checks are disabled.

I'd also really like to hear a lawyer's opinion on whether domain names and websites IP addresses constitute "personal data" under the GDPR. If anybody knows of any such opinions that are posted somewhere, linking them here would be very helpful.

Last edited 6 years ago by azaozz

#30 @xkon
6 years ago

My intention/thinking of adding the License/Data information during installation was with the idea:

Since we don't know exactly at the moment what is needed, what is not, and what might break updates, then after reading an informative text about what WordPress sends during updates there is basically a two-way choice: 1] you install it, and so automatically accept what is described, or 2] you don't install it.

This might seem harsh and against our friendly community and the whole WP idea etc but hey, it's software at the end of the day and we're talking about regulations from now on. Up until today that's how all installations are working afaic:

1] you either click Next after reading the License & Agreement
2] Cancel ( no install )
3] Install and 'choose' if you want to send out telemetry data.

No 3. is something that we can't seem to figure out at the moment so we're left with 1 + 2 unfortunately.

There's no point in an opt-in for the data gathered during updates atm since nobody is sure about them, or at least I haven't seen a 100% definite reply up to now. Even though the above mentioned plugins do exist and do handle it, I wouldn't feel comfortable combining anything into core without being 100% sure. Updates recently broke down, so let's try to avoid that happening again, especially with something privacy-concerning.

If you see it this way then yes, 'just an informative' message goes a long way towards being compliant, and is better than either not being compliant or having broken installations out of the blue.

If the above thoughts seem bold I can assure you that personally I'm the type of guy that doesn't even care what data are gathered and will not even in the future to be totally honest as I don't have anything to hide and I don't mind so my installations will continue to get updates/send data/have telemetry call it whatever :P .

All of this is to actually protect WordPress from users as well not only to give users the options needed :) .

This ticket was mentioned in Slack in #gdpr-compliance by xkon.


6 years ago

#32 @allendav
6 years ago

@azaozz wrote:

For that reason think this ticket should focus on providing that information, including what data is sent on update checks, how it is used, and what it would mean for their site if these checks are disabled.

+100

I'd also really like to hear a lawyer's opinion on whether domain names and websites IP addresses constitute "personal data" under the GDPR.

I really don't like abstract discussions - I think they go on for ever. So, here's the only way I can think that a server IP could be used to identify a person - what if a person keeps the server in their home or their small business (and not in a hosting provider like bluehost)? If a user does that, and their server contacts WordPress.org to check for updates, WordPress.org has the means to "unmask" (identify) that user even if they have things like privacy on their domain registration.

So... my $0.02... the privacy docs for WordPress core should disclose that 1) WordPress.org servers will be contacted to check for updates and 2) that the server IP is unavoidably shared with them when that happens and 3) that should be opt-in/out before communication happens since that communication can unmask a user

Can someone make a case for opt-in vs opt-out for this? That's where things are not clear for me.

#33 @robscott
6 years ago

Maybe now I should disclose my 3 years of legal education... which certainly do not qualify me to anything other than passing commentary.

Just to be clear about opt in vs opt out. If the "opt in" is relating to personal data, then the GDPR specifically says:

"Consent should be given by a clear affirmative act establishing a freely given, specific, informed and unambiguous indication of the data subject’s agreement to the processing of personal data relating to him or her, such as by a written statement, including by electronic means, or an oral statement. This could include ticking a box when visiting an internet website, choosing technical settings for information society services or another statement or conduct which clearly indicates in this context the data subject’s acceptance of the proposed processing of his or her personal data. Silence, pre-ticked boxes or inactivity should not therefore constitute consent. "

So if the consent is "opt out" then we should shelve it if it is personal data and the ticket relates to GDPR.

My personal view would be this test:

1 is this personal data? Yes or no.
2 If yes - do we need to store it?
3 If we do not store the data, I don't feel we need to obtain consent. (opinion!!).
4 If the data is stored - why? What is it stored for? (beyond the actual transaction I mean)

We (might) need consent for the "why" - what is being done with this (potentially identifiable) and (potentially) personal data?

Using the data for the purposes of processing the transaction (ephemeral storage) is not storage. The only way this data can be considered personal data would be if it were collected together as a package and stored. Again, all my opinion.

#34 follow-up: @DavidAnderson
6 years ago

@robscott

3 If we do not store the data, I don't feel we need to obtain consent. (opinion!!).

Well, the paragraph from the GDPR which you just quoted clearly states that "processing" is the boundary (and makes no reference to "storage"). As such, "Using the data for the purposes of processing the transaction (ephemeral storage) is not storage" is not germane, because the GDPR doesn't divide at that point, and does specifically identify "processing".

Last edited 6 years ago by DavidAnderson

#35 in reply to: ↑ 34 @robscott
6 years ago

Replying to DavidAnderson:

@robscott

3 If we do not store the data, I don't feel we need to obtain consent. (opinion!!).

Well, the paragraph from the GDPR which you just quoted clearly states that "processing" is the boundary (and makes no reference to "storage"). As such, "Using the data for the purposes of processing the transaction (ephemeral storage) is not storage" is not germane, because the GDPR doesn't divide at that point, and does specifically identify "processing".

I'm using that in reference to consent. Consent cannot be opt out. It must be opt in. It's not related to my opinion on storage - which is about WHY the data is being stored. The only way this data can be personal data is as a package. The IP address is what it hinges on. If you shed this element, you've anonymized it.

I can dig out one on "suitably anonymized data" if you'd like.

GDPR doesn’t apply to suitably anonymised data [Article 6, 4(e), Article 25, 1, Article 32, 1(a)].

I would argue that the removal of IP / domain (and anything else deemed potentially personal data) would be a necessary process to suitably anonymize the data. For which, no consent is necessary.

Not really sure we need a semantic argument on this point. I just made it poorly?

Last edited 6 years ago by robscott

#36 @DavidAnderson
6 years ago

@robscott Does it hinge on the IP address? I was thinking that the site URL was more problematic. For any individual with a website with an "About" page or other identifying information, that seems much more likely to be PII, under the GDPR's criteria of information that can be directly or indirectly linked back to a particular individual.

But yes, my own view (IANAL...) is that if the code on the wordpress.org mothership immediately drops the IP address down a black hole, then the GDPR's anonymization criteria then come into play (unless it could be argued that the IP combined with a list of installed plugins is sufficient to de-anonymize - that is possible, as you can look up the IP on any number of public "which sites are hosted here?" sites, and then run a quick automated scan on all the sites to see which ones have that same plugin combination installed. There's probably quite a lot of bits of entropy in the info sent that could allow de-anonymization).
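The de-anonymization route described above can be sketched roughly as follows. This is only an illustration in Python under stated assumptions: the function name `matching_sites`, the site names and the plugin slugs are all hypothetical, and the scan results are assumed to have come from some external "which sites are hosted here?" lookup followed by a plugin scan.

```python
from typing import Dict, FrozenSet, List


def matching_sites(
    reported_plugins: FrozenSet[str],
    scanned_sites: Dict[str, FrozenSet[str]],
) -> List[str]:
    """Return the scanned sites whose plugin set equals the reported one.

    `reported_plugins` stands for the plugin list sent with an update
    check; `scanned_sites` maps each site found on the same IP to the
    plugins a scan detected there (both inputs are hypothetical).
    """
    return sorted(
        site
        for site, plugins in scanned_sites.items()
        if plugins == reported_plugins
    )


if __name__ == "__main__":
    # Placeholder data: three sites sharing one IP address.
    candidates = {
        "alpha.example": frozenset({"plugin-a", "plugin-b"}),
        "beta.example": frozenset({"plugin-a", "plugin-c", "plugin-d"}),
        "gamma.example": frozenset({"plugin-b"}),
    }
    print(matching_sites(frozenset({"plugin-a", "plugin-c", "plugin-d"}), candidates))
```

If exactly one candidate matches, the "anonymous" record has effectively been tied back to a site. A real scan would rarely detect every plugin, so an attacker would more likely use a looser comparison than the strict equality assumed here.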

#37 @azaozz
6 years ago

Can someone make a case for opt-in vs opt-out for this? That's where things are not clear for me.

As far as I've seen there is no more opt-out under the GDPR. It is gone for good, eradicated, and banished :)

Every user consent has to be opt-in, and as we well know most users don't bother changing the default options:
https://www.propublica.org/article/set-it-and-forget-it-how-default-settings-rule-the-world
https://www.uie.com/brainsparks/2011/09/14/do-users-change-their-settings/
https://service-design.co/95-of-the-people-stick-to-the-default-option-9e16316a64e1

So the GDPR compliance will introduce a pretty bad UX. The only option would be to keep nagging users to click a checkbox :)

Perhaps we can "work something out" on installation (or first login after installation) where we show a modal and don't dismiss it until the user makes a decision. It'd still be annoying for most, but as @xkon mentions above that's what most (closed source) software does when it asks the user to accept its license.

#38 @robscott
6 years ago

@DavidAnderson the domain and IP I kind of packaged together there... for brevity. The original point stands - if it's being stored, as a package, along with IP & domain, consent is probably required.

If consent is required, it must be opt in. It cannot be opt out.

If the data is necessary for security purposes - this does matter in terms of consent etc - it is still necessary to do the other stuff, like identify the data controller.

Ideally, shed the personally identifiable stuff before storing it. Then you are not "controlling" this data. You're using it for the purposes of the transaction. Which is fine. As far as the GDPR and my understanding of it are concerned.
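A minimal sketch of the "shed before storing" idea, in Python purely for illustration; the actual wordpress.org API code is not shown anywhere in this ticket. The field names (`ip`, `url`, `domain`) and the helper `strip_identifiers` are hypothetical.

```python
from typing import Any, Dict

# Hypothetical names for the fields this thread treats as potentially
# identifying; the real request format is an assumption here.
IDENTIFYING_FIELDS = frozenset({"ip", "url", "domain"})


def strip_identifiers(request: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the request without the identifying fields.

    The original dict is left untouched, so it can still be used to
    answer the current update check; only the copy would be stored.
    """
    return {k: v for k, v in request.items() if k not in IDENTIFYING_FIELDS}


if __name__ == "__main__":
    incoming = {
        "ip": "203.0.113.7",           # documentation-range address
        "url": "https://example.org",  # placeholder site URL
        "version": "4.9.5",
        "php": "7.2.4",
    }
    print(strip_identifiers(incoming))
```

As raised earlier in the thread, the fields that remain may still carry enough entropy to re-identify a site, so dropping these keys alone may not be enough to count as anonymization.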

It's worth remembering it's about Data Protection. The step of not storing this data (longer than was necessary) would be precisely to protect it. The IP / domain is needed to actually... well do the WordPress update!

It may well be important, if consulting legal teams, whether they were trained in the USA or in Europe. Long arguments about the existence of a "Right to Privacy" may otherwise ensue.

#39 @DavidAnderson
6 years ago

The IP / domain is needed to actually... well do the WordPress update!

That's not the case - updates are pulled, not pushed. wordpress.org doesn't need to know or care about which site is asking the question about update availability for specified plugins, in order to advise on that availability. For the URL/domain, you can verify that by editing the updates check routine in WP core to send fake info. For the IP, you can verify it by connecting a site behind a firewall that does not allow incoming connections (e.g. a localhost site) - updates still work.
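To illustrate the pull model, here is a rough Python sketch of an update check that needs nothing but plugin slugs and versions. It is not the wordpress.org implementation: the `LATEST` catalogue, the slugs and the function names are hypothetical, and versions are assumed to be simple dotted numbers.

```python
from typing import Dict, Tuple

# Hypothetical catalogue of latest releases, keyed by plugin slug.
LATEST: Dict[str, str] = {
    "example-plugin": "2.1.0",
    "another-plugin": "1.0.3",
}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '2.1.0' into (2, 1, 0); assumes purely numeric dotted parts."""
    return tuple(int(part) for part in version.split("."))


def available_updates(
    installed: Dict[str, str],
    latest: Dict[str, str] = LATEST,
) -> Dict[str, str]:
    """Map each outdated slug to its newer version.

    The answer depends only on the slug/version pairs in the request;
    no site URL or caller address is consulted.
    """
    updates: Dict[str, str] = {}
    for slug, current in installed.items():
        newest = latest.get(slug)
        if newest is not None and parse_version(newest) > parse_version(current):
            updates[slug] = newest
    return updates


if __name__ == "__main__":
    print(available_updates({"example-plugin": "2.0.9", "another-plugin": "1.0.3"}))
```

Because the client asks and the server only answers, the server never has to connect back to the site, which is consistent with updates working from behind a firewall.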

BTW - back in https://core.trac.wordpress.org/ticket/43492#comment:23 some things are marked as "not stored". Presumably that means that the other things are currently stored? So, wordpress.org have a database that records things like what plugins, and how many users, every WP site in the world (that makes update checks) has, indexed by site URL? Or are some of the things not marked as "not stored" in fact not stored?

#40 @DavidAnderson
6 years ago

I've looked at the list of tickets tagged for GDPR, but don't see one concerning the right to download all PII held by the WP Foundation. Based on https://core.trac.wordpress.org/ticket/43492#comment:23, it appears that there is PII being stored (at least in the cases of stored website URLs that can be visited to identify a specific person whom the website belongs to). What will be the mechanism for site owners to be able to exercise the GDPR's mandated right to download that PII? Does a new ticket need to be created to track that?

#41 @robscott
6 years ago

Here is what is stored: https://github.com/gdpr-compliance/info/blob/master/Synched-info.md @DavidAnderson - I think you and I are asking the same thing... why? Why is it being stored?

From a GDPR conversation in Slack, yesterday, I am given to understand the answer to this question is "security".

Some clarification on that would be helpful.

Further - regarding data storage, who stores this data (the data controller) and where (outside the EU?) would appear to be very important under GDPR.

But only - only - if this is Personal Data. Which is the root question.

#42 @DavidAnderson
6 years ago

@robscott Is there really still an open question that a large number of website URLs will be classified by the GDPR as PII? The GDPR text says:

‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly

I don't see any ambiguity there. They've written "directly or indirectly" to clarify that it doesn't matter what means are being used to perform the identification (i.e. we can't say "ah, but we'd have to manually browse their website to read their 'About' page to do it, and that's hard to automate") - they've covered that. They don't specify particular types of data - once *any* of the data can lead to identification, *all* the data is then PII ("any information relating to [a person]").

But on your major point - I'm very interested in that too. How does wordpress.org storing (assuming I've understood rightly) your number of users, and site URL, and various other things, and explicitly linking those to your site URL, and storing it all, without anonymization, do anything for security, given that the security updates mechanism in WP is pull (polling) based and has no facility at all for push-based? (The 'automatic' updates on core and sometimes on plugins are still pull/polling based).

The GDPR is explicitly designed to force granularity - it's not a by-product, it's one of their core aims. If you get a piece of data A as something necessary for purpose X, then you can't process it for purpose Y - that needs separate/sufficient justification before you're allowed to touch it, even if it's stored on your servers and you got it legitimately for purpose X. On my understanding of the WP updates mechanism (code which on the client side I've studied and interacted with at some length), the site URL is never used in the updates response at all. And things like the number of registered users certainly make zero difference to the returned results. So things of that sort surely need explicit opt-in, even if other things are deemed essential to the normal operation of WP (on which I don't have a specific opinion).

Last edited 6 years ago by DavidAnderson

#43 @robscott
6 years ago

@DavidAnderson I tend to agree. I think it's pretty clear that as a package this telemetry is personal data as defined by the GDPR - but it is necessary to make the distinction, because we are building a system here.

I guess I should explain a bit what I am thinking: maybe an option, in WordPress might say "this is a personal website | business website". A business website - this never applies. Go for your life. Collect and store away... well, actually, not in my view (I say why not be transparent about this anyway, regardless of the legal case?!).

A personal website - that is when the IP and URL etc become... well problematic. By problematic, I think, yes it's personal data. So we'd need to know where it's being stored, and by whom, etc etc.

But probably, regardless of which framework we are applying, I'd like to know this anyway. It should not be a mystery.

#44 @DavidAnderson
6 years ago

Of interest:

https://wordpress.org/plugins/anonymous-wordpress-plugin-updates/
https://core.trac.wordpress.org/ticket/5066

As a random thought - a centralised database of all WP plugin/theme installs in the world is presumably quite interesting to quite a lot of people, from script kiddies right up to nation states.

Last edited 6 years ago by DavidAnderson

#45 follow-up: @robscott
6 years ago

So today I was chatting about this ticket with some colleagues and one of them said the following:

What if for one of my old WordPress sites, the URL was myname.com and I went to WordPress.org under GDPR Right To Be Forgotten asking them to delete their entire record of me?

Thoughts on this?

Further, this user would be entitled to download a copy of his or her data, too.

In this light - some more info would be helpful:

1) who controls the data?
2) where is the data stored?
3) who has access to the data?
4) when is the data destroyed?

#46 in reply to: ↑ 45 @idea15
6 years ago

All four of those questions should be answered in the site's privacy notice - so if it's a concern, carry it over to one of the trac tickets pertaining to that tool.

Also, remember the RTBF is the right to request it. It is not an automatic, universal right to have it done.

#47 @DavidAnderson
6 years ago

@idea15, Rob was talking about the data processed by the WordPress foundation when it receives incoming HTTP requests for updates information - he wasn't talking about end-users of WP self-install getting data out of those individual installs. There has to be something on wordpress.org, so that people can request PII that is still stored after individual installs are wiped.

Also, remember the RTBF is the right to request it. It is not an automatic, universal right to have it done.

It's true that GDPR does not grant an absolute, limitless right - for example, if someone requests deletion of all their PII, then potentially a) it could instead be anonymized (under the GDPR, that requires that there's no way to reverse the anonymization, however difficult the procedure to do so) or b) another law might require retaining it (e.g. tax records). In the case of the sort of stuff wordpress.org is storing - quite a bit of information on the details of each site install, indexed by URL (certainly PII in many cases) and also IP (which would allow identification in a lot of cases using "what site is hosted on this IP?" tools combined with simple scans of installed plugins/themes) - anonymization or complete deletion would be the only possibilities. There's no general exception to arbitrarily say "no, I don't want to delete or anonymize your PII, so I'm not going to."

In the UK, the authoritative state body for implementation of the GDPR is the ICO. Their guide is here: https://ico.org.uk/for-organisations/guide-to-the-general-data-protection-regulation-gdpr/individual-rights/right-to-erasure/ . There, the permitted grounds for refusing a deletion request because you still need to process the data for a legitimate reason are given as:

  • to exercise the right of freedom of expression and information;
  • to comply with a legal obligation;
  • for the performance of a task carried out in the public interest or in the exercise of official authority;
  • for archiving purposes in the public interest, scientific research historical research or statistical purposes where erasure is likely to render impossible or seriously impair the achievement of that processing; or
  • for the establishment, exercise or defence of legal claims.

#48 @idea15
6 years ago

You don't need to explain how it works in the UK to someone who's been working nearly full time on GDPR in the UK for two years.

The trac tickets are getting bogged down in some of the same discussions we had at the beginning of the Slack channel where people are discussing what things are and how they work rather than what actual work needs to be done on the code level. I'd suggest, as we did in the Slack channel, that those discussions be held elsewhere.

I'd also respectfully suggest that anyone still getting to grips with the fundamentals of data protection, 48 days until the compliance deadline, find an alternate way of contributing. There is simply not the time to run tutorials to contributors at this stage.

Last edited 6 years ago by idea15

#49 @robscott
6 years ago

@idea15 - "All four of those questions should be answered in the site's privacy notice" - but I'm not asking where "the site" covers this. In fact, I'm specifically talking about "sites" which no longer exist. I'm asking, where does wordpress.org store the core telemetry at issue in this post - (for example are those servers outside the EU?).

Where could we learn this to post it into our privacy policy?

"The site" is not storing this data. WordPress.org is.

This issue - the one we are posting on here - is about the relationship of WordPress.org with webmasters in relation to their data which is being collected by WordPress.org in the form of core telemetry. So to be clear: those 4 questions I'm asking; where are they answered by WordPress.org? (or where could they be).

EDIT - at no stage did I suggest you would have a right to be forgotten "done" - but it is necessary for the WordPress Foundation to hear and respect my rights, when I make such a request, and this being the case, it is perhaps sensible to have thought about this in advance. Like now, for example.

Last edited 6 years ago by robscott

#50 follow-ups: @robscott
6 years ago

Where api.wordpress.org is collecting core telemetry:

1) who controls the data?
2) where is the data stored?
3) who has access to the data?
4) when is the data destroyed?

I did find this answer, but the stance taken maybe requires some amendment since 2009:

https://make.wordpress.org/core/2009/12/10/suggest-agenda-items-for-dec-17th-dev-ch/#comment-1042

These issues don't concern me overly - never have - but bearing in mind what GDPR is and what it is for; perhaps returning to the answers given back then in light of today's regulatory framework may be a helpful place to start.

#51 in reply to: ↑ 50 @azaozz
6 years ago

Replying to robscott:

Where api.wordpress.org is collecting core telemetry:

1) who controls the data?
2) where is the data stored?
3) who has access to the data?
4) when is the data destroyed?

I have a few additional questions :)

  • Can the relations between two internet sites be considered "personal"?
  • As the site owners are both "controllers" and "processors" for the site, would using a service like the one available from wp.org mean they are contracting a sub-processor? Then, do they have to:

Obtain written permission from the controller before engaging a subcontractor (28.2), and assume full liability for failures of subcontractors to meet the GDPR (28.4)

#52 @robscott
6 years ago

@azaozz 100% this is the point - it splits at the "is it personal data"; then blooms into a full blown set of similar such questions.

Ideally, over and above GDPR, it could all be set down (these answers about who, what, where, when and why) for transparency purposes. Very simply (something like): "we collect X, Y and Z, and we keep it at 1, 2 and 3; in order to keep stats to make decisions about (for example) what PHP version to support, as well as to make security decisions..." - or whatever would be most appropriate.

Then, having produced this detailed and specific status about core telemetry, asking people to consent to this (if consent is indeed necessary) becomes easier. Because the person knows what he or she is consenting to.

The opt in could be connected to turning on automatic updates. Though there is an issue that they should (probably) still be allowed auto updates if the user objects to telemetry collection.

I guess a "next step" could be to set this (type of) information down in the Github repo.

I also think worth remembering:

  • Only matters if it is personal data - and the data subject is in the EU.

So we're already on a subset of a subset.

Ideally - and not for right now - there could be a wider clarification on data and privacy to exceed the GDPR. But this is more of a policy question than a technical one.

#53 in reply to: ↑ 50 @idea15
6 years ago

Above what's clarified in the current wp.org privacy policy, all of this information does exist in an internal data map. It's not really for contributors to say how these questions should be handled internally, but they do need public disclosure.

Replying to robscott:

Where api.wordpress.org is collecting core telemetry:

1) who controls the data?
2) where is the data stored?
3) who has access to the data?
4) when is the data destroyed?

I did find this answer, but the stance taken maybe requires some amendment since 2009:

https://make.wordpress.org/core/2009/12/10/suggest-agenda-items-for-dec-17th-dev-ch/#comment-1042

These issues don't concern me overly - never have - but bearing in mind what GDPR is and what it is for; perhaps returning to the answers given back then in light of today's regulatory framework may be a helpful place to start.

#54 @robscott
6 years ago

Internal to whom @idea15? The Foundation, or some "affiliated organizations"?

"For consent to be informed, the data subject should be aware at least of the identity of the controller and the purposes of the processing for which the personal data are intended." [Article 13]

There is nothing in the current wp.org privacy policy which confirms identity. There's a lack of clarity about who the WordPress Foundation is, when I review various documents. Does it have a Board? A steering committee? Oversight of any sort?

We learn in the wp.org privacy policy that there are "affiliated organizations that (i) need to know that information in order to process it on WordPress.org’s behalf or to provide services available at WordPress.org" - but we don't learn who these organizations are, at all. I think we might be able to guess, but we certainly are not informed.

I'm not just randomly firing shots here - this could use some straightening out. This wording is specifically problematic. Identify the data controller. Identify the purposes for which the data are intended. We could be super specific about both things - which is something I would get behind 100%. Transparency ftw :)

Having some public diagrams on this may make people comfortable with *additional* telemetry, and therefore make for better informed data gathering opportunities in the future.

If these oversights or omissions are not best pointed out by a contributor, then who would you prefer says it?

#55 @idea15
6 years ago

The data map is internal to Automattic. It's not for contributors to tell them how to run their business. It is for them to clarify that information in the wp.org privacy policy.

The Foundation is basically pieces of paper for US tax purposes. Don't overread into it.

#56 follow-up: @robscott
6 years ago

@idea15 you need to have a think about this answer.

#57 in reply to: ↑ 56 @idea15
6 years ago

I've been having a think about these things since I started asking A8C legal about them last summer. I've been having a think about the wider implications since I started trying to wake the community up to GDPR in 2016.

You can ask all you want. It doesn't mean you're going to get answers.

So we can tilt at windmills, or we can work on the immediate solutions we need on the code level in 46 days.

Replying to robscott:

@idea15 you need to have a think about this answer.

#58 @robscott
6 years ago

If what you're saying is accurate, the immediate solutions we need at code level are to turn off this telemetry.

#59 @idea15
6 years ago

All of the questions you are asking have been asked before. All of the answers you are asking for have been requested before. I think you greatly overestimate the interest in these issues from anyone outside the project group. Our priorities are not anyone else's priorities. You can explain GDPR, the European privacy framework, and the legal/political implications until you go blue in the face. You have no idea how hard I've worked for two years to make it otherwise.

The challenge here is to find a constructive solution despite that.

#60 @DavidAnderson
6 years ago

So, if I follow the last few contributions correctly, it seems that people are saying that...

1) nobody is quite sure who, in a legal sense (when it comes down to obligations to obey specific and particular laws), wordpress.org/The WP Foundation ultimately is (Who is their legal officer? How can you ask them questions and get official answers that can be shared publicly and acted upon?), and

2) neither does anybody really know how "they" intend to fulfil their GDPR legal obligations (or not, if they prefer) regarding data that WP self-install sends to their servers; and

3) the suggestion is that perhaps people could send in patches so that other people with commit access can review them - but their decisions to commit them, or not, may or may not agree with anybody "higher up"'s policies, supposing that such policies exist, or that anybody knows what they are, or may just be their own private views, or may be the result of throwing dice - or something else? - and these patches should all be in order to minimise the legal risks of we-are-not-really-sure who?

Well, life is certainly not dull today, anyway.

#61 follow-up: @idea15
6 years ago

All of the information on who the WP Foundation is (officers, staff, and legal registration) is easily found from the usual due diligence sites. Questions 2 and 3 have been asked (believe me). Only they can answer them.

That issue shouldn't preclude any other further work - whether that's genuine work or workarounds - to the best of our abilities.

#62 in reply to: ↑ 61 @robscott
6 years ago

Put me down as a +1 on not sending any data unless admin approves of its being sent.

Add one on for telling admin where data is going - specifically - and what the data will be used for, specifically.
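The two +1s above could look something like the following sketch: nothing is sent unless the admin has approved that specific field, and each field carries a stated purpose that can be shown before approval. This is Python for illustration only, not a proposal for the actual core patch. The field names, the `TelemetryConsent` class and `build_payload` are hypothetical, and the purpose strings are placeholders rather than statements of how wordpress.org uses the data.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set

# Hypothetical fields, each with the purpose text an admin would be shown.
FIELD_PURPOSES: Dict[str, str] = {
    "php_version": "Placeholder: stats used to decide which PHP versions to support.",
    "locale": "Placeholder purpose text.",
    "user_count": "Placeholder purpose text.",
    "site_url": "Placeholder purpose text.",
}


@dataclass
class TelemetryConsent:
    """Per-field opt-in; everything is off until explicitly granted."""

    approved: Set[str] = field(default_factory=set)

    def grant(self, name: str) -> None:
        if name not in FIELD_PURPOSES:
            raise KeyError(f"unknown telemetry field: {name}")
        self.approved.add(name)

    def revoke(self, name: str) -> None:
        # Consent stays revocable: removing a field stops it being sent.
        self.approved.discard(name)


def build_payload(site_data: Dict[str, Any], consent: TelemetryConsent) -> Dict[str, Any]:
    """Keep only the fields the admin has approved."""
    return {k: v for k, v in site_data.items() if k in consent.approved}


if __name__ == "__main__":
    consent = TelemetryConsent()
    data = {"php_version": "7.2", "site_url": "https://example.org"}
    print(build_payload(data, consent))  # nothing approved yet
    consent.grant("php_version")
    print(build_payload(data, consent))
```

Keeping the consent record separate from the update check itself would let a site keep receiving updates while sending none of the optional fields.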

Whether it's personal data or not - that's a problem for the ultimate data controller to square away with their legal team. Whether that entity be the WordPress Foundation or another corporate entity relies upon access to information I do not have.

We can continue hooking and blocking the phone home in business contexts where this is regarded as desirable. I feel that to expect this of an "ordinary person" or "man on the Clapham Omnibus" may be perhaps a step too far.

This ticket was mentioned in Slack in #core by xkon. View the logs.


6 years ago

This ticket was mentioned in Slack in #gdpr-compliance by xkon. View the logs.


6 years ago

#65 @iandunn
6 years ago

#44022 was marked as a duplicate.

#66 follow-up: @alicewondermiscreations
6 years ago

#44022 was a damn request for what should be a plugin and could easily be a plugin to be turned into a plugin.

It was not a GDPR issue.

#67 @alicewondermiscreations
6 years ago

Yes, part of #44022 was wondering why they need the blog website to send a list of WordPress events local to the webmaster, but that was an aside to the issue that the function provided has nothing to do with the functional operation of core, so it should be a plugin that can be turned off by those who don't want their location sent, whether or not it is being tracked.

#68 in reply to: ↑ 66 @iandunn
6 years ago

Replying to alicewondermiscreations:

#44022 was a damn request for what should be a plugin and could easily be a plugin to be turned into a plugin.

My bad, I got the impression that your main concern was privacy, but I'm happy to continue the discussion on the original ticket, if you feel like privacy is a secondary issue.

You're always welcome to continue discussion on a closed ticket if you feel like there's more to talk about, or if you disagree with the reason it was closed. Tickets are closed mostly to help keep the list of open issues manageable, rather than to shut down conversations.

To avoid sidetracking the conversation here, I'll follow up on #44022 with my thoughts on whether or not the Events Widget is more appropriate for Core or a plugin.

#69 @alicewondermiscreations
6 years ago

My issue is that divulging of information should be consensual.

  • Consent is always informed
  • Consent is always voluntary
  • Consent is always revocable

That definition is actually from bdsm circles but it applies everywhere.

Sending location is fine, even with the blog domain, as long as the person both knows (the informed part), is okay with it being sent (the voluntary part), and can change their mind - meaning there is an easy way for someone to opt out.

That's easy to do if it is a plugin because a plugin has a description that can be used to inform the blog admin and the blog admin can easily disable it at will.

Since the location of WordPress events has absolutely nothing to do with the operation of the blog itself, there really is no sane reason why it isn't a plugin.

Last edited 6 years ago by alicewondermiscreations

This ticket was mentioned in Slack in #gdpr-compliance by desrosj. View the logs.


6 years ago

#71 @desrosj
6 years ago

  • Focuses privacy added
  • Keywords gdpr removed

Removing gdpr keyword in favor of the privacy focus.

This ticket was mentioned in Slack in #core-privacy by allendav. View the logs.


5 years ago
