WordPress.org

Make WordPress Core

Opened 2 months ago

Last modified 6 days ago

#43738 new enhancement

Make the GDPR Export/Delete functionality available network-wide for super admins

Reported by: TZ Media
Owned by:
Milestone: 5.0
Priority: normal
Severity: normal
Version:
Component: Privacy
Keywords: gdpr needs-patch privacy-roadmap
Focuses: multisite
Cc:

Description

There are setups in which individual sites in a network are not independent but are integral parts of one website, are interdependent in some way, or should be handled together for other reasons.

Therefore, a GDPR export and delete view similar to the single-site one is needed to manage network-wide requests. Site-level requests should still work as expected when the site is part of a network. The network-wide view should gather the data from all sites in the network and export it in one file.

Common use cases are:

  • Sites that use network installs with plugins like Multilingual Press to provide different language versions of one site
  • Network installs where different sites in a multisite behave like one website to the end-user (same design, shared functionality, shared user base)

Change History (22)

#1 @TZ Media
2 months ago

  • Keywords gdpr added

This ticket was mentioned in Slack in #gdpr-compliance by coreymckrill.


2 months ago

#3 @coreymckrill
2 months ago

For some multisite use cases, GDPR compliance might all be handled at the individual site level. However, as mentioned above, there are many use cases where it makes more sense to handle it at the network level. Specifically:

  • Setting a privacy policy, or at least a template, that would be used on every site in the network
  • Personal data export/anonymization/erasure across the entire network

Other multisite-specific questions/considerations:

  • If the user requesting the data export/anonymization/erasure is the owner/admin of one of the sites within the network, what happens to that site?
  • Super admins can access the user data on any site in the network. Does this need to be called out as something that should be disclosed in the Privacy Policy?
  • Similar to the question about abandoned user accounts, what should be done (if anything) with abandoned sites?

So, in terms of applying the roadmap to a multisite context, here are some ways we could approach it:

Add tools for creating a privacy policy

There could be a new Network Admin menu item "Tools" with a "Privacy" page, similar to #43435. This could be a place to create a template that would pre-fill the Privacy Policy pages on new sites in the network.

Add tools to core to facilitate compliance, and privacy in general

These could look similar to the single site tools that are in the works, but would work across all sites in the network. Under the hood, that might look something like this:

  • Search across the network for the user's ID and/or email
  • For each site that contains data from that user, run the single site functions for export/anonymization/erasure, and if for export, compile it all into one file (or multiple, if it's a huge amount of data?)
  • This would perhaps all happen asynchronously via Ajax/REST requests, since it would potentially involve a lot of big expensive queries
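The "compile it all into one file (or multiple)" step could look roughly like the following language-neutral Python sketch. It is only an illustration under stated assumptions: `export_site` is a hypothetical stand-in for whatever single-site exporter core ends up providing, and `max_items_per_file` is a hypothetical cap that models the "multiple files" case by item count rather than by byte size. Neither is an existing WordPress API.

```python
from typing import Callable, Iterable, List

# Hypothetical single-site exporter: (site_id, email) -> list of export items.
SiteExporter = Callable[[int, str], List[dict]]


def compile_network_export(site_ids: Iterable[int], email: str,
                           export_site: SiteExporter,
                           max_items_per_file: int = 500) -> List[List[dict]]:
    """Run the per-site exporter on every site and split the merged items into 'files'."""
    if max_items_per_file < 1:
        raise ValueError("max_items_per_file must be at least 1")
    combined: List[dict] = []
    for site_id in site_ids:
        for item in export_site(site_id, email):
            # Tag each item with the site it came from so the merged export stays readable.
            combined.append({"site_id": site_id, **item})
    # One chunk per output file; a small export yields a single chunk, an empty one yields none.
    return [combined[i:i + max_items_per_file]
            for i in range(0, len(combined), max_items_per_file)]
```

Sites that return no items simply contribute nothing. In practice the chunks would then be written to a ZIP or HTML file, which is outside this sketch.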

Add documentation/help for site owners on how to use these tools

Network admin documentation probably wouldn't be that different.

This ticket was mentioned in Slack in #core-multisite by coreymckrill.


2 months ago

#5 @maxfein
2 months ago

  • Focuses multisite added

Happy to see this being addressed for multisite... Super busy at the moment, but I'll come back with something more structured as soon as I can :) The rest is mostly a quick re-post of a comment on the roadmap, to bring at least something to the table...

I’m using JJJ’s wp-multi-network (to run multiple networks from one WP).

So, I also have some networks that are (mostly) self-administered by clients, and some clients with accounts on multiple networks. They all have many users of various kinds associated with various sites (and networks).

I imagine that institutions like schools and many other use cases face similar issues.

Basically, it seems like there are times when *things* need to get more nuanced, e.g. data export/delete scoped per site, per network, to a selection from My Sites (?), or to all data.

Cheers, Max

This ticket was mentioned in Slack in #core-multisite by maxfein.


2 months ago

#7 follow-up: @philippmuenchen
2 months ago

I absolutely agree with the comments above. A multisite and/or multi-network environment needs the ability to set defaults for all sites in the network. Also the other open GDPR tickets should keep Multisite in mind. Just as the admin is the boss on a single site, the super admin is the boss of all sites. So, for example, a request to delete/forget a user should also reach the super admin, since they are usually responsible for the whole network of sites as its operator.

I also commented on the roadmap before: https://make.wordpress.org/core/2018/03/28/roadmap-tools-for-gdpr-compliance/#comment-33565

#8 in reply to: ↑ 7 @coreymckrill
2 months ago

Replying to philippmuenchen:

Also the other open GDPR tickets should keep Multisite in mind.

Here is a list of other tickets that @iandunn and I have identified as potentially having multisite implications:

There will, of course, be more, as new tickets are being added every day.

#9 follow-up: @jeremyfelt
2 months ago

Search across the network for the user's ID and/or email

I'm still catching up on GDPR and will have to go through some of the tickets to figure out what data is being exported and in what format, but this specific task may not be realistic with multisite's current data structure.

We can look at a single table, wp_usermeta, to get a list of sites the user has a role on, but I'm not sure if that covers everything that we need.

For small networks, it may be possible to just loop through each site and retrieve the data. As networks get larger, this becomes less and less realistic and would probably need a long-running task using cron.
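The wp_usermeta lookup mentioned above can be sketched as follows. This is a rough Python model, assuming the default base prefix `wp_` and the multisite key convention in which the main site's role is stored under `wp_capabilities` and other sites' roles under `wp_{site_id}_capabilities`. It only finds sites where the user has a role, which is the limitation raised in this comment. `sites_from_usermeta` is a hypothetical helper, not a core function.

```python
import re
from typing import Iterable, List


def sites_from_usermeta(meta_keys: Iterable[str], base_prefix: str = "wp_") -> List[int]:
    """Return the IDs of sites on which a user has a role, from their usermeta keys."""
    # Matches "wp_capabilities" (main site) and "wp_<id>_capabilities" (other sites).
    pattern = re.compile(re.escape(base_prefix) + r"(?:(\d+)_)?capabilities")
    site_ids = set()
    for key in meta_keys:
        match = pattern.fullmatch(key)
        if match:
            # No numeric infix means the key belongs to the main site, assumed to be ID 1.
            site_ids.add(int(match.group(1)) if match.group(1) else 1)
    return sorted(site_ids)
```

For a small network, the returned IDs could be looped over directly. For a large one, the same list would feed a cron-driven, batched process instead.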

A couple questions:

  • Is there a personal data export API endpoint that is available on each site?
  • What's the best ticket to start on to get up to speed on the data being exported? #43438?

#10 in reply to: ↑ 9 @coreymckrill
2 months ago

Replying to jeremyfelt:

Search across the network for the user's ID and/or email

We can look at a single table, wp_usermeta, to get a list of sites the user has a role on, but I'm not sure if that covers everything that we need.

It doesn't, because comments might only be associated with an email address, not a user ID. This may also be the case for data stored by plugins. But wp_usermeta is probably a good start.
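Because guest comments carry only an email address, a per-site search probably has to match on either the user ID or the email. Below is a minimal Python sketch of that rule. The `Comment` record loosely mirrors the `user_id` and author email columns of wp_comments, where guest comments have a `user_id` of 0. The case-insensitive email comparison is an assumption, and `comments_for_user` is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Comment:
    comment_id: int
    user_id: int        # 0 for guest comments
    author_email: str


def comments_for_user(comments: List[Comment], user_id: Optional[int], email: str) -> List[Comment]:
    """Match comments by user ID when one is known, or by (case-insensitive) email."""
    wanted = email.strip().lower()
    return [
        c for c in comments
        # A missing or zero user_id must never match guest comments by ID.
        if (user_id and c.user_id == user_id)
        or c.author_email.strip().lower() == wanted
    ]
```

Plugin-stored data would need the same "ID or email" treatment, which is why per-plugin exporters are likely still required.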

A couple questions:

  • Is there a personal data export API endpoint that is available on each site?

There has been some discussion of REST API endpoints, but I don't think there is any code for this yet (although I haven't caught up on everything yet today). I think these endpoints will be critical for multisite needs.

  • What's the best ticket to start on to get up to speed on the data being exported? #43438?

Yep, plus #43546 for the (single site) UI.

#11 @jeremyfelt
2 months ago

I had a chance to dive into #43546 to get a better grasp on what's included with a default single site level export and am starting to form some ideas around how I think a network level export could work.

From above, exporting (or removing) user data at the network level should/could:

  • Search across the network for the user's ID and/or email
  • For each site that contains data from that user, run the single site functions for export/anonymization/erasure, and if for export, compile it all into one file (or multiple, if it's a huge amount of data?)
  • This would perhaps all happen asynchronously via Ajax/REST requests, since it would potentially involve a lot of big expensive queries

I think one way this could happen is:

  1. Network administrator initiates a user export, which schedules a cron event on the main site.
  2. The cron event on the main site loops through all sites on the network and schedules site level events to build user data for export.
  3. A main site cron event checks the status of these regularly through switch_to_blog() or something similar and receives one of four responses: (1) process not started, (2) process started but not complete, (3) process complete with no data available, or (4) process complete with data available.
  4. When all processes are complete, the full export can be built on the main site and provided for download.
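The status check in steps 3 and 4 can be modeled with the four responses as an enum. This is a minimal Python sketch under the assumption that each site's status has already been collected into a dict keyed by site ID. How and where those statuses are stored is left open here, and `check_network_export` is a hypothetical name.

```python
from enum import Enum
from typing import Dict, List, Optional


class SiteExportStatus(Enum):
    NOT_STARTED = 1
    STARTED = 2
    COMPLETE_NO_DATA = 3
    COMPLETE_WITH_DATA = 4


DONE = {SiteExportStatus.COMPLETE_NO_DATA, SiteExportStatus.COMPLETE_WITH_DATA}


def check_network_export(statuses: Dict[int, SiteExportStatus]) -> Optional[List[int]]:
    """Return the sites to include once every site is done, or None while any is pending."""
    if any(status not in DONE for status in statuses.values()):
        return None  # step 3: check again on the next cron run
    # step 4: only sites that reported data contribute to the final export
    return sorted(site_id for site_id, status in statuses.items()
                  if status is SiteExportStatus.COMPLETE_WITH_DATA)
```

Returning `None` rather than an empty list keeps "still running" distinct from "finished, but no site had data".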

Very happy to hear other thoughts on that approach and other possibilities! :)

With a process like that, it seems unrealistic that a full network-level export could be ready for 4.9.6, so for now we should rely on manually compiling site-level exports.

Note that wp_blogmeta landed for 5.0 in #37923 and may help quite a bit with tracking export status on individual sites.

This ticket was mentioned in Slack in #core-multisite by jeremyfelt.


8 weeks ago

#13 @mnelson4
8 weeks ago

In Slack I suggested we should maybe use multiple AJAX requests to create the export file or anonymize the user data, a bit like https://core.trac.wordpress.org/ticket/43438. If we do this, we should probably gather export data from more than one site per ajax request, otherwise it will be unbearably slow for large networks.
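Handling several sites per AJAX request comes down to paging over the site list. The following Python sketch loosely follows the page-by-page idea of the single-site exporter. `batch_for_request` and `sites_per_request` are hypothetical names, and the default of 10 is arbitrary. Setting `sites_per_request=1` gives the one-site-per-request variant, in case processing several sites in one request turns out to be unsafe.

```python
from typing import List, Optional, Tuple


def batch_for_request(site_ids: List[int], page: int,
                      sites_per_request: int = 10) -> Tuple[List[int], Optional[int]]:
    """Return the sites to process in this request and the next page (None when finished)."""
    if sites_per_request < 1:
        raise ValueError("sites_per_request must be at least 1")
    start = (page - 1) * sites_per_request  # pages are 1-based
    batch = site_ids[start:start + sites_per_request]
    next_page = page + 1 if start + sites_per_request < len(site_ids) else None
    return batch, next_page
```

The admin page would keep sending requests with the returned `next_page` until it receives `None`.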

@flixos90 suggested we may want to also have a WP-CLI command to do this.

But all three UIs (WP-CLI, AJAX requests from an admin page, or background cron) will need to iterate over all sites, run the single-site logic (i.e., add export data or anonymize data), and, in the case of data export, somehow make the file available to the admin.
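One way to share that logic among the three UIs is a single generator that applies a per-site action and reports progress. A WP-CLI command, an AJAX handler, and a cron callback would each consume it in their own way. This is a Python sketch only, and `run_network_action` and the `SiteAction` callback shape are hypothetical.

```python
from typing import Callable, Iterable, Iterator, Tuple

# Hypothetical single-site action: (site_id, email) -> number of items exported or anonymized.
SiteAction = Callable[[int, str], int]


def run_network_action(site_ids: Iterable[int], email: str,
                       action: SiteAction) -> Iterator[Tuple[int, int]]:
    """Apply a single-site action to each site in turn, yielding (site_id, items_handled)."""
    for site_id in site_ids:
        yield site_id, action(site_id, email)
```

Because the generator is lazy, a CLI command could print each tuple as it arrives, while an AJAX handler could stop after a fixed number of sites and resume on the next request.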

#14 @jeremyfelt
8 weeks ago

  • Milestone changed from Awaiting Review to 5.0

I'm not sure we can reliably handle this by making AJAX requests to each site from the dashboard due to CORS. An interface via WP-CLI would be nice for efficiency at a later point, but we'll need a direct solution in core as well.

A side-effect of using switch_to_blog() to schedule cron events on many sites is that any plugins hooking into WP's cron system will not be available. This could cause unknown issues with how those events are (or aren't) run.

For 4.9.6 we could store a network option containing a list of sites and users awaiting data. Each individual site could regularly check this list to start its own data process. For 5.0.0 we would have wp_blogmeta available and could place something there instead.
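A pending list stored in a network option could hold (site, user) pairs. Each site would claim its own entries from its own cron run, so no switch_to_blog() scheduling would be needed. The Python sketch below models only the data handling. `enqueue_user` and `claim_for_site` are hypothetical, and the list stands in for the option value.

```python
from typing import List, Tuple

Pending = List[Tuple[int, int]]  # (site_id, user_id) pairs awaiting a site-level export


def enqueue_user(pending: Pending, site_ids: List[int], user_id: int) -> Pending:
    """Add one (site, user) entry per site, skipping entries that are already queued."""
    queued = set(pending)
    return pending + [(s, user_id) for s in site_ids if (s, user_id) not in queued]


def claim_for_site(pending: Pending, site_id: int) -> Tuple[List[int], Pending]:
    """Run from a site's own cron: take that site's user IDs and return the remaining queue."""
    mine = [u for s, u in pending if s == site_id]
    rest = [(s, u) for s, u in pending if s != site_id]
    return mine, rest
```

One concern with a single shared option is that many sites doing read-modify-write on it concurrently could overwrite each other's updates. Per-site storage would avoid that.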

An alternative could be a REST endpoint on each site that returns a user's data and is called from a cron event via "remote" request on the network's main site.

In any case, I don't see a straightforward way of doing this on demand. Any approach will require a build period in which data is collected across the network before it is ready.


I'm moving this to 5.0 as a suggestion. I don't see it being ready for 4.9.6 as we haven't gone through a collection of data across dozens/hundreds/thousands of sites before and it needs time to soak and be tested thoroughly.

Separately (I'm not sure if it belongs on this ticket), we should plan on network-level data itself being exported as part of the main site's export.

This ticket was mentioned in Slack in #meta-tracdev by iandunn.


8 weeks ago

#16 @desrosj
8 weeks ago

  • Keywords needs-patch added
  • Milestone changed from 5.0 to Awaiting Review

#17 @desrosj
8 weeks ago

  • Milestone changed from Awaiting Review to 5.0

#18 @mnelson4
7 weeks ago

I'm not sure we can reliably handle this by making AJAX requests to each site from the dashboard due to CORS.

Good point to bring up, but couldn't that be overcome by sending CORS Access-Control-Allow-Origin: * headers on these AJAX requests?
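One caveat with `*` is that browsers reject a wildcard `Access-Control-Allow-Origin` on credentialed requests. Admin AJAX requests that rely on login cookies would be credentialed, and on networks with mapped domains those cookies may not be sent cross-domain at all. A common alternative is to echo back the requesting origin only if it belongs to the network. The Python sketch below shows that approach. `cors_headers` and the `network_origins` list are hypothetical.

```python
from typing import Dict, Iterable


def cors_headers(request_origin: str, network_origins: Iterable[str]) -> Dict[str, str]:
    """Echo the origin back only if it belongs to the network; '*' is not allowed with credentials."""
    if request_origin not in set(network_origins):
        return {}  # no CORS headers, so the browser blocks the cross-origin response
    return {
        "Access-Control-Allow-Origin": request_origin,
        "Access-Control-Allow-Credentials": "true",
        "Vary": "Origin",  # the response differs per origin, so caches must key on it
    }
```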

A side-effect of using switch_to_blog() to schedule cron events on many sites is that any plugins hooking into WP's cron system will not be available. This could cause unknown issues with how those events are (or aren't) run.

Oops, yeah. That kinda kills the idea of handling multiple sites in a single request (unless that's a shortcoming we're willing to overlook). So, at most, the AJAX/REST requests will be able to handle one site at a time.

An alternative could be a REST endpoint on each site that returns a user's data and is called from a cron event via "remote" request on the network's main site.

I think that's viable, too.

Last edited 7 weeks ago by mnelson4

This ticket was mentioned in Slack in #core-multisite by jeremyfelt.


7 weeks ago

#20 @desrosj
5 weeks ago

  • Component changed from General to Privacy

Moving to the new Privacy component.

This ticket was mentioned in Slack in #gdpr-compliance by desrosj.


3 weeks ago

#22 @desrosj
6 days ago

  • Keywords privacy-roadmap added