WordPress.org

Make WordPress Core

Opened 2 years ago

Last modified 6 days ago

#43738 new enhancement

Make the personal data Export/Delete functionality available in network-wide for super admins

Reported by: TZ Media
Owned by:
Milestone: 5.5
Priority: normal
Severity: normal
Version:
Component: Privacy
Keywords: privacy-roadmap has-patch
Focuses: multisite
Cc:

Description

There are setups in which the sites in a network are not independent: they are integral parts of one website, are interdependent in some way, or should be handled together for other reasons.

Therefore, a view for GDPR export and delete, similar to the one on a single site, is needed to manage network-wide requests. Site-level requests should still work as expected when the site is part of a network. The network-wide view should gather the data from all sites in the network and export it in one file.

Common use cases are:

  • Sites that use network installs with plugins like Multilingual Press to provide different language versions of one site
  • Network installs where different sites in a multisite behave like one website to the end-user (same design, shared functionality, shared user base)

Attachments (1)

43738.diff (13.1 KB) - added by pbiron 6 days ago.
preliminary! adds just the network-level UI; does not actually handle network-level exports/erases.


Change History (35)

#1 @TZ Media
2 years ago

  • Keywords gdpr added

This ticket was mentioned in Slack in #gdpr-compliance by coreymckrill.


2 years ago

#3 @coreymckrill
2 years ago

For some multisite use cases, GDPR compliance might all be handled at the individual site level. However, as mentioned above, there are many use cases where it makes more sense to handle it at the network level. Specifically:

  • Setting a privacy policy, or at least a template, that would be used on every site in the network
  • Personal data export/anonymization/erasure across the entire network

Other multisite-specific questions/considerations:

  • If the user requesting the data export/anonymization/erasure is the owner/admin of one of the sites within the network, what happens to that site?
  • Super admins can access the user data on any site in the network. Does this need to be called out as something that should be disclosed in the Privacy Policy?
  • Similar to the question about abandoned user accounts, what should be done (if anything) with abandoned sites?

So, in terms of applying the roadmap to a multisite context, here are some ways we could approach it:

Add tools for creating a privacy policy

There could be a new Network Admin menu item "Tools" with a "Privacy" page, similar to #43435. This could be a place to create a template that would pre-fill the Privacy Policy pages on new sites in the network.

Add tools to core to facilitate compliance, and privacy in general

These could look similar to the single site tools that are in the works, but would work across all sites in the network. Under the hood, that might look something like this:

  • Search across the network for the user's ID and/or email
  • For each site that contains data from that user, run the single site functions for export/anonymization/erasure, and if for export, compile it all into one file (or multiple, if it's a huge amount of data?)
  • This would perhaps all happen asynchronously via Ajax/REST requests, since it would potentially involve a lot of big expensive queries
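To make the compile step a little more concrete, here is a rough, language-neutral sketch (in Python rather than PHP, and not based on any existing core API) of how per-site export results might be merged into one download, or split into several when there is a lot of data. SiteExport, compile_network_export() and max_items_per_file are hypothetical names chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SiteExport:
    # Hypothetical result of running the single-site export on one site:
    # the site's blog ID plus the exported items (field name -> value).
    blog_id: int
    items: list = field(default_factory=list)


def compile_network_export(site_exports, max_items_per_file=1000):
    """Merge per-site exports into one or more "files" (lists of records).

    Sites without data contribute nothing. Each record is tagged with the
    blog_id it came from. If the total number of records exceeds
    max_items_per_file, the output is split into several files.
    """
    if max_items_per_file < 1:
        raise ValueError("max_items_per_file must be at least 1")
    records = []
    for export in site_exports:
        for item in export.items:
            records.append({"blog_id": export.blog_id, **item})
    # Chunk the flat record list; no records at all yields no files.
    return [
        records[i:i + max_items_per_file]
        for i in range(0, len(records), max_items_per_file)
    ]
```

Tagging each record with its blog_id is just one possible design choice; it keeps the combined file traceable back to the site each item came from.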

Add documentation/help for site owners on how to use these tools

Network admin documentation probably wouldn't be that different.

This ticket was mentioned in Slack in #core-multisite by coreymckrill.


2 years ago

#5 @maxfein
2 years ago

  • Focuses multisite added

Happy to see this being addressed for multisite... super busy atm, will come back with something more structured as soon as I can :) ...the rest is mostly a quick re-post of a comment on the roadmap post, to bring at least something to the table...

I’m using JJJ’s wp-multi-network (to run multiple networks from one WP).

So, I also have some networks that are (mostly) self-administered by clients, and I have some clients with accounts on multiple networks, and they all have many users of various kinds associated with various sites (/networks).

I imagine that institutions like schools and many other use cases face similar issues.

Basically, it seems like there are times when *things* need to get more nuanced,
e.g. data export/delete from (per site|per network|?select from mysites?|all data)

Cheers, Max

This ticket was mentioned in Slack in #core-multisite by maxfein.


2 years ago

#7 follow-up: @philippmuenchen
2 years ago

I absolutely agree with the comments above. A Multisite and/or Multi-Network environment needs a way to set defaults for all sites in the networks. Also the other open GDPR tickets should keep Multisite in mind.
Just as the admin is the boss of a single site, the super admin is the boss of all sites. So, e.g., a request to delete/forget a user should also reach the super admin, since they are usually also responsible for the network of sites (as its operator).

I also commented on the roadmap before: https://make.wordpress.org/core/2018/03/28/roadmap-tools-for-gdpr-compliance/#comment-33565

#8 in reply to: ↑ 7 @coreymckrill
2 years ago

Replying to philippmuenchen:

Also the other open GDPR tickets should keep Multisite in mind.

Here is a list of other tickets that @iandunn and I have identified as potentially having multisite implications:

There will, of course, be more, as new tickets are being added every day.

#9 follow-up: @jeremyfelt
2 years ago

Search across the network for the user's ID and/or email

I'm still catching up on GDPR and will have to go through some of the tickets to figure out what data is being exported and in what format, but this specific task may not be realistic with multisite's current data structure.

We can look at a single table, wp_usermeta, to get a list of sites the user has a role on, but I'm not sure if that covers everything that we need.

For small networks, it may be possible to just loop through each site and retrieve the data. As networks get larger, this becomes less and less realistic and would probably require a long-running task using cron.

A couple questions:

  • Is there a personal data export API endpoint that is available on each site?
  • What's the best ticket to start on to get up to speed on the data being exported? #43438?

#10 in reply to: ↑ 9 @coreymckrill
2 years ago

Replying to jeremyfelt:

Search across the network for the user's ID and/or email

We can look at a single table, wp_usermeta, to get a list of sites the user has a role on, but I'm not sure if that covers everything that we need.

It doesn't, because comments might only be associated with an email address, not a user ID. This may also be the case for data stored by plugins. But wp_usermeta is probably a good start.
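As a rough illustration of combining the two sources, the Python sketch below derives site IDs from usermeta keys and then adds any sites whose comments match the email address. It assumes the default base table prefix wp_ and relies on a user's role on site N being stored under the wp_N_capabilities meta key (wp_capabilities for site 1). The comment_emails_by_blog mapping is a hypothetical stand-in for per-site comment queries, and plugin-stored data is not covered.

```python
import re

# Assumption: the network uses the default base table prefix "wp_".
BASE_PREFIX = "wp_"
# Site N (N > 1) stores a user's role under "wp_N_capabilities".
SUBSITE_CAPS_KEY = re.compile(re.escape(BASE_PREFIX) + r"(\d+)_capabilities")


def blog_ids_from_usermeta(meta_keys):
    """Return the IDs of sites on which the user has a role."""
    blog_ids = set()
    for key in meta_keys:
        if key == BASE_PREFIX + "capabilities":
            blog_ids.add(1)  # site 1 uses the unnumbered base prefix
            continue
        match = SUBSITE_CAPS_KEY.fullmatch(key)
        if match:
            blog_ids.add(int(match.group(1)))
    return blog_ids


def candidate_sites(meta_keys, email, comment_emails_by_blog):
    """Sites to visit: role-based sites plus sites whose comments match email.

    comment_emails_by_blog is a hypothetical {blog_id: [comment author emails]}
    mapping standing in for per-site comment queries.
    """
    sites = blog_ids_from_usermeta(meta_keys)
    wanted = email.lower()
    for blog_id, emails in comment_emails_by_blog.items():
        if any(e.lower() == wanted for e in emails):
            sites.add(blog_id)
    return sites
```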

A couple questions:

  • Is there a personal data export API endpoint that is available on each site?

There has been some discussion of REST API endpoints, but I don't think there is any code for this yet (although I haven't caught up on everything yet today). I think these endpoints will be critical for multisite needs.

  • What's the best ticket to start on to get up to speed on the data being exported? #43438?

Yep, plus #43546 for the (single site) UI.

#11 @jeremyfelt
2 years ago

I had a chance to dive into #43546 to get a better grasp on what's included with a default single site level export and am starting to form some ideas around how I think a network level export could work.

From above, exporting (or removing) user data at the network level should/could:

  • Search across the network for the user's ID and/or email
  • For each site that contains data from that user, run the single site functions for export/anonymization/erasure, and if for export, compile it all into one file (or multiple, if it's a huge amount of data?)
  • This would perhaps all happen asynchronously via Ajax/REST requests, since it would potentially involve a lot of big expensive queries

I think one way this could happen is:

  1. Network administrator initiates a user export, which schedules a cron event on the main site.
  2. The cron event on the main site loops through all sites on the network and schedules site level events to build user data for export.
  3. A main site cron event checks the status of these regularly through switch_to_blog() or something similar and receives one of 4 responses - (1) Process not started, (2) Process started, not complete, (3) Process complete, no data available, (4) Process complete, data available.
  4. When all processes are complete, the full export can be built on the main site and provided for download.
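The status check in steps 3 and 4 could be modeled roughly as follows. This is a Python sketch of the decision logic only; SiteExportStatus and network_export_state() are hypothetical names, not part of core.

```python
from enum import Enum


class SiteExportStatus(Enum):
    # The four responses listed in step 3.
    NOT_STARTED = 1
    STARTED_NOT_COMPLETE = 2
    COMPLETE_NO_DATA = 3
    COMPLETE_WITH_DATA = 4


FINISHED = {SiteExportStatus.COMPLETE_NO_DATA, SiteExportStatus.COMPLETE_WITH_DATA}


def network_export_state(statuses):
    """Given {blog_id: SiteExportStatus}, return (ready, blog_ids_with_data).

    ready is True only when every site has finished (step 4); the blog IDs
    are the sites whose data must be pulled into the full export.
    """
    ready = all(status in FINISHED for status in statuses.values())
    with_data = sorted(
        blog_id
        for blog_id, status in statuses.items()
        if status is SiteExportStatus.COMPLETE_WITH_DATA
    )
    return ready, with_data
```

In this model the full export is only built once every site reports one of the two "complete" statuses, and sites that finished with no data are simply skipped when assembling the file.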

Very happy to hear other thoughts on that approach and other possibilities! :)

With a process like that, it seems unrealistic for a full network-level export to be ready for 4.9.6; for now, it seems we should rely on manually compiling site-level exports.

Note that wp_blogmeta landed for 5.0 in #37923 and may help quite a bit with tracking export status on individual sites.

This ticket was mentioned in Slack in #core-multisite by jeremyfelt.


2 years ago

#13 @mnelson4
2 years ago

In Slack I suggested we should maybe use multiple AJAX requests to create the export file or anonymize the user data, a bit like https://core.trac.wordpress.org/ticket/43438.
If we do this, we should probably gather export data from more than one site per ajax request, otherwise it will be unbearably slow for large networks.
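A minimal sketch of that batching, assuming the admin page sends a 1-indexed page number with each AJAX request, in the spirit of the multiple-request approach from #43438. site_batch_for_page() and per_page are hypothetical names; the function only decides which sites a given request should handle.

```python
def site_batch_for_page(site_ids, page, per_page=10):
    """Return (batch, done) for a 1-indexed page of site IDs.

    batch is the slice of sites this AJAX request should process; done is
    True when no sites remain for later requests.
    """
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be at least 1")
    start = (page - 1) * per_page
    batch = site_ids[start:start + per_page]
    done = start + per_page >= len(site_ids)
    return batch, done
```

With per_page=1 this reduces to handling one site per request.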

@flixos90 suggested we may want to also have a WP-CLI command to do this.

But all three UIs (WP-CLI, AJAX requests from an admin page, or background cron) will need to iterate over all sites, do the single-site logic (i.e., add export data or anonymize data), and, in the case of data export, somehow make the file available to the admin.

#14 @jeremyfelt
2 years ago

  • Milestone changed from Awaiting Review to 5.0

I'm not sure we can reliably handle this by making AJAX requests to each site from the dashboard due to CORS. An interface via WP-CLI would be nice for efficiency at a later point, but we'll need a direct solution in core as well.

A side-effect of using switch_to_blog() to schedule cron events on many sites is that any plugins hooking into WP's cron system will not be available. This could cause unknown issues with how those events are (or aren't) run.

For 4.9.6 we could store a network option containing a list of sites and users awaiting data. Each individual site could regularly check this list to start its own data process. For 5.0.0 we would have wp_blogmeta available and could place something there instead.
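One way to picture that network option is as a list of (blog_id, email) pairs that each site filters for its own ID. The Python sketch below assumes that shape; the option itself, pending_for_site() and mark_done() are hypothetical and only illustrate the bookkeeping, not how the option would actually be stored.

```python
def pending_for_site(pending, blog_id):
    """Emails awaiting data collection on the given site.

    pending is a hypothetical network option value: a list of
    (blog_id, email) pairs.
    """
    return [email for site, email in pending if site == blog_id]


def mark_done(pending, blog_id, email):
    """Return a new pending list without the finished (blog_id, email) pair."""
    return [entry for entry in pending if entry != (blog_id, email)]
```

Because every site would read and rewrite the same option, concurrent updates could overwrite each other; per-site storage such as wp_blogmeta would avoid that shared read-modify-write.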

An alternative could be a REST endpoint on each site that returns a user's data and is called from a cron event via "remote" request on the network's main site.

In any case, I don't see a straightforward way of doing this on demand. Any approach will require a build period in which data is collected across the network before it is ready.


I'm moving this to 5.0 as a suggestion. I don't see it being ready for 4.9.6 as we haven't gone through a collection of data across dozens/hundreds/thousands of sites before and it needs time to soak and be tested thoroughly.

Separately—I'm not sure if it belongs on this ticket—we should be able to plan on network level data itself being exported as part of the main site.

This ticket was mentioned in Slack in #meta-tracdev by iandunn.


2 years ago

#16 @desrosj
2 years ago

  • Keywords needs-patch added
  • Milestone changed from 5.0 to Awaiting Review

#17 @desrosj
2 years ago

  • Milestone changed from Awaiting Review to 5.0

#18 @mnelson4
2 years ago

I'm not sure we can reliably handle this by making AJAX requests to each site from the dashboard due to CORS.

Good point to bring up, but couldn't that be overcome by sending CORS Access-Control-Allow-Origin: * headers on these AJAX requests?

A side-effect of using switch_to_blog() to schedule cron events on many sites is that any plugins hooking into WP's cron system will not be available. This could cause unknown issues with how those events are (or aren't) run.

Oops, yeah. That kinda kills the idea of handling multiple sites in a single request (unless that's a shortcoming we're willing to overlook). So at most the AJAX/REST requests will be able to do one site at a time.

An alternative could be a REST endpoint on each site that returns a user's data and is called from a cron event via "remote" request on the network's main site.

I think that's viable, too.

Last edited 2 years ago by mnelson4

This ticket was mentioned in Slack in #core-multisite by jeremyfelt.


2 years ago

#20 @desrosj
23 months ago

  • Component changed from General to Privacy

Moving to the new Privacy component.

This ticket was mentioned in Slack in #gdpr-compliance by desrosj.


23 months ago

#22 @desrosj
22 months ago

  • Keywords privacy-roadmap added

#23 @desrosj
22 months ago

  • Summary changed from Make the GDPR Export/Delete functionality available in network-wide for super admins to Make the personal data Export/Delete functionality available in network-wide for super admins

#24 @desrosj
21 months ago

  • Keywords gdpr removed

Removing the GDPR keyword. This has been replaced by the new Privacy component and privacy focuses in Trac.

#25 @pento
18 months ago

  • Milestone changed from 5.0 to 5.1

#26 @desrosj
15 months ago

  • Keywords changed from needs-patch, privacy-roadmap to needs-patch privacy-roadmap
  • Milestone changed from 5.1 to Future Release

This still needs more thought and lacks an initial patch.

#27 @garrett-eclipse
6 weeks ago

  • Milestone changed from Future Release to 5.5

This ticket was mentioned in Slack in #core-privacy by pbiron.


2 weeks ago

#29 @pbiron
2 weeks ago

I'm working on a patch for this. The first iteration will ONLY cover the UI/UX of managing the requests (and not actually doing the exports/erasures). I've got a couple of questions (about whether certain things should/need to be handled, especially in an initial patch):

  1. As of 5.4, wp_create_user_request() checks whether there are incomplete requests for a given email address and displays an admin notice if there are (rather than creating a new request). Having the ability to initiate requests at the network level adds further wrinkles to that check.
    • if there are incomplete requests for a given email address on one or more sub-sites in the network, should an admin notice be displayed when a network-level request is initiated?
    • conversely, if there is an incomplete request for a given email address at the network-level, what should happen when a site-level request is initiated for that email address?
    • should the site-level request(s) be "subsumed" by the network-level request?
    • what about multi-networks? (hint: the users table is shared across all networks, so the answer here is probably the same as for the single-network case, although since core doesn't provide a UI for multi-networks, we might leave that in plugin territory)

Ultimately, I think we should let site owners make those decisions (by providing appropriate hooks/constants), but we still need to set up sensible defaults.

  2. Also, what if a request is inadvertently submitted at the wrong level (i.e., at the site-level when it should have been submitted at the network-level, or vice versa)? Do we want to support moving a request from the site-level to the network-level, and vice versa?
    • similarly, in a multi-network setup, do we want to support moving a request from one network to another (whether at the network- or site-level)?

This ticket was mentioned in Slack in #core-privacy by pbiron.


7 days ago

#31 @garrett-eclipse
7 days ago

Summary of my response on #core-privacy;

  1. The requests should all be isolated to their context (network/site) and should only interact with the data they have access to. They should not conflict or interact in any way with other sites or contexts. So a user may have multiple requests open across the network. If a network request erases data before a site request can produce an export, then the export will only contain the remaining information available.
  2. We shouldn't support migration of requests. If an admin made an error, they can cancel the request and re-create it in the right context. This ensures the user consent is specific to the site/context the request will be conducted in.
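Under that isolation rule, the incomplete-request check that wp_create_user_request() performs would simply be scoped to one context. A rough Python sketch, assuming each request is tagged with a blog ID identifying its context (for example 0 for network level) and assuming request-pending and request-confirmed are the statuses that count as incomplete; has_incomplete_request() is a hypothetical helper, not a core function.

```python
# Assumption: these are the statuses that count as "incomplete".
INCOMPLETE_STATUSES = {"request-pending", "request-confirmed"}


def has_incomplete_request(requests, email, action, blog_id):
    """True if an incomplete request exists for email/action in this context only.

    requests: list of dicts with "email", "action", "status" and "blog_id"
    keys, where blog_id identifies the context (e.g. 0 for network level).
    Requests in other contexts are ignored, so they never block each other.
    """
    return any(
        r["email"] == email
        and r["action"] == action
        and r["blog_id"] == blog_id
        and r["status"] in INCOMPLETE_STATUSES
        for r in requests
    )
```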

@pbiron
6 days ago

preliminary! adds just the network-level UI; does not actually handle network-level exports/erases.

#32 @pbiron
6 days ago

  • Keywords has-patch added; needs-patch removed

43738.diff is VERY preliminary!!!

It does NOT actually do network-level exports/erasures. It does, however, do 2 things:

  1. adds the network-level UI for exports/erasures, by:
    1. adding a Network Admin > Tools menu
    2. adding Export Personal Data and Erase Personal Data menu items to that menu
    3. adding a Network Admin > Tools admin bar node
  2. sets up the request posts and WP_User_Requests objects so that they can distinguish between requests submitted at the network- or site-level, by:
    1. modifying wp_create_user_request() to add a _wp_user_request_blog_id post meta to the request post. When the request is submitted at the network-level, the meta value will be 0
    2. adding a _wp_user_request_blog_id property to WP_User_Request so that the object returned by wp_get_user_request() "knows" what level it was submitted to
    3. modifying WP_Privacy_Requests_Table::prepare_items() to query for the post meta, so that the list table on the Network Admin > Tools > Export Personal Data screen will only show requests submitted at the network-level, and the one on Tools > Export Personal Data will only display requests submitted at the site-level
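A rough Python model of the list-table filtering described in the third sub-item above. It assumes that request posts created before the patch have no _wp_user_request_blog_id meta and should therefore be treated as site-level (matching the test instructions below), and that meta values arrive as strings; request_level() and visible_requests() are hypothetical helpers, not part of the patch.

```python
NETWORK_LEVEL = 0  # meta value used for network-level requests


def request_level(meta):
    """Classify a request post from its meta dict as "network" or "site".

    Requests without the meta (created before the patch) count as site-level.
    """
    blog_id = meta.get("_wp_user_request_blog_id")
    if blog_id is None:
        return "site"
    return "network" if int(blog_id) == NETWORK_LEVEL else "site"


def visible_requests(requests, in_network_admin):
    """IDs of the requests a given list table should show."""
    wanted = "network" if in_network_admin else "site"
    return [r["id"] for r in requests if request_level(r["meta"]) == wanted]
```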

A few more points:

  1. I've tested this on:
    1. a "regular" multisite (i.e., with a single network)
    2. a multi-network (i.e., built with the WP Multi Network plugin)
    3. a single site
    4. all of the above with existing requests submitted before the patch was applied
    5. all of the above in the back-end and with GDPR Data Request Form on the front-end
  2. I'm sure there are ways other than post meta to distinguish requests initiated at the network- vs site-level, but that seemed like a natural choice to me. Other suggestions are more than welcome.
  3. There may be some place other than a Network Admin > Tools menu for the export/erase UI to live, but again, that seemed a natural choice to me. Plus, having that menu would open up nice possibilities for multisite-aware plugins.
  4. I'm not sure what cap should be checked to display the Network Admin > Tools menu (the Export/Erase Personal Data menu items use the same cap they use for the regular Tools menu items). The patch uses edit_posts just like the regular Tools menu. Suggestions welcome.
  5. I'm not 100% sure about the new $blog_id param to wp_create_user_request(). Initially, I just had a is_network_admin() check (like in WP_Privacy_Requests_Table::prepare_items()). However, I then realized that such a test would not work for plugins like GDPR Data Request Form that allow requests to be submitted on the front-end. Again, suggestions welcome for other ways to handle that.

To test:

  1. ensure there are existing incomplete requests at the site-level before applying the patch
  2. verify that the existing site-level requests do NOT display at the network-level
  3. submit various requests at both the site- and network-levels and ensure that the right thing happens
    1. the site-level requests only display at the site-level
    2. the network-level requests only display at the network-level
    3. you get (or don't get) the appropriate admin notices depending on whether incomplete requests exist (or don't exist) at the appropriate level
  4. submit requests on the front-end (with a plugin such as GDPR Data Request Form) and verify that the request displays on the appropriate site-level screen on the back-end (and does NOT display at the network-level on the back-end)

This ticket was mentioned in Slack in #core-privacy by pbiron.


6 days ago

#34 @pbiron
6 days ago

I figure that once we get the UI part of network-level requests nailed down, then we can turn to working out the details of actually doing network-level exports/erasures.
