#43546 closed enhancement (fixed)
Add to the privacy tools UX a means to export personal data by username or email address
Reported by: | allendav | Owned by: | allendav |
---|---|---|---|
Milestone: | 4.9.6 | Priority: | low |
Severity: | normal | Version: | 5.1 |
Component: | Privacy | Keywords: | gdpr |
Focuses: | Cc: |
Attachments (27)
Change History (116)
#3
@
6 years ago
Here’s the flow i’m imaginging:
- user contacts admin to get an export of their data (can be by email, phone call, postal mail, etc)
- admin enters the user’s email address in the box near the request table and hits request-verification button
- mike jolley’s magic code sends the verification email to the user
- user clicks the link in the email thus verifying the request
- admin sees a badge/dot/number on the wp-admin sidebar menu and goes to the export requests wp list table and sees the user verified the request
- admin clicks the download action or the send-export-file to user via email action, whichever they prefer to use
- a progress indicator is displayed for that row of the wp list table while export is in progress
- if the admin downloaded it, they email it manually to the user
#4
@
6 years ago
Here is a write up that could be added to https://developer.wordpress.org/plugins/ and linked to from the under-construction privacy chapter in the guidelines.
It helps explain how all this works together: https://github.com/allendav/wp-privacy-requests/blob/master/EXPORT.md
cc @azaozz @mikejolley
#5
@
6 years ago
Worked with @allendav, @mikejolley, and @karmatosed on the export screen. Data Export screen.png shows what the Personal Data Export screen should look like, while data button flow.png and data flow.png go into specific flows around the downloading and emailing of data exports.
Let me know if y'all have any questions.
#6
@
6 years ago
@melchoyce thank you, thank you, thank you. @mikejolley and I will get this rolled into our little side repo and then return here with a patch pronto
This ticket was mentioned in Slack in #gdpr-compliance by allendav. View the logs.
6 years ago
This ticket was mentioned in Slack in #gdpr-compliance by allendav. View the logs.
6 years ago
#9
@
6 years ago
I just uploaded a rough work in progress that connects the "Download Data" action for a personal data export request all the way through to generating a ZIP file. The ZIP file includes an HTML report that surfaces personal data found in comments.
To test,
- apply this patch
- create a few comments with a given email address
- go to Tools > Personal Data Export and create a new export request for that email address
- confirm the request if desired (not required)
- click on Download Data under the email address in the export table
- you should receive a ZIP file, open it
- behold the personal data found in comments
Besides generally cleaning it up, I also want to change the file generation to use the request CPT instead of options, but otherwise things won't change much. Feedback welcome.
#10
@
6 years ago
Great work, @allendav!
I have a few thoughts regarding the possibility of adding additional output generators (as plugin), as we will need other (custom) output formats for some projects (e.g. XML output or a single PDF document instead of HTML.
From how I understand your code, we would need to be able to
- Add additional action links to the items in the table, like "Download Personal Data as XML".
- Be able to use the HTML generation without saving it to a file directly (to further process it and save it in a different file format like PDF).
- Be able to easily add other output formats, by
- either making as much of the functionality - like generation of $export_data or the ajax loop from xfn.js - reusable as separate functions,
- or make the export generator aware of different output formats, like
- register possible export format generators - including the one form core that generates HTML/ZIP - via a filter hook,
- automate the generation of action links for each format to the items in the table,
- send the desired output format as handler in the AJAX request loop that triggers the report generation,
- and call the export generator that is registered for this handler to generate an output file (either a single file, or an archive)
- Question here: Could we refactor the zip file generation into a helper function that is reusable for all export generators?
I know that this would require deep changes, but would it be possible to include this possibility one way or the other? I'd prefer the last approach as it is the most flexible one for plugin authors, but I could live with other approaches as well.
Also I'd be willing to help implementing this if we can agree on an approach.
EDIT: Another idea would be the possibility to add additional files to the zip file, so the user could get the HTML, and I could add the same data as XML, or in other formats, as well.
@
6 years ago
Minor update to accomodate changes in admin-filters.php that were borking patch application
#12
@
6 years ago
- Milestone changed from Awaiting Review to 4.9.6
Moving to the 4.9.6 milestone after consensus was reached in the most recent GDPR chat (https://wordpress.slack.com/archives/C9695RJBW/p1524063200000304).
This ticket was mentioned in Slack in #gdpr-compliance by casiepa. View the logs.
6 years ago
#14
@
6 years ago
Great work on this so far. :thumbsup: I'm just starting to catch up on the suite of GDPR tickets and have a lot to learn. :) I played around with 43546.5.diff today in an attempt to start thinking through multisite implications.
I'm worried at first glance with the ZIP file generation. Building the filename with a hashed email and timestamp would make it relatively easy for somebody who knows another user's email address to enumerate through thousands of filename possibilities in a short amount of time and potentially access other users' data exports.
We don't really have a good system for this in WP. Multisite had ms-files.php
as a way to route filename URLs to static files on the server, but that has been deprecated for a while.
I think that for exported personal data, authentication should be required before the export file can be downloaded. Has there been any discussion around storing the data in a custom post type for a more dynamic retrieval on demand?
@
6 years ago
Uses postmeta for temporary storage (instead of options), vastly improved error logging, and ZIP/HTML action is now hookable/replaceable
#15
@
6 years ago
@azaozz @TZ Media @jeremyfelt - take a look at the latest patch - it makes HTML/ZIP hookable/replaceable and also makes it much harder to guess the filename - good catch there @jeremyfelt - I noticed that today as well
This ticket was mentioned in Slack in #core by jeffpaul. View the logs.
6 years ago
@
6 years ago
Adds direct-to-email export and marking of export requests as complete. Fixes divs lacking closing tags.
This ticket was mentioned in Slack in #gdpr-compliance by clorith. View the logs.
6 years ago
#18
follow-up:
↓ 22
@
6 years ago
- Keywords needs-testing added; needs-patch removed
I hit this pretty hard, and everything seems to work great! I was not really able to break it. But have some thoughts on a few things.
It took me quite a while to find the "Resend Email" in the bulk actions, and it was not clear to me what that did. I expected it to resend the link to the confirmed request, but instead, it restarts the process completely. Any link that was previously emailed to a user did still work, though. Can we make this clearer? I think it also makes sense to add this to the actions row for each post. If "Download Personal Data" is changed to just "Download", there could be a few more links added there, such as "Resend File", "Resend Confirmation" or "Restart Process".
- If you submit the same email more than once before a user has confirmed the original request, additional requests are added to the table, but only the most recent one is able to be confirmed. All others will show "Invalid Key". Can we limit to one unconfirmed request per email?
- When logged in as an admin, I can click "Download Personal Data" before the user has confirmed the request. Is this ok?
- For inline documentation for an array parameter does not match the coding standards (https://make.wordpress.org/core/handbook/best-practices/inline-documentation-standards/php/#1-1-parameters-that-are-arrays), but there is no documentation for an array of arrays. Not sure what the correct format here, but I think
@type
needs to be added, at least. $report_path
inwp_privacy_process_personal_data_export_page()
is never used and$request_path
after it is not set in the function. Is this a typo? Or did I miss where this was coming from?
Code points addressed in 43546.8.diff:
- Added inline documentation for
wp_ajax_wp_privacy_export_personal_data()
. - Inline comments should end with periods.
- Moved the check for a friendly name after the
! is_array()
check and used the friendly name and match the following errors instead of "Exporter at index X". - Took a shot at improving the documentation for
wp_privacy_generate_personal_data_export_group_html()
(see above about not knowing the exact format for nested array parameters).
This ticket was mentioned in Slack in #gdpr-compliance by desrosj. View the logs.
6 years ago
#20
follow-up:
↓ 21
@
6 years ago
Did some quick tests on this and it seems battle proof. Did not go in depth though, as @desrosj already went deep on this.
Also great work making the output generation hookable via wp_privacy_personal_data_export_file
, @allendav :-)
- How do I make an alternative file downloadable/mailable? Simply by doing something like this?
update_post_meta( $request_id, '_export_file_url', $archive_url ); update_post_meta( $request_id, '_export_file_path', $archive_pathname );
- Could we make the zip file generation in
wp_privacy_generate_personal_data_export_file()
hookable in order for plugins to add additional files to that standard zip archive?
I did a code review myself, and everything looks quite solid.
Great work here!
#21
in reply to:
↑ 20
@
6 years ago
Replying to TZ Media:
Did some quick tests on this and it seems battle proof. Did not go in depth though, as @desrosj already went deep on this.
Also great work making the output generation hookable via
wp_privacy_personal_data_export_file
, @allendav :-)
- How do I make an alternative file downloadable/mailable? Simply by doing something like this?
update_post_meta( $request_id, '_export_file_url', $archive_url ); update_post_meta( $request_id, '_export_file_path', $archive_pathname );
- Could we make the zip file generation in
wp_privacy_generate_personal_data_export_file()
hookable in order for plugins to add additional files to that standard zip archive?I did a code review myself, and everything looks quite solid.
Great work here!
Thank you!
Yep - that's the idea, and the ZIP file generation will be hookable shortly - I'm going to add attachments there (in another ticket - this one is getting too big) and will add a hook there too to add more files.
#22
in reply to:
↑ 18
@
6 years ago
Replying to desrosj:
I hit this pretty hard, and everything seems to work great! I was not really able to break it. But have some thoughts on a few things.
It took me quite a while to find the "Resend Email" in the bulk actions, and it was not clear to me what that did. I expected it to resend the link to the confirmed request, but instead, it restarts the process completely. Any link that was previously emailed to a user did still work, though. Can we make this clearer? I think it also makes sense to add this to the actions row for each post. If "Download Personal Data" is changed to just "Download", there could be a few more links added there, such as "Resend File", "Resend Confirmation" or "Restart Process".
@melchoyce - what do you think? I'd like to address this in a separate patch if we decide to change this.
- If you submit the same email more than once before a user has confirmed the original request, additional requests are added to the table, but only the most recent one is able to be confirmed. All others will show "Invalid Key". Can we limit to one unconfirmed request per email?
@mikejolley - this sounds like a bug? What do you think? I'd like to address this in a separate patch if we decide to fix this.
- When logged in as an admin, I can click "Download Personal Data" before the user has confirmed the request. Is this ok?
Yes - the idea is to give an admin the flexibility to generate the file at any time
- For inline documentation for an array parameter does not match the coding standards (https://make.wordpress.org/core/handbook/best-practices/inline-documentation-standards/php/#1-1-parameters-that-are-arrays), but there is no documentation for an array of arrays. Not sure what the correct format here, but I think
@type
needs to be added, at least.
Yup - i ran into the same thing!
$report_path
inwp_privacy_process_personal_data_export_page()
is never used and$request_path
after it is not set in the function. Is this a typo? Or did I miss where this was coming from?
Good catch! That should be $export_file in that block - it cleans up old export meta when generating a second one on the same request. Will fix asap and post a patch here
Code points addressed in 43546.8.diff:
- Added inline documentation for
wp_ajax_wp_privacy_export_personal_data()
.- Inline comments should end with periods.
- Moved the check for a friendly name after the
! is_array()
check and used the friendly name and match the following errors instead of "Exporter at index X".- Took a shot at improving the documentation for
wp_privacy_generate_personal_data_export_group_html()
(see above about not knowing the exact format for nested array parameters).
Thank you thank you!
@
6 years ago
Incorporates desrosj patch; fixes old export cleanup logic; fixes email address meta key for requests
#23
@
6 years ago
rand()
doesn't generate cryptographically secure random numbers; would it be better to do something like $obscura = wp_generate_password( 32, false, false )
?
That relies on random_int(), which is a CSPRNG. You could also use wp_rand()
directly, but wp_generate_password()
seems more convenient in this case, because you can request the desired length, and it adds alpha characters.
I'm thinking ( 32, false, false )
to match the string length of md5()
, and to avoid characters that aren't valid/readable URLs.
This ticket was mentioned in Slack in #gdpr-compliance by desrosj. View the logs.
6 years ago
#25
follow-up:
↓ 30
@
6 years ago
I definitely have a few concerns here, centered mostly around "keeping this personal data accessible in a downloadable format seems risky".
The first, jeremyfelt addressed already:
I think that for exported personal data, authentication should be required before the export file can be downloaded. Has there been any discussion around storing the data in a custom post type for a more dynamic retrieval on demand?
Best case scenario would be required authentication that checks to make sure a user is allowed to have this file. Unfortunately WordPress doesn't have a way to do this right now. For smaller data sets that could be generated on the fly it wouldn't be too bad, but that's not particularly scalable.
Which brings me to iandunn's suggestion:
rand()
doesn't generate cryptographically secure random numbers; would it be better to do something like$obscura = wp_generate_password( 32, false, false )
?
This seems like an easy win. While rand()
technically meets the promise of the code comment // Generate a difficult to guess filename.
we can make it far more difficult to guess by using wp_generate_password()
.
Lastly, what about deleting these files? It looks like there is a cleanup method (wp_privacy_delete_old_export_files()
) that removes old files, but it only runs when a new export is created (and removes the files but not the post meta linking to it?). I like the suggestion in the comment to set that up to run on a cron. It would also be nice to have a manual delete option so an admin could choose to delete the file once it has been downloaded or emailed.
#27
@
6 years ago
Committed 43546.9.diff to be able to test it easier. The above concerns still have to be addressed.
#28
@
6 years ago
@azaozz We've got a failing test: https://travis-ci.org/WordPress/wordpress-develop/builds/372193674
@
6 years ago
Video of a really long export. Shows how a user could miss the fact that the export is being generated
@
6 years ago
Animated GIF of a really long export. Shows how a user could miss the fact that the export is being generated
#29
@
6 years ago
Sorry for the mp4- didn't realize that would be unusable and now I can't remove it. OUPS!
But I hope the GIF makes my suggestion clear: while the file is being generated, its pretty easy to miss the indications that it's being downloaded, and so it might be better to have a spinner or something more obvious.
Also: this is working fine with our plugin 👍
#30
in reply to:
↑ 25
;
follow-ups:
↓ 33
↓ 36
@
6 years ago
43546.11.diff has these changes:
- First pass at a cron job instead of manually calling
wp_privacy_delete_old_export_files()
when generating a new file. Right now it's setup on the admin side, becausewp_schedule_delete_old_privacy_export_files()
is inwp-admin/includes/file.php
, but I think it'll need to be moved to a front-end include file, because otherwise the job is even less likely to run as often as needed on small sites. - Switch from
rand()
towp_generate_password()
. - Fix the failing unit test. sanity check on fix for unit test - seems wiser to compare the urls, not hardcode/duplicate the surrounding link markup in the test itself, but don't have a lot of context, so could be missing something
- Add
wp_privacy_export_expiration
filter for determining the expiration lifetime. The email text previously had it hardcoded, but I switched to pulling from the filter instead, and updated the string to list the date it will expire on, rather than "3 day from now". I tried usinghuman_time_diff()
to preserve the original string, but it's not accurate enough in all cases, because it often rounds to the nearest week/month/etc). I think a date is probably less ambiguous, too, because there may be a delay between when the email is sent and when the user opens it. So it could say "3 days" because it was sent 2 days ago, but in reality the reader only has 1 day left to open it. I also added an explanation of why the file will be deleted, to avoid any confusion. - Adds
@group privacy
to the unit tests. If someone has time, it'd be good to make sure all the GDPR tests have this. - Removes the
@
fromunlink
, since suppressing error messages is general a bad practice. I'm guessing it was originally done to avoid them showing up in an AJAX response? That's no longer an issue since it's called via cron now. - Tell
list_files()
to excludeindex.html
, so it doesn't get deleted. - Miscellaneous cleanup.
Some other feedback
Download Personal Data
andEmail Data
function like buttons, and are visually formatted as buttons, but are setup as links. Would it be more semantic (and maybe accessible?) to use an actual<button>
?- If we're going to add additional files to the
.zip
besides theindex.php
, then it might be nice to prefix a folder for all the files. That way, if someone downloads the file to~/Downloads
, when they unzip the file, they don't get a ton of files extracted to~/Downloads
, where they'd be mixed in with tons of other files, making them hard to find, and creating a mess. Instead, they'd be extracted to~/Downloads/wp-personal-data-file-iandunn-at-example-org-ZG56zBVyy0jynV8W61aVsZ5vKfUmLk1o/
. - Similarly, would the HTML file be better named something like
wp-personal-data-file-iandunn-at-example-org-ZG56zBVyy0jynV8W61aVsZ5vKfUmLk1o.html
instead ofindex.html
, which doesn't describe the contents at all, and may conflict with other export files? - If no data is available, it'd be nice to display a message in
index.html
indicating that; otherwise the user might be confused why they're not seeing anything. Or, maybe short circuit the entire process, and just send them an email saying that there isn't any data, instead of giving them a link to click? - I'm not sure why the email says, "This email has been sent to {address}". Isn't that obvious from the
To
header?
Replying to aaroncampbell:
It would also be nice to have a manual delete option so an admin could choose to delete the file once it has been downloaded or emailed.
It looks like this exists in the bulk edit row, but it'd be nice to have remove
link next to the download
link, too, to make it more discoverable and convenient.
#32
follow-up:
↓ 34
@
6 years ago
@iandunn 43546.11.diff looks good. I only committed the unit test fix to calm down Travis :)
Been wondering how to best handle the personal data export. Having a secret download link to a zipped file would be a pretty good UX but has some drawbacks as noted above (needs cleanup, eventual leaks...).
Another alternative would be for the admin to always download the zipped export and give it to the user in some other way, either email it or share it (dropbox, google drive, etc.). Doing this will require more "handling" from the admin but will remove all of the above security concerns (assuming sharing personal data through email or third party services is acceptable).
Another option we should probably consider is password protecting the zip archive. As far as I see the only way this can be done reliably is to instruct the admin to download the zip and compress it again with a password, then email the password separately to the user.
Of course all of this can be done automatically on the server, but will probably need shell access, so seems it's a plugin material.
#33
in reply to:
↑ 30
@
6 years ago
Replying to iandunn:
- First pass at a cron job instead of manually calling
wp_privacy_delete_old_export_files()
when generating a new file. Right now it's setup on the admin side, becausewp_schedule_delete_old_privacy_export_files()
is inwp-admin/includes/file.php
, but I think it'll need to be moved to a front-end include file, because otherwise the job is even less likely to run as often as needed on small sites.
As far as I know, when a cron runs it is never considered to be an admin request. wp_privacy_delete_old_export_files()
will need to be moved to a different file, with the action hook for the cron as noted in your comments. I think wp_schedule_delete_old_privacy_export_files()
is fine where it is (it should only be scheduled when in the admin). @iandunn what do you think about putting this in wp-includes/functions.php
at the bottom? This has been where a lot of the privacy functions have gone.
- Adds
@group privacy
to the unit tests. If someone has time, it'd be good to make sure all the GDPR tests have this.
I made a note for this to check after beta.
Thanks! Great work on this.
#34
in reply to:
↑ 32
@
6 years ago
Replying to azaozz:
Another alternative would be for the admin to always download the zipped export and give it to the user in some other way, either email it or share it (dropbox, google drive, etc.). Doing this will require more "handling" from the admin but will remove all of the above security concerns
I'm a little leery of shifting the burden from WP to the admin. If the admin shares the file via Dropbox, Drive, etc, it seems like they'd most likely use an "unlisted" URL, which is essentially the same thing that we're doing, except that it won't be automatically deleted after a few days.
If it's an email attachment, then that removes the chance of an unauthorized download, but it won't work for attachments larger than ~10 MB in many cases. It sounded like we may start adding Media Library uploads to the export, so that would be problematic.
Even if we provide a downloadable link, the admin does still always have the option of doing what you suggested, though, since we already provide a way for them to download the data and then remove the file manually. And of course plugins can be written to provide additional alternatives.
Another option we should probably consider is password protecting the zip archive
It seems like adding a password would mainly protect against attempts to brute force the URL, right? We'd have to email the password to the user along with the link to the file, so a compromised email would also compromise the password.
I think we may already have a reasonable amount of protection against brute force, though. For an attacker to succeed, they'll need to:
- Know the email address of the target.
- Know the 3-day time window when the download will be available.
- Brute force a
62^32
search space against a remote network.
For a worst-case scenario, let's assume the attacker has a large botnet, there's no DDoS monitoring, and the attacker is able to achieve #1 and #2 without much trouble (maybe by initiating a request to pique the target's curiosity, so that the target will confirm it, even though they didn't request it).
To calculate the amount of time it'd take to brute force:
- Alphabet size: 26 lowercase alpha + 26 uppercase alpha + 10 digits = 62 characters
- Key length = 32 characters
- Number of possible values = 6232
- Number of guesses per second will vary wildly depending on the target, but erring on the side of paranoia, and taking Moore's law into account, let's say
150,000
. - On average, you only have to search half the space before finding a match
150000 * 62^32 / 2
= 1.704493413 * 10^62
seconds = 5.404913158 * 10^54
years = 5,404,913,150,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000
years
Somebody definitely needs to double check all of those assumptions and calculations, but if I'm right, then it doesn't seem like we need any extra protection against a brute force attack.
If there are some cases that desire password protection, it seems like it'd be possible for a plugin to cobble together something that'd work for a significant number of users, but not all. e.g., ZipArchive
AES encryption in PHP 7.2 plus multiple fallbacks for older versions, like DotNetZip
on Windows, and system( 'zip -P' )
calls on Linux. We could add an action after the file is written to disk, so that a plugin can immediately replace it w/ a encrypted version.
@
6 years ago
Move cron to front end; improve error handling; add file_create action; apply coding standards; stub unit tests
#35
@
6 years ago
Replying to desrosj:
it should only be scheduled when in the admin)
Hmm, why is that? I would think that there could be edge cases where it doesn't get scheduled often, resulting in gaps and files being available longer than the expiration. For example, if a plugin automates the request confirmation, so that the admin doesn't have to manually visit wp-admin
. I haven't thought it through very deeply, though, so I could easily be wrong.
In 43546.12.diff, I moved both of the cron functions to the file you suggested. Even if the scheduler can be in wp-admin, it seems like it might be nice to have them in the same place, since they're so closely related.
43546.12.diff also does these things:
- Improve error handling around the zip creation/writing; made sure the zip is always closed if even errors happen, and that the report file is always removed even if errors happen.
- Added the
wp_privacy_personal_data_export_file_created
that I mentioned towards the end of comment:34, so that plugins can password-protect the file if they want. If we decide that the 32 character key doesn't provide enough protection, though, then it may not be worth keeping this. - Applies the coding standards to the functions that were introduced in this ticket. Although, now that I think about it, I may have mixed a few of them up. I'll have to check later.
- Creates unit test stubs
I ran out of time today, so this needs more testing, but writing unit tests will go a long way towards that. I've got a lot of commitments this weekend, so I don't think I'll have any more time to work on it until Monday, in case anyone wants to take over and write tests, etc.
Speaking of tests, it seems like there's very little coverage for all the functions being added (across all tickets). Ideally every new function should have corresponding tests. Going back and adding those might be a really good thing to do once 4.9.6-beta1
is out.
#36
in reply to:
↑ 30
@
6 years ago
Replying to iandunn:
43546.11.diff has these changes:
- First pass at a cron job instead of manually calling
wp_privacy_delete_old_export_files()
when generating a new file. Right now it's setup on the admin side, becausewp_schedule_delete_old_privacy_export_files()
is inwp-admin/includes/file.php
, but I think it'll need to be moved to a front-end include file, because otherwise the job is even less likely to run as often as needed on small sites.- Switch from
rand()
towp_generate_password()
.- Fix the failing unit test. sanity check on fix for unit test - seems wiser to compare the urls, not hardcode/duplicate the surrounding link markup in the test itself, but don't have a lot of context, so could be missing something
- Add
wp_privacy_export_expiration
filter for determining the expiration lifetime. The email text previously had it hardcoded, but I switched to pulling from the filter instead, and updated the string to list the date it will expire on, rather than "3 day from now". I tried usinghuman_time_diff()
to preserve the original string, but it's not accurate enough in all cases, because it often rounds to the nearest week/month/etc). I think a date is probably less ambiguous, too, because there may be a delay between when the email is sent and when the user opens it. So it could say "3 days" because it was sent 2 days ago, but in reality the reader only has 1 day left to open it. I also added an explanation of why the file will be deleted, to avoid any confusion.- Adds
@group privacy
to the unit tests. If someone has time, it'd be good to make sure all the GDPR tests have this.- Removes the
@
fromunlink
, since suppressing error messages is general a bad practice. I'm guessing it was originally done to avoid them showing up in an AJAX response? That's no longer an issue since it's called via cron now.- Tell
list_files()
to excludeindex.html
, so it doesn't get deleted.- Miscellaneous cleanup.
Some other feedback
Download Personal Data
andEmail Data
function like buttons, and are visually formatted as buttons, but are setup as links. Would it be more semantic (and maybe accessible?) to use an actual<button>
?- If we're going to add additional files to the
.zip
besides theindex.php
, then it might be nice to prefix a folder for all the files. That way, if someone downloads the file to~/Downloads
, when they unzip the file, they don't get a ton of files extracted to~/Downloads
, where they'd be mixed in with tons of other files, making them hard to find, and creating a mess. Instead, they'd be extracted to~/Downloads/wp-personal-data-file-iandunn-at-example-org-ZG56zBVyy0jynV8W61aVsZ5vKfUmLk1o/
.- Similarly, would the HTML file be better named something like
wp-personal-data-file-iandunn-at-example-org-ZG56zBVyy0jynV8W61aVsZ5vKfUmLk1o.html
instead ofindex.html
, which doesn't describe the contents at all, and may conflict with other export files?- If no data is available, it'd be nice to display a message in
index.html
indicating that; otherwise the user might be confused why they're not seeing anything. Or, maybe short circuit the entire process, and just send them an email saying that there isn't any data, instead of giving them a link to click?- I'm not sure why the email says, "This email has been sent to {address}". Isn't that obvious from the
To
header?Replying to aaroncampbell:
It would also be nice to have a manual delete option so an admin could choose to delete the file once it has been downloaded or emailed.
It looks like this exists in the bulk edit row, but it'd be nice to have
remove
link next to thedownload
link, too, to make it more discoverable and convenient.
Thank you @iandunn ! I really appreciate the cron job. The @ on the unlink was copied from elsewhere in core, I assumed it was a best practice to avoid an error unnecessarily disrupting things.
#37
@
6 years ago
I looked through core and found that more cron events are registered in the admin than the front end, but also not sure why that is.
wp_version_check
, wp_update_plugins
, and wp_update_themes
are scheduled on the init
hook, but update_network_counts
is scheduled on admin_init
, wp_scheduled_delete
and delete_expired_transients
are scheduled in wp-admin/admin.php
, and wp_scheduled_auto_draft_delete
is scheduled in wp-admin/post-new.php
.
I think that this is missing an additional status for requests that have passed the expiration time and have had their file deleted. When the file is deleted, I think that the request post should be marked as "expired" (or something to that effect). As it stands right now (I may have missed it), I don't think the request posts themselves would ever be cleaned up like the export files. I think this is ok as long as the user is aware that the request has expired. The admin can still click the download link and a new file will be generated, but from the standpoint of the user requesting their data, the request has expired.
Should an admin have to reinitiate/confirm an export request after the expiration time period has passed? Currently, they can still download a data export after the expiration time has passed.
Also, should the user receive an email when their data download expires informing them that they need to re-request a data export? Do they see a notice when they click on an expired link? My hunch is that they will just see the front page of the site, like my testing showed in #43905.
@iandunn if you see any tickets not accompanied by the proper amount of unit tests, please just add the needs-unit-tests
keyword with a quick comment so we can circle back during the beta period.
This ticket was mentioned in Slack in #gdpr-compliance by desrosj. View the logs.
6 years ago
#42
@
6 years ago
- Keywords fixed-major commit needs-unit-tests added; needs-testing removed
I think the only thing left to commit here is the coding standard violations. I'll do those later tonight, but they don't need to be backported.
These functions need unit tests:
wp_privacy_delete_old_export_files()
wp_privacy_generate_personal_data_export_file()
wp_privacy_send_personal_data_export_email()
wp_privacy_process_personal_data_export_page()
This ticket was mentioned in Slack in #gdpr-compliance by iandunn. View the logs.
6 years ago
#52
@
6 years ago
Backports are done, comment:42 still needs to be addressed.
This ticket was mentioned in Slack in #gdpr-compliance by desrosj. View the logs.
6 years ago
This ticket was mentioned in Slack in #core by desrosj. View the logs.
6 years ago
#56
follow-up:
↓ 57
@
6 years ago
Just curious why wp_privace_delete_old_export_files
uses unlink()
instead of wp_delete_file()
?
#57
in reply to:
↑ 56
@
6 years ago
Replying to macbookandrew:
Just curious why
wp_privace_delete_old_export_files
usesunlink()
instead ofwp_delete_file()
?
That's a great catch, it probably should. Do you want to submit a patch for that? I think there are a few other instances of using unlink()
directly that should be changed too.
#58
@
6 years ago
Sorry…diff file for another ticket. 😳
I’m not that familiar with SVN. Is there much difference between a diff and a patch file, and which is preferred for WP core?
#59
follow-up:
↓ 60
@
6 years ago
Hey @macbookandrew, diff and patch files are the same thing. Also, you are welcome to use git to generate the diff if you prefer.
#60
in reply to:
↑ 59
@
6 years ago
Replying to aaroncampbell:
Hey @macbookandrew, diff and patch files are the same thing. Also, you are welcome to use git to generate the diff if you prefer.
@aaroncampbell Thanks!
Is there any way to delete this diff file: https://core.trac.wordpress.org/attachment/ticket/43546/43933.13.diff ? I accidentally submitted the wrong one.
#61
@
6 years ago
I went ahead and deleted it for you (deleting attachments requires additional Trac permissions), but generally it's not a big deal to leave the mistaken file on a ticket as long as it didn't have private info in it or something. Adding a note pointing out the mistake so people aren't confused is usually enough 🙂
#63
follow-up:
↓ 64
@
6 years ago
Is it a bad idea to have the full, publicly-available ZIP file link in an email? My first thought is that the direct link to the ZIP file shouldn't ever automatically leave/exist-outside-of a logged-in WP session.
Perhaps I've missed security talk that already took place on this, so if I did I apologize. But could it be that a user has to be logged-into WordPress in order to get the zip file link?
Could there also be a link in the downloaded index.html file that the user could click to day "I got it", which would delete the file from the server immediately, ahead of the cron?
#64
in reply to:
↑ 63
@
6 years ago
Replying to johnstonphilip:
Is it a bad idea to have the full, publicly-available ZIP file link in an email? My first thought is that the direct link to the ZIP file shouldn't ever automatically leave/exist-outside-of a logged-in WP session.
The discussion about that was across comment:14, comment:25, comment:32, and comment:34.
The filename contains a lengthy cryptographically-secure psuedorandom number, which makes discovery via brute force impossible for all practical considerations (see comment:34 for the details). I think that's essentially how Google Docs, YouTube, etc protect documents that are intended to be shared with a limited number of people, but where account authorization is not practical.
One of the reasons it isn't behind a login flow is because it has to serve people who don't have accounts, like commenters. We could potentially require logged-in users to log in, though, and only have it publicly accessible for commenters. That would require some kind of proxy like ms-files.php
, though. So, it's not trivial, but doable if it seems like it's worth it.
I personally don't feel like there's enough reason to worry, but I'd like to hear what you think after reading the comments above.
Could there also be a link in the downloaded index.html file that the user could click to day "I got it", which would delete the file from the server immediately, ahead of the cron?
That's a good idea. Similar to the above, I don't personally feel like it's necessary, but it might be easier than the ms-files.php
proxy. So, if we decide we need additional hardening, that might be a better option.
It's also worth noting that people can use the wp_privacy_export_expiration
filter if they want to shrink the attack window, although that would have a negative impact on UX.
What do you think about all of that?
#65
follow-up:
↓ 74
@
6 years ago
@iandunn Thanks for the response and explanations - and catching me up. My fear with the link being in the email (or anywhere outside of a logged-in WP session) is how simple it is for someone to say, accidentally forward an email. Or say they copy the link and forget it's in their clipboard when they paste into an exposed place (these are just the first 2 examples that popped into my head).
I just don't love the potential problems that open up when the link is automatically exposed in-full outside of a logged-in WordPress session.
It's much tougher to accidentally give someone access to your account, especially if that account is protected by double authentication.
For my own websites, I would love the option of requiring the requester to have an account in order to request/delete their data. For guest commenters, I would require they sign up for an account prior to making a data request. I realize that's not an approach everyone would take, but I think it's a reasonable one that certain sites with sensitive PII would agree with. I would prefer to make guests jump through a couple extra hoops for security, as opposed to reducing security overall for my registered users just to accommodate guests. I'm sure this is especially true for sites that don't allow guests to comment.
Regarding the removal of files, I agree that shrinking the attack window isn't the best for UX as it's hard to know how long a file takes to download on slow connections and slow hosts. It would be ideal to have the file be automatically deleted when the download has completed. Obviously that is difficult to know, but when the index.html file is opened, the customer could let us know.
#66
@
6 years ago
It would be ideal to have the file be automatically deleted when the download has completed. Obviously that is difficult to know, but when the index.html file is opened, the customer could let us know.
Possibly add an image to the export HTML with the src set to https://my-site.com/wp-json/privacy/delete-personal-data-file/z6S0kil2OGtEkdWkCwWL6xMz8cd8suve or something similar? The REST API receiving the request could serve a blank PNG/GIF and immediately delete the file.
However, I can see how that might be construed as collecting more data on a user after they requested an export, so maybe not the best idea? Just thinking out loud.
#67
@
6 years ago
If we could make sure the link was a 1 time use in some way, I think that would ease the concern on the link in the email.
What if the link sent in the email was a masked URL that only worked once? The actual URL to the zip file wouldn't be exposed, and if the masked URL were copied/subsequently clicked, it wouldn't work.
@macbookandrew I like that idea as well. Though, like you said it could be construed as collecting more data - so tough!
#68
@
6 years ago
I dislike this, downloads can be incomplete or corrupted, giving you only parts of your content, if we delete it on open it's then a matter of restarting the whole process which is a bad experience for the user.
#69
@
6 years ago
I'll aggree with @Clorith there are many cases that a file can't be deleted on action
as there are many cases that it will fail. The only way I see is for them to be retained for X time and that the users have to be informed about that. Maybe that's an extra 'option' for each admin to choose in case he wants to keep the files available for 1 day, 2 days, 1 week etc.
#70
@
6 years ago
@Clorith My initial thought is that if one opens the index file, which lives inside a zip file, that shows the download was complete and unzipped successfully. Is that not the case? Perhaps I mis-understand the way the downloading process works here.
Regarding the initial point about being logged in, I'm still on the fence about whether the link should be one-time-use, or require a login. Could it be an optional setting to require login for request/download/delete for websites who want to approach this from that perspective?
#71
@
6 years ago
A file may be downloaded, and you can open them, but they may still be incomplete or corrupted in some way, this is especially true for archives where one file might be fine, but another might not (I'm thinking edge cases, I know, but it's definitely not unheard of).
I would argue that implementing further behaviors around export downloads might be prime territory for plugins to enter the playing field, there's so many ifs and ors here for what various sites may wish to do it's not an easy one for us to cover it all.
#72
@
6 years ago
That's fair. I just want to make sure that WP core is taking security as seriously as necessary here. It's my opinion that security often trumps usability, especially when people's sensitive personal data is involved and breaches are so problematic.
We won't send a password in an email, but we will send all of a person's personal information. To me, that's a strange thing.
If ya'll feel the current system is good enough, people concerned about the security in this can turn to plugins - that's fine. Personally I feel the current system does not go far enough for the average store, which is why I'm speaking up here now.
To be clear, this is not a personal attack on anyone, it's just what I feel is a fact.
This ticket was mentioned in Slack in #gdpr-compliance by allendav. View the logs.
6 years ago
#74
in reply to:
↑ 65
@
6 years ago
Replying to johnstonphilip:
My fear with the link being in the email (or anywhere outside of a logged-in WP session) is how simple it is for someone to say, accidentally forward an email. Or say they copy the link and forget it's in their clipboard when they paste into an exposed place
By an "exposed place", do you mean a public terminal, like a school/library/etc, where someone else could use it after the WP user leaves, and still access the clipboard? Or, did you mean that the user pastes the URL into an input field on a malicious website, thinking that the clipboard contained some other value?
I guess in either case, all 3 of those scenarios feel like edge cases to me, and outside of WP's direct control. I do think it's good for us to consider any reasonable steps we can make to help avoid human error, even if we're not directly responsible, but that has to be balanced against adding complexity to the code, and the opportunity cost of saying "no" to other things, in order to spend the time on this.
I'd personally lean towards saying that, at least for now, this feels like plugin territory to me. Even if that's the consensus, though, that doesn't mean it won't ever be included in Core. If those plugins become popular enough to show that there's a strong interest in them, or if we discover a scenario that's more practical, or directly under WP's control, then we can always consider hardening efforts in Core.
( If anyone does try to write a plugin, and ends up needing some hooks in order to do it, we can always add those too )
To be clear, this is not a personal attack on anyone, it's just what I feel is a fact.
No worries at all, I think you've done a great job of articulating your opinion while being respectful. You're always welcome to express a dissenting opinion; that's crucial to make sure that we as a community don't miss things that hadn't been considered. Even if we don't ultimately make any changes in Core, I think the discussion is beneficial.
#75
@
6 years ago
Sounds good, thanks Ian.
For my purposes I'll probably use the wp_privacy_personal_data_email_content hook to filter out the ###LINK###, which I could then replace with a link to log in, so that should work I think.
#76
@
6 years ago
Is it possible to add an additional var to the wp_privacy_personal_data_export_file_created
action called after .zip file generation?
Namely I'm looking to add expose the $email_address
variable from the original export request at the top of the wp_privacy_generate_personal_data_export_file()
function.
I see that the scenario mentioned above would be to password protect the .zip file but I've come up with another scenario that would be immediately useful to my plugin (but I'm hoping would be generally useful as well).
My plugin allows generation of downloadable certificates which likely contain personal information. I want to include these downloads in the generated export file but I do not want to put them in a publicly accessible directory (like a subdirectory in the uploads directory). This is I think fine for things like images which are already publicly accessible but since my plugin's certificates contain personal information I don't feel it wise to put them there even if they file name is obscured.
If the email address is passed in this hook I can add the actual certificate files to the export very easily and allow them to remain completely non-public.
I know it's late and the RC is closing in but I think this is a small addition which would have little negative consequence (that I can think of immediately at least.
Thanks!
This ticket was mentioned in Slack in #gdpr-compliance by thomasplevy. View the logs.
6 years ago
This ticket was mentioned in Slack in #core by desrosj. View the logs.
6 years ago
#79
@
6 years ago
- Priority changed from normal to low
Set priority to low for tickets that only need unit tests.
This ticket was mentioned in Slack in #core by desrosj. View the logs.
6 years ago
#81
@
6 years ago
43546.wpPrivacySendPersonalDataExportEmail.diff adds tests for the function:
wp_privacy_send_personal_data_export_email()
#86
@
6 years ago
- Milestone changed from 4.9.8 to 4.9.6
- Resolution set to fixed
- Status changed from assigned to closed
Talked closing this one out over with @johnbillion. Creating a new ticket helps keep the enhancement that was added with 4.9.6 and makes it clearer when tests were added and avoids continuously punting this ticket.
I opened #44233 and moved the uncommitted patches over to that ticket to wrap up and add other missing tests.
Patch forthcoming. Awaiting #43481 to land.