Opened 6 years ago
Last modified 6 years ago
#44236 assigned defect (bug)
Maintain consistency between privacy export report and archive filenames
Reported by: | iandunn | Owned by: | GripsArt |
---|---|---|---|
Milestone: | Future Release | Priority: | low |
Severity: | minor | Version: | 4.9.6 |
Component: | Privacy | Keywords: | needs-testing has-patch needs-unit-tests needs-refresh |
Focuses: | Cc: |
Description
Prior to r43180, the export .zip
filename changed every time it was regenerated; after that commit, it was reused to ensure links aren't broken (see #43905).
That commit didn't do anything to the .html
filename, though, so after the initial regeneration, the value of $obscura
will not match between the two filenames. I don't think this presents any issues for Core, and might not for plugins either, but it seems like they should be consistent, just to be safe.
If a plugin hooks into wp_privacy_personal_data_export_file_created
, it may reasonably assume that the filenames to match, and do something like str_replace( '.html', '.zip', $foo )
. That seems unlikely, since both filenames are passed to the callback, but it's better to be safe than sorry.
Attachments (5)
Change History (33)
This ticket was mentioned in Slack in #gdpr-compliance by iandunn. View the logs.
6 years ago
This ticket was mentioned in Slack in #core-privacy by desrosj. View the logs.
6 years ago
#4
@
6 years ago
- Keywords needs-refresh added
- Milestone changed from Awaiting Review to 4.9.9
- Owner set to GripsArt
- Status changed from new to assigned
Assigning to @GripsArt to mark good-first-bug
claimed.
Patch still applies and it works as intended. Tested with a fresh request, and re-downloading a data file for a previously completed request. They both change the export file name correctly.
I think that it is reasonable to also apply a filter on the file name to allow plugins to change it. @GripsArt are you able to update the patch?
This ticket was mentioned in Slack in #core by garrett-eclipse. View the logs.
6 years ago
#7
@
6 years ago
- Keywords needs-refresh added; has-patch needs-testing removed
Thanks @GripsArt
Posting this here so it's not overlooked from slack.
when adding a filter or hook make sure it gets a docblock
see https://make.wordpress.org/core/handbook/best-practices/inline-documentation-standards/php/#4-hooks-actions-and-filters
#9
@
6 years ago
- Keywords needs-refresh added; needs-testing removed
Thanks @GripsArt
Only minor change is the param should be for $file_basename as $html_report_filename is the return, so the docblock param would be;
@param string $file_basename the personal data export file basename.
Looks good to me otherwise, appreciate the refresh
#12
@
6 years ago
- Keywords needs-refresh needs-unit-tests added
@GripsArt good progress on this. Thanks for working on it. I have a few suggestions.
The value being filtered is $file_basename
, so the result of the filter should be saved to that variable. The $file_basename
variable is used further down in the function to name the ZIP file, but with the filter where it is in 44263.3.patch, the ZIP file name would not match if the value is filtered. It's also best to not perform other operations on the same line that a filter is applied to be clearer.
Nitpicky things:
- Inline documentation descriptions should always end with a period.
- There is an extra space between
string
and$file_basename
in the@param
tag. the
inthe personal data export
should be capitalized.
I also would like these changes should be accompanied by unit tests, but they would need to be added to the test class being worked on in #44233.
#13
@
6 years ago
- Keywords needs-refresh removed
@desrosj changed filter position, operations in filter, filter name and comment.
#14
@
6 years ago
Reviewing 44263.4.patch:
Some cases to consider for the suggested wp_personal_export_file_basename
filter:
- The filter can add characters not supported by the file system.
- The filter can add non-unique file names and override previous files.
- If the filter returns an empty string, the file becomes:
.html
.
If the filter is going to be added, then I would suggest adding more input arguments, like:
$file_basename = apply_filters( 'wp_personal_export_file_basename', $file_basename, $stripped_email, $obscura, $request_id );
and also move this line:
$file_basename = 'wp-personal-data-file-' . $stripped_email . '-' . $obscura;
above the inline filter's documentation.
There's also a missing full-stop end for the filter's inline description.
ps:
This filter reminds of the export_wp_filename
:
$wp_filename = $sitename . 'WordPress.' . $date . '.xml'; /** * Filters the export filename. * * @since 4.4.0 * * @param string $wp_filename The name of the file for download. * @param string $sitename The site name. * @param string $date Today's date, formatted. */ $filename = apply_filters( 'export_wp_filename', $wp_filename, $sitename, $date );
apart from that it's the name for the download file, saved on the users disk and not on the server. So I think more restrictions would be needed for the latter case.
pps: I wonder if we should skip the filter (for now) to increase the change of getting this into the 4.9.9 ?
This ticket was mentioned in Slack in #core-privacy by desrosj. View the logs.
6 years ago
#16
@
6 years ago
before sending a new patch, i want discuss these three points:
- The filter can add characters not supported by the file system.
- The filter can add non-unique file names and override previous files.
- If the filter returns an empty string, the file becomes: .html.
i talked with some other devs on the contributer day in Würzburg, we think it is not the job of a filter to protect for bad using from plugins/themes.
This ticket was mentioned in Slack in #core-privacy by gripsart. View the logs.
6 years ago
#18
@
6 years ago
@GripsArt Glad to hear you discussed this on contributor day.
If I understand you correctly, then you're suggesting that the filename should not be sanitized?
Allowing such filtering of file names, without restrictions, then plugins modifying it would e.g. have to consider:
- Avoid creating files with unsupported chars, e.g. that cannot be deleted?
- Avoid possible collisions of file names.
- Avoid making it easier to guess the filenames.
- ...
There comes a risk with all filters, but I'm just trying to better understand the possible dangers here if it becomes open to modification :-) Maybe to have it documented in the filter's description, if added.
I'm wondering what could be the use case of such a filter?
Maybe removing the email address from it?
#19
@
6 years ago
the use case...
not shure why a plugin or theme should be rename this files.
@desrosj what was your use case?
#20
@
6 years ago
A company could have custom naming requirements from an organizational standpoint. For example, an organization may desire their file names to follow a pattern of %customer_id_number%.zip
. I don't think adding a filter here is a big deal, but it could be useful for many.
I think that the result of the filter should be sanitized with sanitize_file_name()
for safety. Usually, it is desired to let filtered values cause errors to make them easier to detect and debug, allowing unsanitized values here could be risky.
Avoid making it easier to guess the filenames.
For this, maybe we do not pass along the $obscura
value to the filter and always tack it to the end.
If the filter returns an empty string, the file becomes: .html.
This should be checked for.
All these things considered would result in something like this:
$default_basename = 'wp-personal-data-file-' . $stripped_email; $file_basename = apply_filters( 'wp_personal_export_file_basename', $default_basename ); $file_basename = sanitize_file_name( $file_basename ); if ( empty( $file_basename ) ) { $file_basename = $default_basename; } $html_report_filename = $file_basename . '-' . $obscura . '.html';
By tacking on the $obscura
value after the basename is filtered, it prevents the possibility of a name collision ($obscura
being 32 characters long, there are 6232 possibilities). It also continues to follow the standard introduced in 4.9.6.
Some other feedback:
- The filter's secondary description should specify that the file extension should not be included in the returned value. If the
$obscura
value is moved to after the filter as shown in my example above, that should be noted as well. - I agree with @birgire about adding other parameters to the filter. I think that
$stripped_email
, and$request_id
in addition to the filtered value is sufficient.
#21
@
6 years ago
To avoid potential collisions aside from use of $obscura
would the use of wp_unique_filename
make sense as it does a file_exists
check.
It also supports custom callback function which can be used to latch customizations for this implementation.
Codex - https://codex.wordpress.org/Function_Reference/wp_unique_filename
I'm not sure if it can be used directly but at the least provides existing examples and a file_exists
loop concept which would ensure no collisions.
Also as this is adding some docblock for the filter whatever the implementation would probably deserve a separate function so the code within wp_privacy_generate_personal_data_export_file
to get $file_basename
can reduce to a single line $file_basename = wp_unique_privacy_filename(.....)
.
All the other progress and feedback has been great thanks @GripsArt , @birgire & @desrosj
#22
@
6 years ago
I avoided using wp_unique_filename()
because in its default state, it could unintentionally expose user data. For example, if I receive an export file called my-email-at-email-host-com-abcdefghijklmnop-2.zip
, I can assume that -1.zip
and .zip
versions of this file exist (or did exist at one point) and visit those URLs directly. wp_unique_filename()
has a third parameter, callback, that should be used to replace the $obscura
value in the filename instead of incrementing the file by number.
If this part of the logic is broken out into its own function, I think that the logic for that could be included in the new function instead of the callback parameter of wp_unique_filename()
.
#23
@
6 years ago
That makes sense thanks @desrosj
So either a completely unique function or if wp_unique_filename
is used it would be done with a callback that would almost replicate the incrementation but instead use random $obscura
value in the while loop checking file_exists
, and in essence would do everything this chunk of code does except randomize instead of increment.
https://github.com/WordPress/WordPress/blob/06b2e1776c017717eb083cc4568882507bb6f013/wp-includes/functions.php#L2254-L2290
#24
@
6 years ago
Here's a radical idea: to avoid any unintended expectations that the obscura should match between the intermediate HTML file and the final ZIP file, would it be even more appropriate to change the code to ALWAYS use two different obscura for those files?
Also a fan of allowing plugins to filter both of these filenames.
Changed hardcoded "index.html" to $html_report_filename