Make WordPress Core

Opened 6 years ago

Last modified 5 years ago

#44236 assigned defect (bug)

Maintain consistency between privacy export report and archive filenames

Reported by: iandunn's profile iandunn Owned by: gripsart's profile GripsArt
Milestone: Future Release Priority: low
Severity: minor Version: 4.9.6
Component: Privacy Keywords: needs-testing has-patch needs-unit-tests needs-refresh
Focuses: Cc:

Description

Prior to r43180, the export .zip filename changed every time it was regenerated; after that commit, it was reused to ensure links aren't broken (see #43905).

That commit didn't do anything to the .html filename, though, so after the initial regeneration, the value of $obscura will not match between the two filenames. I don't think this presents any issues for Core, and might not for plugins either, but it seems like they should be consistent, just to be safe.

If a plugin hooks into wp_privacy_personal_data_export_file_created, it may reasonably assume that the filenames to match, and do something like str_replace( '.html', '.zip', $foo ). That seems unlikely, since both filenames are passed to the callback, but it's better to be safe than sorry.

Attachments (5)

44236.patch (541 bytes) - added by GripsArt 6 years ago.
Changed hardcoded "index.html" to $html_report_filename
44236.1.patch (1.6 KB) - added by GripsArt 6 years ago.
added filter to the filename
44263.2.patch (2.6 KB) - added by GripsArt 6 years ago.
added comments for filter
44263.3.patch (2.6 KB) - added by GripsArt 6 years ago.
changed docblock param as discussed
44263.4.patch (2.8 KB) - added by GripsArt 6 years ago.

Download all attachments as: .zip

Change History (33)

This ticket was mentioned in Slack in #gdpr-compliance by iandunn. View the logs.


6 years ago

@GripsArt
6 years ago

Changed hardcoded "index.html" to $html_report_filename

#2 @GripsArt
6 years ago

  • Keywords has-patch added

This ticket was mentioned in Slack in #core-privacy by desrosj. View the logs.


6 years ago

#4 @desrosj
6 years ago

  • Keywords needs-refresh added
  • Milestone changed from Awaiting Review to 4.9.9
  • Owner set to GripsArt
  • Status changed from new to assigned

Assigning to @GripsArt to mark good-first-bug claimed.

Patch still applies and it works as intended. Tested with a fresh request, and re-downloading a data file for a previously completed request. They both change the export file name correctly.

I think that it is reasonable to also apply a filter on the file name to allow plugins to change it. @GripsArt are you able to update the patch?

@GripsArt
6 years ago

added filter to the filename

This ticket was mentioned in Slack in #core by garrett-eclipse. View the logs.


6 years ago

#6 @GripsArt
6 years ago

  • Keywords needs-testing added; needs-refresh removed

#7 @garrett-eclipse
6 years ago

  • Keywords needs-refresh added; has-patch needs-testing removed

Thanks @GripsArt
Posting this here so it's not overlooked from slack.

when adding a filter or hook make sure it gets a docblock
see https://make.wordpress.org/core/handbook/best-practices/inline-documentation-standards/php/#4-hooks-actions-and-filters

@GripsArt
6 years ago

added comments for filter

#8 @GripsArt
6 years ago

  • Keywords needs-testing added; needs-refresh removed

#9 @garrett-eclipse
6 years ago

  • Keywords needs-refresh added; needs-testing removed

Thanks @GripsArt

Only minor change is the param should be for $file_basename as $html_report_filename is the return, so the docblock param would be;
@param string $file_basename the personal data export file basename.

Looks good to me otherwise, appreciate the refresh

@GripsArt
6 years ago

changed docblock param as discussed

#10 @GripsArt
6 years ago

  • Keywords needs-testing added; needs-refresh removed

#11 @garrett-eclipse
6 years ago

  • Keywords has-patch added

Thanks for the change, it's looking good to me.

#12 @desrosj
6 years ago

  • Keywords needs-refresh needs-unit-tests added

@GripsArt good progress on this. Thanks for working on it. I have a few suggestions.

The value being filtered is $file_basename, so the result of the filter should be saved to that variable. The $file_basename variable is used further down in the function to name the ZIP file, but with the filter where it is in 44263.3.patch, the ZIP file name would not match if the value is filtered. It's also best to not perform other operations on the same line that a filter is applied to be clearer.

Nitpicky things:

I also would like these changes should be accompanied by unit tests, but they would need to be added to the test class being worked on in #44233.

@GripsArt
6 years ago

#13 @GripsArt
6 years ago

  • Keywords needs-refresh removed

@desrosj changed filter position, operations in filter, filter name and comment.

Last edited 6 years ago by GripsArt (previous) (diff)

#14 @birgire
6 years ago

Reviewing 44263.4.patch:

Some cases to consider for the suggested wp_personal_export_file_basename filter:

  • The filter can add characters not supported by the file system.
  • The filter can add non-unique file names and override previous files.
  • If the filter returns an empty string, the file becomes: .html.

If the filter is going to be added, then I would suggest adding more input arguments, like:

$file_basename = apply_filters(
    'wp_personal_export_file_basename',
    $file_basename,
    $stripped_email,
    $obscura,
    $request_id
);

and also move this line:

$file_basename            = 'wp-personal-data-file-' . $stripped_email . '-' . $obscura;

above the inline filter's documentation.

There's also a missing full-stop end for the filter's inline description.

ps:

This filter reminds of the export_wp_filename:

$wp_filename = $sitename . 'WordPress.' . $date . '.xml';
/**
 * Filters the export filename.
 *
 * @since 4.4.0
 *
 * @param string $wp_filename The name of the file for download.
 * @param string $sitename    The site name.
 * @param string $date        Today's date, formatted.
 */
 $filename = apply_filters( 'export_wp_filename', $wp_filename, $sitename, $date );

apart from that it's the name for the download file, saved on the users disk and not on the server. So I think more restrictions would be needed for the latter case.

pps: I wonder if we should skip the filter (for now) to increase the change of getting this into the 4.9.9 ?

Last edited 6 years ago by birgire (previous) (diff)

This ticket was mentioned in Slack in #core-privacy by desrosj. View the logs.


6 years ago

#16 @GripsArt
5 years ago

before sending a new patch, i want discuss these three points:

  • The filter can add characters not supported by the file system.
  • The filter can add non-unique file names and override previous files.
  • If the filter returns an empty string, the file becomes: .html.

i talked with some other devs on the contributer day in Würzburg, we think it is not the job of a filter to protect for bad using from plugins/themes.

@birgire
@desrosj

Last edited 5 years ago by GripsArt (previous) (diff)

This ticket was mentioned in Slack in #core-privacy by gripsart. View the logs.


5 years ago

#18 @birgire
5 years ago

@GripsArt Glad to hear you discussed this on contributor day.

If I understand you correctly, then you're suggesting that the filename should not be sanitized?

Allowing such filtering of file names, without restrictions, then plugins modifying it would e.g. have to consider:

  • Avoid creating files with unsupported chars, e.g. that cannot be deleted?
  • Avoid possible collisions of file names.
  • Avoid making it easier to guess the filenames.
  • ...

There comes a risk with all filters, but I'm just trying to better understand the possible dangers here if it becomes open to modification :-) Maybe to have it documented in the filter's description, if added.

I'm wondering what could be the use case of such a filter?

Maybe removing the email address from it?

#19 @GripsArt
5 years ago

the use case...

not shure why a plugin or theme should be rename this files.

@desrosj what was your use case?

#20 @desrosj
5 years ago

A company could have custom naming requirements from an organizational standpoint. For example, an organization may desire their file names to follow a pattern of %customer_id_number%.zip. I don't think adding a filter here is a big deal, but it could be useful for many.

I think that the result of the filter should be sanitized with sanitize_file_name() for safety. Usually, it is desired to let filtered values cause errors to make them easier to detect and debug, allowing unsanitized values here could be risky.

Avoid making it easier to guess the filenames.

For this, maybe we do not pass along the $obscura value to the filter and always tack it to the end.

If the filter returns an empty string, the file becomes: .html.

This should be checked for.

All these things considered would result in something like this:

$default_basename = 'wp-personal-data-file-' . $stripped_email;
$file_basename    = apply_filters( 'wp_personal_export_file_basename', $default_basename );
$file_basename    = sanitize_file_name( $file_basename );

if ( empty( $file_basename ) ) {
	$file_basename = $default_basename;
}

$html_report_filename = $file_basename . '-' . $obscura . '.html';

By tacking on the $obscura value after the basename is filtered, it prevents the possibility of a name collision ($obscura being 32 characters long, there are 6232 possibilities). It also continues to follow the standard introduced in 4.9.6.

Some other feedback:

  • The filter's secondary description should specify that the file extension should not be included in the returned value. If the $obscura value is moved to after the filter as shown in my example above, that should be noted as well.
  • I agree with @birgire about adding other parameters to the filter. I think that $stripped_email, and $request_id in addition to the filtered value is sufficient.

#21 @garrett-eclipse
5 years ago

To avoid potential collisions aside from use of $obscura would the use of wp_unique_filename make sense as it does a file_exists check.

It also supports custom callback function which can be used to latch customizations for this implementation.

Codex - https://codex.wordpress.org/Function_Reference/wp_unique_filename

Function - https://github.com/WordPress/WordPress/blob/06b2e1776c017717eb083cc4568882507bb6f013/wp-includes/functions.php#L2214-L2295

I'm not sure if it can be used directly but at the least provides existing examples and a file_exists loop concept which would ensure no collisions.

Also as this is adding some docblock for the filter whatever the implementation would probably deserve a separate function so the code within wp_privacy_generate_personal_data_export_file to get $file_basename can reduce to a single line $file_basename = wp_unique_privacy_filename(.....).

All the other progress and feedback has been great thanks @GripsArt , @birgire & @desrosj

#22 @desrosj
5 years ago

I avoided using wp_unique_filename() because in its default state, it could unintentionally expose user data. For example, if I receive an export file called my-email-at-email-host-com-abcdefghijklmnop-2.zip, I can assume that -1.zip and .zip versions of this file exist (or did exist at one point) and visit those URLs directly. wp_unique_filename() has a third parameter, callback, that should be used to replace the $obscura value in the filename instead of incrementing the file by number.

If this part of the logic is broken out into its own function, I think that the logic for that could be included in the new function instead of the callback parameter of wp_unique_filename().

#23 @garrett-eclipse
5 years ago

That makes sense thanks @desrosj

So either a completely unique function or if wp_unique_filename is used it would be done with a callback that would almost replicate the incrementation but instead use random $obscura value in the while loop checking file_exists, and in essence would do everything this chunk of code does except randomize instead of increment.
https://github.com/WordPress/WordPress/blob/06b2e1776c017717eb083cc4568882507bb6f013/wp-includes/functions.php#L2254-L2290

#24 @allendav
5 years ago

Here's a radical idea: to avoid any unintended expectations that the obscura should match between the intermediate HTML file and the final ZIP file, would it be even more appropriate to change the code to ALWAYS use two different obscura for those files?

Also a fan of allowing plugins to filter both of these filenames.

#25 @pento
5 years ago

  • Milestone changed from 4.9.9 to Future Release

This ticket was mentioned in Slack in #core-privacy by webdevlaw. View the logs.


5 years ago

#27 @garrett-eclipse
5 years ago

  • Keywords good-first-bug removed

#28 @garrett-eclipse
5 years ago

  • Keywords needs-refresh added
Note: See TracTickets for help on using tickets.