WordPress.org

Make WordPress Core

Opened 21 months ago

Closed 2 weeks ago

#44197 closed defect (bug) (fixed)

ZIP file containing a user’s personal data has user’s personal data in filename

Reported by: Ov3rfly Owned by: garrett-eclipse
Milestone: 5.4 Priority: normal
Severity: normal Version: 4.9.6
Component: Privacy Keywords: has-patch commit has-unit-tests
Focuses: Cc:
PR Number:

Description

Example from wp-content/uploads/wp-personal-data-exports/

  • wp-personal-data-file-info-at-example-com-1RwxnSYi7z...SZGjD6shoOc.zip

The email info@example.com can be clearly identified within the filename.

Hosting providers worldwide: Work literally day & night to provide anonymization of personal user data like IP address in access logs etc. for gdpr-compliance.

WordPress in a core privacy feature: HMB, let's put personal user data in a filename for personal user data.

Why this isn't a good idea in terms of gdpr and otherwise, incomplete list:

  • While user email usually can be seen only in server database, now it can be seen in server filesystem
  • During download the filename is stored in access logs of the server and ..
  • .. in load balancer and firewall logs
  • .. in proxy server logs
  • .. in automated virus checking logs on proxy servers
  • .. in automated virus checking logs on client
  • .. in client browser history
  • .. in client filesystem
  • .. in client cloud backups
  • ..
  • After download has expired and user tries to re-download using the expired link ..
  • .. the normal WordPress 404 page is triggered and the filename ends up ..
  • .. in logs and/or storage of 404 handling plugins
  • .. in trackers like Google Analytics or similar
  • .. in referer logs of any third party content on 404 page
  • .. in page url accessable to third party content on 404 page
  • ..

Current Behaviour:

  • wp-personal-data-file-[email]-[random].zip

Expected Behaviour:

  • wp-personal-data-file-[hash of email]-[random].zip

Note: Would not suggest to use MD5 for hashing, otherwise many emails still could be revealed with minimal effort similar to Gravatar user emails.

Attachments (5)

request-hover.PNG (14.2 KB) - added by Clorith 21 months ago.
44197.diff (885 bytes) - added by donmhico 3 weeks ago.
44197.2.diff (912 bytes) - added by donmhico 3 weeks ago.
44197.3.diff (975 bytes) - added by garrett-eclipse 3 weeks ago.
Refresh to use wp_unique_filename to avoid potential collisions
44197.4.diff (3.5 KB) - added by garrett-eclipse 3 weeks ago.
Updated Unit Tests to remove email info from test filenames.

Download all attachments as: .zip

Change History (26)

#1 @allendav
21 months ago

Your point is well taken. The email address was included in the filename to help administrators avoid incorrectly picking the wrong file if they needed to use the download flow (instead of the email flow) to send the user their data (e.g. if they needed to combine it with exported data from a non-participating plugin.)

#2 @allendav
21 months ago

The hash of email is interesting, but would raise the possibility of the admin choosing the wrong file to send to a user (which would be bad).

Perhaps there is another way we could help administrators not confuse multiple exports?

#3 follow-up: @Clorith
21 months ago

The export screen does provide a Download personal data link once it's been generated, that would be sufficient if we surface it better, perhaps adding it as a button to the action side of the screen?

#4 @allendav
21 months ago

Another idea: Don't send a direct link to the user in their email at all, but a link which kicks off a download of the export file. The link should include a nonce. That way we could perhaps continue to use the email address in the filename served to the administrator?

#5 in reply to: ↑ 3 @allendav
21 months ago

Replying to Clorith:

The export screen does provide a Download personal data link once it's been generated, that would be sufficient if we surface it better, perhaps adding it as a button to the action side of the screen?

I'm sorry Clorith - I don't follow. Could you elaborate?

#6 @Clorith
21 months ago

In request-hover.PNG I'm hovering over a request, and being presented wit ha download link under the requesting email, we could surface that on the right side, that would mean a hashed filename isn't blocking any manual procedure you might foresee.

#7 @desrosj
20 months ago

  • Keywords gdpr removed

Removing the GDPR keyword. This has been replaced by the new Privacy component and privacy focuses in Trac.

This ticket was mentioned in Slack in #core-privacy by garrett-eclipse. View the logs.


9 months ago

#9 @xkon
5 weeks ago

  • Keywords needs-patch added
  • Milestone changed from Awaiting Review to 5.4

I do agree with @Ov3rfly on this and as @Clorith mentions also it's easy to download the file via the admin.

I'm not aware of how many would actually go to the physical location of the files to manually download the zip and forward it. Since the option of downloading already exists within the admin I don't see a reason to do so personally.

In this case, in my opinion, the advantages of removing the email from the filename are way more than just the easiness of finding a .zip via the filesystem directly.

Also since we're also changing the paths on #44038 it might be a good idea to tackle both :). I'll mark this for 5.4 in case we can manage to move this forward within time.

@donmhico
3 weeks ago

#10 @donmhico
3 weeks ago

  • Keywords has-patch added; needs-patch removed

Thanks for the report @Ov3rfly. 44197.diff removes the email address part in the export zip file name.

Regarding the concern of mis sending incorrect file by the admin manually. IMHO, the whole point of anonymization is to prevent us from relating the data to a specific user, which is basically rendered nil if the admin can "find" the correct export zip for a user.

Like @Clorith pointed out it may be better to provide better UI for the "Download Personal Data" link so the Admin knows and encouraged to download it via the Dashboard instead of manually looking into it.

#11 @xkon
3 weeks ago

  • Keywords needs-refresh added

Hey @donmhico, thanks for the patch!

Since the $stripped_email is completely removed on your patch and not even used for an extra hash then I think that its assignment references that are on the same block can also be removed.

$stripped_email       = str_replace( '@', '-at-', $email_address );
$stripped_email       = sanitize_title( $stripped_email ); // slugify the email address

As side notes, most likely phpunit tests will also need updating due to this name change and even though the exporter will first check out if a zip exists attached to the existing request and re-use that name it would be good to cross-test that to avoid breaking any existing sent e-mails :).

Would you like to update the patch and take a look at unit tests as well?

#12 @donmhico
3 weeks ago

@xkon - Make sense, can't believe I missed that out. Thanks for pointing it out. I'll update the patch soon.

#13 @Ov3rfly
3 weeks ago

Note: Without an extra hash there is a theoretical possibility of filename collision.

@donmhico
3 weeks ago

#14 @donmhico
3 weeks ago

  • Keywords needs-refresh removed

@xkon - Updated the patch - 44197.2.diff. I maybe wrong but I don't see any needed changes in the unit tests. I'm looking in https://core.trac.wordpress.org/browser/trunk/tests/phpunit/tests/privacy/wpPrivacyGeneratePersonalDataExportFile.php. The full php test suite is successful as well. Let me know if i'm missing anything.

And as @Ov3rfly mentioned, there's theoretical possibility of a filename collision. I guess we can perform a file check first then either add an increment in the end of the filename or re-iterate the wp_generate_password() that generates $obscura.

What do you guys think?

#15 @Ov3rfly
3 weeks ago

@donmhico Instead of an extra hash maybe wp_unique_filename(..) could be used .

@garrett-eclipse
3 weeks ago

Refresh to use wp_unique_filename to avoid potential collisions

#16 @garrett-eclipse
3 weeks ago

  • Keywords needs-testing added
  • Owner set to garrett-eclipse
  • Status changed from new to accepted

Thanks for the patch @donmhico and for the feedback @xkon & @Ov3rfly.

I've refreshed the patch in 44197.3.diff to utilize wp_unique_filename to avoid any potential collisions and everything is testing nicely. I also tested pre-existing requests before the change and their exports still function without issue so shouldn't be any back-compat complications.

Note: Existing files which hold email in the name will be left for the administrator to cleanup.

P.S. @donmhico I checked the PHP Unit Tests and will second that this change hasn't caused any issues with existing tests.

I think is looks good to move forward, if anyone wants to give it a final test before marking for commit.

#17 @xkon
3 weeks ago

  • Keywords needs-testing removed

44197.3.diff looks good to me and didn't see any hiccups while testing also so I'm +1 for commit tag.

Regarding PHPUnit that I mentioned we're using hardcoded names on some occasions.

See \tests\phpunit\tests\privacy\wpPrivacyDeleteOldExportFiles.php lines 59-60

self::$expired_export_file = $exports_dir . 'wp-personal-data-file-user-at-example-com-0123456789abcdef.zip';
self::$active_export_file  = $exports_dir . 'wp-personal-data-file-user-at-example-com-fedcba9876543210.zip';

\tests\phpunit\tests\privacy\wpPrivacyProcessPersonalDataExportPage.php line 134

self::$export_file_url      = wp_privacy_exports_url() . 'wp-personal-data-file-requester-at-example-com-Wv0RfMnGIkl4CFEDEEkSeIdfLmaUrLsl.zip';

tests\phpunit\tests\privacy\wpPrivacySendPersonalDataExportEmail.php line 107

$archive_url = wp_privacy_exports_url() . 'wp-personal-data-file-requester-at-example-com-Wv0RfMnGIkl4CFEDEEkSeIdfLmaUrLsl.zip';

It would be good to update these also just for the sake of the new naming convention and to avoid confusion :-).

And I hope I didn't miss/forget anything else in there 😅.

Care for a quick update @garrett-eclipse?

@garrett-eclipse
3 weeks ago

Updated Unit Tests to remove email info from test filenames.

#18 @garrett-eclipse
3 weeks ago

Thanks @xkon for the flag, updated those in 44197.4.diff. Take another look and we can move this forward. Cheers

#19 @donmhico
3 weeks ago

Awesome! Thanks for the new patch @garrett-eclipse. I re-tested and still works. +1 for commit tag.

#20 @xkon
3 weeks ago

  • Keywords commit has-unit-tests added

Thanks for the phpunit update @garrett-eclipse :-). Marking this for commit.

#21 @SergeyBiryukov
2 weeks ago

  • Resolution set to fixed
  • Status changed from accepted to closed

In 47144:

Privacy: Remove user's email address from personal data export filename.

Use wp_unique_filename() to avoid potential collisions instead.

Props xkon, garrett-eclipse, donmhico, Ov3rfly, Clorith, allendav.
Fixes #44197.

Note: See TracTickets for help on using tickets.