WordPress.org

Make WordPress Core

Opened 7 months ago

Closed 7 months ago

#39875 closed defect (bug) (fixed)

PDF previews overwrite existing images with the same name.

Reported by: gitlost Owned by: joemcgill
Milestone: 4.7.3 Priority: normal
Severity: normal Version: 4.7
Component: Media Keywords: has-patch needs-testing fixed-major
Focuses: Cc:

Description

This issue was raised by Benjamin de Jong on the Make blog post Enhanced PDF Support in WordPress 4.7.

Upload a JPEG "test.jpg". Upload a PDF "test.pdf". The original "test.jpg" is overwritten with the generated PDF preview.

(Note that as the thumbnail sizes probably differ you'll have to click into "test.jpg" to see it's been overwritten.)

Attachments (9)

39875.patch (5.0 KB) - added by gitlost 7 months ago.
Special case pdf preview to avoid overwriting exising JPEGs.
39875.diff (2.7 KB) - added by desrosj 7 months ago.
39875-major.diff (4.4 KB) - added by desrosj 7 months ago.
minimal-us-letter.pdf (739 bytes) - added by gitlost 7 months ago.
A public domain PDF, from https://brendanzagaeski.appspot.com/0004.html, modified to be US Letter size. Its very small file size is handy speed-wise for tests.
39875.2.diff (2.7 KB) - added by desrosj 7 months ago.
39875.2.patch (5.1 KB) - added by desrosj 7 months ago.
Uses wp_unique_filename() with no random string in PDF file previews.
39875.3.patch (5.8 KB) - added by gitlost 7 months ago.
Uses comment:8 change with 39875.patch unit tests.
39875.4.patch (5.2 KB) - added by joemcgill 7 months ago.
39875-fix-tests.diff (668 bytes) - added by joemcgill 7 months ago.

Download all attachments as: .zip

Change History (35)

#1 @desrosj
7 months ago

  • Keywords needs-patch added

Just did some testing locally and I was able to reproduce it using the steps specified.

I tested the reverse scenario by uploading a PDF named test.pdf first followed by uploading a JPG named test.jpg) and the JPG was renamed to test-1.jpg correctly.

Not all thumbnail sizes are overwritten when this bug occurs, but the main issue here is that the original test.jpg file is always overwritten by the PDF preview image.

This ticket was mentioned in Slack in #core-media by desrosj. View the logs.


7 months ago

#3 @desrosj
7 months ago

Also discovered another scenario in my testing. PDF files that were uploaded before 4.7 do not have image previews until an image regeneration script is run, or wp media regenerate in WP-CLI.

When one of those scripts are run, the JPG file with the same name is also overwritten.

Steps to reproduce:

  1. Upload PDF file.
  2. Delete all JPG image previews for that PDF from uploads.
  3. Upload a JPG file with the same name.
  4. Run wp media regenerate --yes in WP-CLI.

This ticket was mentioned in Slack in #core-media by joemcgill. View the logs.


7 months ago

@gitlost
7 months ago

Special case pdf preview to avoid overwriting exising JPEGs.

#5 @mikeschroder
7 months ago

  • Milestone changed from Awaiting Review to 4.7.3

@desrosj
7 months ago

#6 @desrosj
7 months ago

  • Keywords has-patch needs-testing added; needs-patch removed

Most recent patch uses the approach [hammered out in this week's Media meeting (https://wordpress.slack.com/archives/core-media/p1487364216001228).

Borrowed some code from wp_save_image(). For the minor release, there will be some code duplication. I am going to whip up a different patch moving the repeated code into a function for the next major release.

@desrosj
7 months ago

#7 @desrosj
7 months ago

Added first pass for moving the repeated logic into a function for a major release.

#8 @gitlost
7 months ago

Although I can see an argument for doing the unique file name outside the class, and also for adding a PDF marker, I don't see a reason for using a random name. What's the benefit over wp_unique_filename()?

An argument against a random name is that it makes linked thumbnails (especially relevant when #39618 gets fixed) of the PDF brittle - for instance if one had to regenerate thumbnails. Another argument is that it loses all association with the original PDF on a file level. Another argument is that it's non-semantic. Another argument is that it's a pretty violent departure from what's there now and is not very WordPressy, not matching practice elsewhere, even in the instance of "wp-admin/includes/image-edit.php" from which it's culled, which only uses it as a suffix on the file name and has its own reasons and context (eg it knows there will be clashes and needs a pregable "unique" marker) for using such a scheme, not present here. Another argument is that non-deterministic outcomes make writing tests harder.

An argument for I can see is that it busts caching, which is nice, but I'd suggest outside the remit of the ticket.

My preference (if 39875.patch is out) would be something like the following (which uses _pdf rather than -pdf so as not to step on the toes of the PDF Image Generator plugin):

$dirname = dirname( $file ) . '/';
$ext = '.' . pathinfo( $file, PATHINFO_EXTENSION );
$preview_file = $dirname . wp_unique_filename( $dirname, wp_basename( $file, $ext ) . '_pdf.jpg' );
$uploaded = $editor->save( $preview_file, 'image/jpeg' );

Failing that, if a random string is really wanted, maybe put the attachment id in there somewhere...

(As an aside, I'll upload a PDF which could be used instead of "wordpress-gsoc-flyer.pdf" in tests to speed them up a bit. It's from Minimal PDF, modified to be US Letter size, and is public domain.)

@gitlost
7 months ago

A public domain PDF, from https://brendanzagaeski.appspot.com/0004.html, modified to be US Letter size. Its very small file size is handy speed-wise for tests.

This ticket was mentioned in Slack in #core-media by desrosj. View the logs.


7 months ago

This ticket was mentioned in Slack in #core-media by joemcgill. View the logs.


7 months ago

@desrosj
7 months ago

#11 @desrosj
7 months ago

Updated the patch per discussion in this week's dev chat (https://wordpress.slack.com/archives/core-media/p1487877384001584). Leave the original file name intact, but add a random string, and -pdf to the end.

#12 @gitlost
7 months ago

Re patch note that PATHINFO_FILENAME is not UTF-8 safe (same problem as basename() and PATHINFO_BASENAME).

I still see no reason for using a random string.

#13 @dd32
7 months ago

I still see no reason for using a random string.

I can't see a need for it either, lets just use wp_unique_filename() here, in most cases it'll not conflict with anything and be used straight up - but in the event another plugin has created the file, or the user uploaded a conflicting name, it'll cause it to not overwrite something that exists already. -pdf2.jpg would work fine - afterall the filename will be stored in the DB, no code in core is ever going to base decisions off the filesystem alone.

This ticket was mentioned in Slack in #core by jnylen. View the logs.


7 months ago

#15 @jnylen0
7 months ago

  • Owner set to joemcgill
  • Status changed from new to assigned

Per above Slack discussion, handing off to @joemcgill to coordinate sign-off/commit with @mikeschroder.

@desrosj
7 months ago

Uses wp_unique_filename() with no random string in PDF file previews.

This ticket was mentioned in Slack in #core-media by desrosj. View the logs.


7 months ago

#17 @gitlost
7 months ago

There's a typo in the patch - '-pdf.pdf' should be '-pdf.jpg'.

Also why not the code as suggested in comment:8 above instead of a variation using preg_replace()? - which is non-semantic and doesn't seem to add anything and is worse in some (admittedly very) edge cases (and frankly is an anti-pattern whose use should be removed from core - /var/www/my.wordpress/blah_noext anyone?!).

Also the unit test doesn't allow for the PDF marker, and the other tests aren't fixed to clean up properly after themselves.

Following is a patch which just uses the change suggested in comment:8 with -pdf instead of _pdf, and with the unit tests from 39875.patch, adjusted for -pdf and cleaned up a bit and with an extra renaming test.

@gitlost
7 months ago

Uses comment:8 change with 39875.patch unit tests.

@joemcgill
7 months ago

#18 @joemcgill
7 months ago

Thansk @desrosj and @gitlost. I agree that using pathinfo() and wp_basename() is an improvement over preg_replace() in this case. The second unit test in 39875.3.patch looks helpful in terms of checking the logic there, but unless we intend to abstract that out into a new function, I'd rather leave that test out.

39875.4.patch Simplifies the unit tests from 39875.3.patch a bit, but should be identical otherwise. I don't think we need to test that the attachments and metadata were generated correctly in this test, that functionality should be tested elsewhere. If it isn't, we should add additional unit test coverage that is focussed specifically on that functionality.

#19 @gitlost
7 months ago

Looks good to me!

#20 @mikeschroder
7 months ago

I liked the random string for consistency's sake with other resizing, but I'm okay with this approach.

I'd be more comfortable if we stopped accessing /tmp/ directly in tests, but otherwise this looks okay.

Last edited 7 months ago by mikeschroder (previous) (diff)

#21 @joemcgill
7 months ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 40130:

Media: Keep PDF previews from overwriting files.

Since support for PDF previews were added in [38949], it's possible
that the generated image file could overwrite an existing image file
with the same name. This uses wp_unique_filename() to avoid this
issue and adds a '-pdf' identifier on the end of filenames.

Props gitlost, derosj, mikeschroder, joemcgill.
Fixes #39875. See #31050.

#22 @joemcgill
7 months ago

  • Keywords fixed-major added
  • Resolution fixed deleted
  • Status changed from closed to reopened

#23 @joemcgill
7 months ago

I've opened #39975 to address the various places where /tmp/ is used in the tests.

#24 @joemcgill
7 months ago

We need to skip the tests when PDF's aren't supported by the current image editor. See 39875-fix-tests.diff.

#25 @joemcgill
7 months ago

In 40131:

Media: Skip PDF preview tests when image editor doens't support.

Following [40130] tests fail on environments that don't support PDF
previews because the expected meta data doesn't get written.

See #39875.

#26 @joemcgill
7 months ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 40133:

Media: Keep PDF previews from overwriting files.

Since support for PDF previews were added in [38949], it's possible
that the generated image file could overwrite an existing image file
with the same name. This uses wp_unique_filename() to avoid this
issue and adds a '-pdf' identifier on the end of filenames.

Props gitlost, desrosj, mikeschroder, joemcgill.
Merges [40130] and [40131] to the 4.7 branch.
Fixes #39875. See #31050.

Note: See TracTickets for help on using tickets.