WordPress.org

Make WordPress Core

Opened 2 years ago

Closed 4 days ago

#42437 closed defect (bug) (fixed)

Thumbnails can overwrite other uploads if filename matches

Reported by: Viper007Bond Owned by: pbiron
Milestone: 5.3.1 Priority: normal
Severity: normal Version: 4.8.3
Component: Upload Keywords: has-patch needs-testing fixed-major
Focuses: Cc:
PR Number:

Description

Imagine you're browsing some WordPress site and you find an image you want to steal you like and want to upload to your own site. Not knowing any better, you download and save a thumbnail, image-1024x768.png. Later on you upload a different image called image.png to your site. Assuming you haven't changed thumbnail sizes, the large thumbnail of the second image will overwrite the original first image.

I've attached two images that you can use to test. Notice that image-1024x768.png will become the red image instead of staying green.

One possible solution to this is to add a flag to the thumbnail generation function that toggles overwriting existing files. It should default to overwriting for backwards compatibility but not overwrite on initial upload. If not overwriting, then -2 could be added somewhere in the filename. The filename doesn't need to be predictable because it's stored in the metadata.

Attachments (6)

image-1024x768.png (3.9 KB) - added by Viper007Bond 2 years ago.
image.png (12.5 KB) - added by Viper007Bond 2 years ago.
42437.diff (1.2 KB) - added by pbiron 19 months ago.
proof-of-concept patch
42437.2.diff (5.1 KB) - added by pbiron 9 days ago.
42437.3.diff (5.7 KB) - added by azaozz 8 days ago.
42437.4.diff (6.4 KB) - added by azaozz 7 days ago.

Download all attachments as: .zip

Change History (59)

@Viper007Bond
2 years ago

#1 @Viper007Bond
2 years ago

Hmm, I didn't intend for image previews to show up.

This ticket was mentioned in Slack in #core-media by antpb. View the logs.


23 months ago

#3 follow-ups: @azaozz
23 months ago

Thinking we should be stripping things like -1024x768 from the ends of image names on upload. That will allow "proper" naming with -2, -3, etc. for duplicates and will prevent sub-sizes name conflicts.

Last edited 23 months ago by azaozz (previous) (diff)

#4 in reply to: ↑ 3 ; follow-up: @blobfolio
23 months ago

Replying to azaozz:

Thinking we should be stripping things like -1024x768 from the ends of image names on upload. That will allow "proper" naming with -2, -3, etc. for duplicates and will prevent sub-sizes name conflicts.

Having seen some absurdly long user filenames in my day, I like the idea of stripping trailing dimensions from new files before storing them.

Temporarily pop off the extension, then something like the following:
$filename = preg_replace('/(\-\d+x\d+)+$/', '', $filename);

Then make sure the filename isn't empty, add the extension back, and run the usual uniqueness filters.

That'll help things going forward, but won't stop collisions from media uploaded before the change. For that, we'd need to loop through all possible thumbnail names to check for collisions before settling on a source name.

#5 in reply to: ↑ 4 @azaozz
23 months ago

  • Milestone changed from Awaiting Review to Future Release

Replying to blobfolio:

Yep, thinking that will work for this edge case. Alternatively (as discussed in #core-media on Slack) we can have a more stringent name check that looks into the names for sub-sizes that will be generated from the uploaded image. It would add -1, -2, etc. to the original image name if a possible conflict with the names of the sub-sizes is detected.

This ticket was mentioned in Slack in #core-media by nosilver4u. View the logs.


23 months ago

#7 @desrosj
22 months ago

#43246 was marked as a duplicate.

This ticket was mentioned in Slack in #core-media by desrosj. View the logs.


22 months ago

This ticket was mentioned in Slack in #core-media by desrosj. View the logs.


22 months ago

#10 @desrosj
19 months ago

#44095 was marked as a duplicate.

#11 in reply to: ↑ 3 ; follow-ups: @pbiron
19 months ago

Replying to azaozz:

Thinking we should be stripping things like -1024x768 from the ends of image names on upload. That will allow "proper" naming with -2, -3, etc. for duplicates and will prevent sub-sizes name conflicts.

While stripping dimensions from filenames on upload would probably be the easiest technical solution, doing so might cause UX disconnects for some users.

For example, in the art world, it's quite common for artists to put the dimensions of their work into their filenames, e.g., a painter does an oil painting of a sunset on a 150cmx150cm canvas, takes a photo of the artwork and names it sunset-150x150.jpg (where, in their mind, the -150x150 suffix is the canvas size and not the dimensions of the photo of that painting). They upload this photo. Sometime later they go to /wp-admin/uploads.php and try to search for the photo, but they can't remember exactly what they named the file...was it sunset-150x150.jpg, sundown-150x150.jpg or here-comes-the-night-150x150.jpg (and nor what the caption was)? All they remember is that they put -150x150 in the filename and when they search for 150x150, they get the dreaded "No media files found." result...and think their image has somehow gotten deleted.

note: this example is based on the one in #44095, which I opened before being directed to this ticket.

Granted, that UX issue isn't a show stopper, but it's something to consider.

#12 in reply to: ↑ 11 @pbiron
19 months ago

I bumped into the problem behind this ticket yesterday while debugging a func I wrote to hook into attachment_url_to_postid, in an attempt to get around the problem of attachment_url_to_postid() not finding the attachment ID if the URL is for an intermediate size of the upload (I starting writing that func since there has been no movement on #41816, my attempt to fix that problem...which I opened 8 mos ago).

In particular, after uploading foo.jpg, I did

function my_attachment_url_to_postid_filter( $url ) {
    if ( $post_id ) {
        // attachment already found, so return it
        return $post_id;
    }

    // the following is simplified from my actual func, for clarity

    // strip any dimensions from the URL and try attachment_url_to_postid() again
    $modified_url = preg_replace( '/-\d+-\d+(?=\..*$)'/, '', $url );

    remove_filter( 'attachment_url_to_postid', __FUNCTION__, 10 );
    $post_Id = attachment_url_to_postid( $modified_url );
    add_filter( 'attachment_url_to_postid', __FUNCTION__, 10, 2 );

    return $post_id;
}
add_filter( 'attachment_url_to_postid', 'my_attachment_url_to_postid_filter', 10, 2 );

$id = attachment_url_to_postid( 'http://host/wp-content/uploads/foo.jpg' );
$id_for_intermediate = attachment_url_to_postid( 'http://host/wp-content/uploads/foo-150x150.jpg' );
// $id should now be === $id_for_intermediate

only to find out that $id_for_intermediate was the ID for a previously uploaded file named foo-150x150.jpg. That is, the if ( $post_id ) {} test at the top of my filter func was succeeding because there was an unknown to me attachment with _wp_attached_file postmeta with meta_value = 'foo-150x150.jpg') :-(

So, in addition to the serious data loss issue of previous uploads being overwritten identified in this ticket, the name collision problems between original image filenames containing "dimension"-like sub-strings and generated intermediate size filenames can have other undesirable "downstream" effects.

Last edited 19 months ago by pbiron (previous) (diff)

#13 in reply to: ↑ 3 @pbiron
19 months ago

Replying to azaozz:

Thinking we should be stripping things like -1024x768 from the ends of image names on upload. That will allow "proper" naming with -2, -3, etc. for duplicates and will prevent sub-sizes name conflicts.

Looking back through the Slack thread, I see this comment by @desrosj

I would not be in favor of this. While WordPress may try to use that filename, that could be a very purposeful filename.

with which I agree.

Another example of a "purposeful" filename that contains a "dimension"-like sub-string that is completely unrelated to the image dimensions is our-store-is-open-24x7.jpg :-)

#14 follow-up: @joemcgill
19 months ago

The best way I can think of avoiding these kinds of conflicts is to add a short hash to filenames of intermediate image sizes to ensure that this is unique. The main question would be whether it would be whether we needed to build that hash from something that is unchanging (e.g. the original file name) rather than something that could change (e.g., current server time) so that regenerating images would result in consistent image file names as ones already used in post content.

#15 in reply to: ↑ 14 @pbiron
19 months ago

Replying to joemcgill:

The best way I can think of avoiding these kinds of conflicts is to add a short hash to filenames of intermediate image sizes to ensure that this is unique.

How would such a hash be any better than the -1, -2, etc that wp_unique_filename() adds?

[just seeking clarification, not criticizing the idea]

#16 in reply to: ↑ 11 ; follow-up: @azaozz
19 months ago

Replying to pbiron:

...doing so might cause UX disconnects for some users.

Right, and generally there is no point in fixing one rare edge case and possibly introducing another.

The best way I can think of avoiding these kinds of conflicts is to add a short hash to filenames of intermediate image sizes

We already make the file names of intermediate image sizes longer. As far as I remember there was a problem with exceeding the allowed file name length for certain filesystems. Adding another (possibly long) string to the file names may trigger that.

How about my other idea: if the original file name ends with something like -600x800.jpg rename it and append -1, -2, etc. the same way we do for all other naming conflicts in WP.

The (basic) code to implement this would be:

if ( preg_match( '/.+-\n+x\n+$/', $file_name ) ) {
  // Always append `-1` to file names that can potentially match other subsize file names.
  $i = 1;
  $full_name = $file_name . "-$i" . $file_extension;

  while ( is_file( $uploads_dir . $full_name ) ) {
    $i++;
    $full_name = $file_name . "-$i" . $file_extension;
  }
}

This will give us file names like sunset-150x150-1.jpg, sundown-150x150-2.jpg or here-comes-the-night-150x150-1.jpg (the same names you will get if you upload the same image several times). These names are a lot more "readable" and close to the original than something like sunset-27624fea676d86f7-150x150.jpg for subsizes.

Last edited 19 months ago by azaozz (previous) (diff)

#17 in reply to: ↑ 16 @pbiron
19 months ago

Replying to azaozz:

How about my other idea: if the original file name ends with something like -600x800.jpg rename it and append -1, -2, etc. the same way we do for all other naming conflicts in WP.

This will give us file names like sunset-150x150-1.jpg, sundown-150x150-2.jpg or here-comes-the-night-150x150-1.jpg (the same names you will get if you upload the same image several times). These names are a lot more "readable" and close to the original than something like sunset-27624fea676d86f7-150x150.jpg for subsizes.

I just did a few tests on pathological cases with this idea implemented and it seems to work great. Look for a proof-of-concept patch shortly.

@pbiron
19 months ago

proof-of-concept patch

#18 follow-up: @blobfolio
19 months ago

Unfortunately new source attachments with a dimension-like suffix naming structures are only one small part of the issue. The full scope of the problem is that any thumbnail for an attachment might collide with another file in the uploads folder.

For example, say a site user uploaded an image called image-123x123.jpg a year ago. (No patch can fix the past. Haha.) Then today, they decide to upload a file called image.jpg. If we are only looking at the source file, no collision would be detected, however once WP crunches out the 123x123 thumbnail size, a collision will happen (the original file from a year ago will be blown away).

Beyond suffixes, we also have to deal with the new thumbnails WordPress is generating for non-image attachments like PDFs, as well as all the other unregistered files that end up in a site's uploads folder (manually edited thumbnails, files from plugins, etc.)

For a more complete fix, WordPress needs to pre-generate a list of possible thumbnail paths and run its collision checks in a loop. If any of the proposed files collide, -1 the source name, pre-generate a new list, and check again. Rinse and repeat until all 5 million files are proved unique. Haha.

#19 @pbiron
19 months ago

The patch I just added is just "proof-of-concept", and is clearly not ready for merge!

It implementis the idea in @azaozz's comment.

Additionally, it also appends -1, etc to original filenames that no not contain dimension-like sub-strings if their intermediate size filenames would conflict with original filenames that did contain such sub-strings that were uploaded prior to the patch being applied. I have no idea how to write unit tests for that part of it! There's got to be a better way of implementing this part than using glob() but I couldn't think of it off the top of my head.

Unfortunately, my dev environment is incapable of generating thumbnails for PDF uploads, so I can't test whether the patch causes problems with the fix in #39875.

I'd also note that with this patch applied my attachment_url_to_postid filter correctly resolves attachment IDs for URLs for intermediate sized images (provided the attachment was uploaded after the patch is applied).

#20 in reply to: ↑ 18 @pbiron
19 months ago

Replying to blobfolio:

For example, say a site user uploaded an image called image-123x123.jpg a year ago. (No patch can fix the past. Haha.) Then today, they decide to upload a file called image.jpg. If we are only looking at the source file, no collision would be detected, however once WP crunches out the 123x123 thumbnail size, a collision will happen (the original file from a year ago will be blown away).

Actually, the patch I just added does handle this case...although, as I mentioned, the way it does so is sub-optimal.

#21 follow-up: @joemcgill
19 months ago

How would such a hash be any better than the -1, -2, etc that wp_unique_filename() adds?

For intermediate image file names, I think it makes sense to try to keep the names for all images in a set consistent. In this case, the issue is that one of the intermediate size names generated by WP is conflicting with the name of an original file that was uploaded to the server. We could append a -1 only to the intermediate size that has the conflict, but then you end up with a set of images like:

  • filename.jpg
  • filename-150x150.jpg
  • filename-300x400-1.jpg
  • filename-600x800.jpg
  • etc.

If we wanted consistent names, we would first need to loop through all names that are going to be created and check if any of those files already exist on the server before we generate the intermediate image files.

How about my other idea: if the original file name ends with something like -600x800.jpg rename it and append -1, -2, etc. the same way we do for all other naming conflicts in WP.

My main concern with this approach is that there are too many unpredictable reasons people might have for intentionally naming files which match this pattern, and renaming files, as opposed to appending a -1, -2, etc. seems more destructive.

I recommend a hash (or similar) approach for intermediate image file names because it would fulfill the following requirements:

  1. Modify intermediate image file names rather than original file names to avoid collisions, since those files are generated by WP.
  2. Ensure originality without the need to calculate and check for collisions during the process of generating intermediate images (a process that is already long-running and resource intensive on shared hosting, which can cause failures. See #40439).
  3. Keep names of all intermediate image sizes in a set consistent so we can use pattern matching to detect the difference between an image that is part of an autogenerated set and ones that have been manually cropped in the WP media editor (important when calculating sources for srcset).
  4. Use a naming convention that is predictable so the same file names will be created when someone regenerates image sizes as would be created during the original upload process.

It's also worth noting that WordPress already uses hashes in filenames if someone has the IMAGE_EDIT_OVERWRITE constant set to false. See wp_save_image().

#22 in reply to: ↑ 21 @azaozz
19 months ago

Replying to joemcgill:

We could append a -1 only to the intermediate size that has the conflict, but then

Yeah, don't think this is a good idea. All sub-sizes should be named consistently.

If we wanted consistent names, we would first need to loop through all names that are going to be created and check if any of those files already exist on the server before we generate the intermediate image files.

How about my other idea: if the original file name ends with something like -600x800.jpg rename it and append -1, -2, etc. the same way we do for all other naming conflicts in WP.

My main concern with this approach is that there are too many unpredictable reasons people might have for intentionally naming files which match this pattern, and renaming files, as opposed to appending a -1, -2, etc. seems more destructive.

Sorry but I'm not sure I understand what you wanted to say here:

  • "intentionally naming files which match this pattern". Assuming you mean the filemane-600x800.jpg pattern.
  • "and renaming files, as opposed to appending a -1, -2, etc. seems more destructive". What seems more destructive? Appending -1, -2, etc. is renaming?

I recommend a hash (or similar) approach for intermediate image file names because it would fulfill the following requirements:

  1. Modify intermediate image file names rather than original file names to avoid collisions, since those files are generated by WP.

Few things:

  • We already modify the original file name when it collides with previously uploaded file. Try uploading the same image few times and watch your uploads dir.
  • The idea to append -1, -2, etc. to the file name extends that original functionality. Introducing yet another "naming convention" to handle pretty similar circumstances seems like not so good.
  • Adding a hash to the sub-size names makes pretty ugly URLs. I'm not sure what the final impact may be but readability of URLs is a major consideration for all sites, not only for SEO.
  1. Ensure originality without the need to calculate and check for collisions during the process of generating intermediate images (a process that is already long-running and resource intensive on shared hosting, which can cause failures. See #40439).

Right. No matter how we modify these names there will always be a tiny chance of name collision. We may minimize it by adding hashes, but it can never be 0, there will always be something like 0.00001%. In WP terms (tens of millions of installs) this is not such a small number :)

  1. Keep names of all intermediate image sizes in a set consistent so we can use pattern matching to detect the difference between an image that is part of an autogenerated set and ones that have been manually cropped in the WP media editor (important when calculating sources for srcset).

Using pattern matching is actually easier done when the original filename matches most of the sub-size filename, like it is at the moment. Adding a hash makes that harder/slower.

It's also worth noting that WordPress already uses hashes in filenames if someone has the IMAGE_EDIT_OVERWRITE constant set to false. See wp_save_image().

That's not exactly the case. When an image is edited, a new image is created that has a hash in the filename, then all the sub-sizes are created from that new image. If we add another hash for sub-sizes, that will mean edited images sub-sizes will have double hashes, i.e. their filenames will be something like 20170729_142638-02332t348y5767374-e1526551747486-206x300.jpg :)

#23 @azaozz
19 months ago

Ugh, sorry about the long post :)

The tl;dr version:

  • If we don't check for collisions for all file names when uploading a file (original file name plus all sub-sizes file names), we will never eliminate these collisions 100%.
  • Adding hashes to all sub-size file names makes for ugly URLs to them. "Pretty" URLs are quite important for readability and SEO.

This ticket was mentioned in Slack in #core-media by pbiron. View the logs.


10 months ago

This ticket was mentioned in Slack in #core-media by pbiron. View the logs.


7 months ago

#26 @azaozz
6 weeks ago

#48458 was marked as a duplicate.

#27 @azaozz
6 weeks ago

  • Milestone changed from Future Release to 5.3.1

After #47873 there are more possibilities for file name collisions, see #48458. Lets try to fix this in 5.3.1.

This ticket was mentioned in Slack in #core by audrasjb. View the logs.


3 weeks ago

#29 @pbiron
3 weeks ago

  • Owner set to pbiron
  • Status changed from new to assigned

#30 follow-up: @pbiron
3 weeks ago

@azaozz I'm working on a revised patch for this.

Where do you think is the best place to do this "extra" collision detection? My original patch does it in _wp_handle_upload(), but I'm wondering whether it would be better to do it in wp_unique_filename().

The question is: wp_unique_filename() is called in a few places where it's a little hard for me to wrap my head around whether there would be unintended consequences if it where there. For example, wp_crop_image(), wp_generate_attachment_metadata(), _copy_image_file(), wp_tempnam().

#31 in reply to: ↑ 30 @azaozz
11 days ago

Replying to pbiron:

@azaozz I'm working on a revised patch for this.

Where do you think is the best place to do this "extra" collision detection?

Been thinking about this for a while. Not sure it can use glob() as the PHP manual warns that:

Note: This function will not work on remote files as the file to be examined must be accessible via the server's filesystem.

Can try to generate a list of "reserved parts" of image file names. For that it will need access to the actual file, then it can use getimagesize( $file ) to get the image dimensions, wp_get_registered_image_subsizes(), and image_resize_dimensions() to get the exact 123x456 part. Then we can still use file_exists() or is_file() that support remote files.

Looks like that will work, but may be a bit fragile? May also result in some false positives.

I like the idea @joemcgill described in comment 21. But appending a hash only to sub-sizes will prevent the possibility to "get" the attachment ID from an image URL, see #48453. Perhaps a short hash (6-8 letters/numbers) could be added to the original file name instead of the -1, -2, etc?

For where/how to do this, there seem to be couple of possibilities:

  1. Use the existing mechanism to pass $unique_filename_callback to wp_unique_filename(). This will work for the default uploading functions, for example media_handle_upload(), but will require some changes to them (5-6 places as far as I see). The downside is that it will not work for plugins using the same functions.
  1. Include it in wp_unique_filename(). This may result in "false positives" when generating file names, but generally will have better backwards compatibility.

This ticket was mentioned in Slack in #core by audrasjb. View the logs.


9 days ago

#33 @audrasjb
9 days ago

Hi,

Moving it to next major as per today's bug scrub to give it more time. Feel free to move it back to 5.3.1 if we have a tested patch in the next few days.

#34 @audrasjb
9 days ago

Update: after discussion in #core slack channel, let's give this ticket some days to see if it can land in 5.3.1 :)

@pbiron
9 days ago

#35 follow-up: @pbiron
9 days ago

42437.2.diff updates my old POC patch.

  1. all the new collision detection is in wp_unique_filename() (instead of in _wp_handle_upload() as in the POC)
  2. replaces the glob() from the POC with an array_filter() on the results of scandir(). This is for 2 reasons: 1) glob isn't expressive enough (not full regex) and 2) as @azaozz pointed out, glob doesn't doesn't do remote files, e.g., CDNs, but scandir() does)
  3. refactors wp_unique_filename() so that there is only 1 return point (and one apply_filters( 'wp_unique_filename')). This made it easier to do the final collision check (number 3 below)
  4. adds a simple unit test (luckily, there is already an image in the test data with a "dimension-like" filename)

A few notes:

The revised patch:

  1. adds a number to any file that could cause a collision with sub-sizes
  2. Then, it does the same collision detection that has always existed.
  3. An finally, it checks collisions with existing files (e.g., sub-size files uploaded before the patch was applied). That last check addresses @blobfolio's comment.

My thinking is that it is best to bracket the existing collision detection checks with the new ones. Curious what others think...

I locally ran all of the existing unit tests that directly call wp_unique_filename() and they all still pass. I have not run all of the existing unit tests, but I figure that if the ones that directly call wp_unique_filename() aren't enough to catch problems then we should probably add some more direct-call tests :-)

To test (without running the unit tests), be sure you upload at 1 file that has a "dimension-like" filename before applying the patch. For example, upload image-150x150.jpg. Then apply the patch. Then upload image.jpg, which should get renamed to image-1.jpg before sub-sizes are created. The upload another-image-150x150.jpg, which should get renamed to another-image-150x150-1.jpg before sub-sizes are created.

#36 in reply to: ↑ 35 ; follow-up: @azaozz
8 days ago

Replying to pbiron:

Thanks for the updated patch. It looks quite better but still needs some considerations. Working on it now :)

The revised patch:

  1. Adds a number to any file that could cause a collision with sub-sizes.
  2. Then, it does the same collision detection that has always existed.
  3. An finally, it checks collisions with existing files (e.g., sub-size files uploaded before the patch was applied). That last check addresses @blobfolio's comment.

Sounds good. I'm still a bit unsure about 1, but it seems like a good "preventive" mechanism. It means that when an image like picture-scaled.jpg is uploaded, it will be renamed to picture-scaled-1.jpg. Then when generating the sub-sizes there will be picture-scaled-1-scaled.jpg, picture-scaled-1-300x200.jpg, picture-scaled-1-1024x768.jpg, etc.

Also perhaps replace uses of array_filter() with a callback with a simple foreach? The difference is array_filter() will run through the whole array no matter what, then return the filtered array. A callback can return as soon as a match is found. Will try to test this with different array lengths to see which is better/faster.

#37 in reply to: ↑ 36 ; follow-up: @pbiron
8 days ago

Replying to azaozz:

Replying to pbiron:

The revised patch:

  1. Adds a number to any file that could cause a collision with sub-sizes.
  2. Then, it does the same collision detection that has always existed.
  3. An finally, it checks collisions with existing files (e.g., sub-size files uploaded before the patch was applied). That last check addresses @blobfolio's comment.

Sounds good. I'm still a bit unsure about 1, but it seems like a good "preventive" mechanism. It means that when an image like picture-scaled.jpg is uploaded, it will be renamed to picture-scaled-1.jpg. Then when generating the sub-sizes there will be picture-scaled-1-scaled.jpg, picture-scaled-1-300x200.jpg, picture-scaled-1-1024x768.jpg, etc.

1 was your idea :-)

Also perhaps replace uses of array_filter() with a callback with a simple foreach? The difference is array_filter() will run through the whole array no matter what, then return the filtered array. A callback can return as soon as a match is found. Will try to test this with different array lengths to see which is better/faster.

Not sure what you mean by "callback" in this context. But, yes, could probably be sped up...especially when get_option( 'uploads_use_yearmonth_folders' ) returns false.

#38 in reply to: ↑ 37 @azaozz
8 days ago

Replying to pbiron:

1 was your idea :-)

Hehe, yes, just thinking more about it now :)

Not sure what you mean by "callback" in this context. But, yes, could probably be sped up...especially when get_option( 'uploads_use_yearmonth_folders' ) returns false.

Yeah, did some quick speed testing and array_filter() is slightly faster when there are less than 200-250 files in the uploads dir. For more files the foreach loop starts to get a bit faster especially if a match is found near the top. Generally the difference is negligible, the "slowness" there comes from the callback function/regex.

@azaozz
8 days ago

#39 follow-up: @azaozz
8 days ago

In 42437.3.diff:

  • Based on 42437.2.diff.
  • Remove the case 1 above "Adds a number to any file that could cause a collision with sub-sizes.". Can cause "false-positives" and perhaps better to not rename the file unless we have to.
  • Ensure the $number is reused across different tests so it doesn't end up with name like file-1-2.ext.
  • Use a helper function to test for possible collisions.
  • Some code improvements.

#40 @azaozz
8 days ago

  • Keywords has-patch needs-testing needs-unit-tests added; needs-patch removed

Uh, forgot to mark the helper function as private... Needs @private in the docblock before committing.

Would be great to add some more unit tests.

#41 in reply to: ↑ 39 @pbiron
8 days ago

Replying to azaozz:

In 42437.3.diff:

  • Based on 42437.2.diff.
  • Remove the case 1 above "Adds a number to any file that could cause a collision with sub-sizes.". Can cause "false-positives" and perhaps better to not rename the file unless we have to.
  • Ensure the $number is reused across different tests so it doesn't end up with name like file-1-2.ext.
  • Use a helper function to test for possible collisions.
  • Some code improvements.

I'll try to give it some testing tomorrow.

One thing: with "case 1" removed, the 1st assertEquals() in the unit test needs to be removed, because it will fail.

#42 follow-up: @pbiron
7 days ago

After a little bit of testing, it seems clear we have to do something for case 1, whether the "append a number" solution or something else.

To see this, after applying 42437.3.diff do the following:

  1. verify that the size of "Thumbnail" to 150x150 (in Settings > Media)
  2. upload image.jpg (that is larger than 150x150), this will create the sub-size image-150x150.jpg
  3. upload image-140x140.jpg (this will create the sub-size image-140x140-150x150.jpg
  4. change the size of "Thumbnail" to 140x140
  5. run wp media regenerate (or use the Regenerate Thumbnails or another plugin to do the same)

After step 5, you'll see that the original of image-140x140.jpg has been overwritten by the thumbnail sub-size of image.jpg.

#43 in reply to: ↑ 42 ; follow-up: @azaozz
7 days ago

Replying to pbiron:
I agree. This is a "far fetched" edge case but since we're trying to fix "all" cases, perhaps good to account for it too :)

After step 5, you'll see that the original of image-140x140.jpg has been overwritten by the thumbnail sub-size of image.jpg.

Yeah, the logic behind "Add a number to any file that could cause a collision with sub-sizes." is to rename files when the original name ends with a "WP reserved" string. Perhaps we can look into doing this only for supported image files, i.e. .png, .gif, .jpg, .jpeg and/or "formalize" that list of reserved file name endings.

For 5.3.1 thinking it would be enough to hard-code this. Revised patch coming up.

BTW, the unit tests seem to pass here with 42437.3.diff. Lets look at adding some more :)

Last edited 7 days ago by azaozz (previous) (diff)

#44 in reply to: ↑ 43 ; follow-up: @pbiron
7 days ago

Replying to azaozz:

Replying to pbiron:
I agree. This is a "far fetched" edge case but since we're trying to fix "all" cases, perhaps good to account for it too :)

"far fetched", maybe; "real world", definitely! It's actually the case that brought this ticket to my attention last year. An artist uploaded an image with a "dimension-like" filename to a site...no conflicts at that time. I later changed themes, the new theme changed sub-sizes so I regenerated thumbnails and found the artist's original image had been overritten.

BTW, the unit tests seem to pass here with 42437.3.diff. Lets look at adding some more :)

What's the process for addng new images to tests/phpunit/data/images? Not sure how to do that with "regular" a patch. For more unit tests we'll need some additional test images to test against.

Last edited 7 days ago by pbiron (previous) (diff)

@azaozz
7 days ago

#45 follow-up: @azaozz
7 days ago

In 42437.4.diff: same as 42437.3.diff but brings back adding a number to the file name when it matches the pattern or "reserved" file name endings.

#46 in reply to: ↑ 44 @azaozz
7 days ago

Replying to pbiron:

I later changed themes, the new theme changed sub-sizes so I regenerated thumbnails and found the artist's original image had been overritten.

Yeah, makes sense.

What's the process for addng new images to tests/phpunit/data/images? Not sure how to do that with "regular" a patch. For more unit tests we'll need some additional test images to test against.

There's no good way to add "binary" files to diffs. If you upload the images here, I can add them separately when committing the tests (images should be copyright free/in "public domain").

Last edited 7 days ago by azaozz (previous) (diff)

#47 in reply to: ↑ 45 ; follow-up: @pbiron
7 days ago

Replying to azaozz:

In 42437.4.diff: same as 42437.3.diff but brings back adding a number to the file name when it matches the pattern or "reserved" file name endings.

This seems to work great on all the tests I've run. Will try to run some more tomorrow morning.

Replying to azaozz:

There's no good way to add "binary" files to diffs. If you upload the images here, I can add them separately when committing the tests (images should be copyright free/in "public domain").

I was thinking of the color swatches that Alex already attached to this ticket (and maybe a few more like them). But I won't have time to try to write additional tests until the weekend.

#48 in reply to: ↑ 47 @azaozz
6 days ago

Replying to pbiron:

This seems to work great on all the tests I've run. Will try to run some more tomorrow morning.

Great! :)

I was thinking of the color swatches that Alex already attached to this ticket (and maybe a few more like them).

On the other hand we are only testing the file names, thinking we don't need new image files, can "replicate" an existing image in the test dir and rename it appropriately.

Looking further, seems the tests in 42437.2.diff cover it pretty well. Just need to extend the second to cover cases where an image with identical name was uploaded, i.e. have one-blue-pixel-100x100.png and one-blue-pixel-1-100x100.png.

Thinking this is ready for trunk. @pbiron if you see other possible tests, please add them in a separate patch :)

Last edited 6 days ago by azaozz (previous) (diff)

#49 @azaozz
6 days ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 46822:

Upload: fix wp_unique_filename() to prevent name collisions with existing or future image sub-size file names, and add unit tests.

Props Viper007Bond, pbiron, azaozz.
Fixes #42437.

#50 @azaozz
6 days ago

  • Keywords needs-unit-tests removed
  • Resolution fixed deleted
  • Status changed from closed to reopened

Reopen for merging to the 5.3.1 branch.

#51 @azaozz
6 days ago

  • Keywords fixed-major added

#52 @SergeyBiryukov
4 days ago

In 46835:

Tests: Change group annotation for test_unique_filename_with_dimension_like_filename() to `ticket.

See #42437.

#53 @SergeyBiryukov
4 days ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 46836:

Upload: fix wp_unique_filename() to prevent name collisions with existing or future image sub-size file names, and add unit tests.

Props Viper007Bond, pbiron, azaozz.
Merges [46822], [46835] to the 5.3 branch.
Fixes #42437.

Note: See TracTickets for help on using tickets.