Opened 5 years ago
Closed 5 years ago
#48453 closed defect (bug) (fixed)
Regression: Implied contract between image sub-size filenames and their base filename now broken
Reported by: | ianmjones | Owned by: | azaozz |
---|---|---|---|
Milestone: | 5.3 | Priority: | normal |
Severity: | major | Version: | 5.3 |
Component: | Media | Keywords: | 2nd-opinion has-patch dev-reviewed |
Focuses: | Cc: |
Description
Since the introduction of big image scaling and automatic rotation of images in #47873 and further refined in #48304, the base file name for an image stored in the _wp_attached_file
postmeta value may no longer directly relate to the filenames of image sub-sizes held in the _wp_attachment_metadata
postmeta value's sizes
array.
This now means the local URL for a thumbnail might be:
https://example.com/wp-content/uploads/2019/10/image-150x150.jpg
But it's base URL might be any of the following:
https://example.com/wp-content/uploads/2019/10/image-scaled-2560.jpg
https://example.com/wp-content/uploads/2019/10/image-rotated.jpg
https://example.com/wp-content/uploads/2019/10/image.jpg
That image-scaled-2560.jpg
filename is dependent on a filterable threshold, and could change depending on themes or plugins that are installed and any criteria that might apply.
Previously there was an implied contract between the filenames for elements of the sizes
array, they were the base file name with -WWWWxHHHH
as the suffix depending on their width (WWWW, usually 3 to 4 numerics), and height (HHHH, usually 3 to 4 numerics).
This meant when a plugin such as the many many page builders and other custom content builders added an image to their content and did not use the standard functions to do so that generate an img
tag with wp-image-NNN
class name that could be used to find the attachment ID, you could at least parse the URL's path element, strip off the consistently formatted size information, and then find the attachment ID via its base path/filename.
This is no longer possible as the base file name for scaled or rotated images that corresponds to the sizes is now within the original_image
meta field within the _wp_attachment_metadata
value.
To find the attachment ID for a local image URL you could previously do a relatively fast SQL select based on the meta_key = '_wp_attached_file'
and meta_value = '2019/10/image.jpg'
for the above example.
Now you're going to have to concoct some UNION on the possible alternate file names, with some fudging for alternate thresholds, or a very expensive LIKE
select on the meta_value
of the _wp_attachment_metadata
, or somehow capture the relationship between the _wp_attached_file
value and the original_image
in custom data (custom tables no doubt) so that there's some sort of fast lookup.
Proposal:
1) Instead of suffixing the base file name for images with either -scaled-NNNN
or -rotated
, leave it as per the "normal" sanitized version, but move the original file to a suffixed name such as image-original-version.jpg
and store that in original_image
. For clarity, it could be something like image-original-pre-scaling-2560.jpg
or image-original-pre-rotation.jpg
.
or
2) Ensure that the sub-sizes are generated with a matching file name. So if the "full" size is now "image-scaled-2560.jpg", there would be "image-scaled-2560-150x150.jpg" and "image-scaled-2560-300x200.jpg" etc.
As an aside, while testing I have noticed a couple of further problems related to the renaming of the base file...
1) When a large file is edited with the standard editor, the base file and sizes now have a unified image-scaled-2560-e123456789012
base, which is good. Except the twentytwenty-fullscreen
size retained the original image-1980x990.jpg
filename.
2) The change in filename during upload causes problems with the wp_unique_filename
being handled consistently by plugins.
Attachments (1)
Change History (19)
#1
follow-up:
↓ 4
@
5 years ago
- Keywords needs-patch 2nd-opinion added
- Milestone changed from Awaiting Review to 5.3
This ticket was mentioned in Slack in #core by azaozz. View the logs.
5 years ago
#3
in reply to:
↑ 2
;
follow-ups:
↓ 5
↓ 9
@
5 years ago
Replying to slackbot:
This ticket was mentioned in Slack in #core by azaozz. View the logs.
From my perspective, moving the "original" would only complicate things, and that is because currently (in 5.3-RC2 also) there is no information recorded in the database of it, no file path, no width/height, just nothing, and we rely on the directory mentioned in the metadata for the full size to identify it (or at least I could not find another way to do this). It will just become a nightmare to delete it or just use it to generate files as a failback when something went wrong with the full size.
From all my tests, the issue with the full-size naming happens only when the threshold kicks in, and the sub-sizes are not generated from the new full size generated, but from the original itself.
For example, this is quite odd for big files, let's say the uploaded file is image.jpeg
of 6MB, this is then generating with the new threshold filter a full-size called image-2560.jpeg
of 1MB (if the script does not fail). The sub-sizes are then generated from the 6MB file, processing which is again prone to script fails not to mention it will take a lot of time.
Why aren't the sub-sizes generated from the new full-size which is smaller and also the quality high enough as far as I could test?
Also, what is the point to keep the "original" when the threshold was applied? As far as I understand the process image-2560.jpeg
can be easily renamed back after the real original file and the real original removed (this is never used as far as I can see in the code). If the code would to this switch, then I do not see a problem anymore for the naming of the sub-sizes, and also, the sub-sizes will generate faster, from a smaller file.
Did I get this right?
#4
in reply to:
↑ 1
@
5 years ago
Replying to azaozz:
Been wondering if there should be a better
image_url_to_attachment_id()
function in core to help with these cases.
That would be awesome! 👍
At first look at the proposed solutions thinking that moving/renaming of the originally uploaded file is the better one.
That's my instinct too.
I expect end users would be less surprised by the filename they see everywhere being the sanitized version of what they uploaded, just as it is in WP5.2.
We also see a lot of feedback in our support related to the importance of image file names being as uploaded, people get a bit twitchy if their carefully crafted SEO friendly file names get changed. So tacking on -scaled-2560
or -rotated
might be unwelcome.
Seeing as the original_image
isn't expected to be used much, if at all, then I don't see much downside to that file being renamed rather than the base file used across WP.
If you know the attachment ID you can therefore easily get the _wp_attachment_metadata
and check whether there is an original_image
to optionally process.
And because _wp_attached_file
has the relative path to uploads
that gives the location of the plain file names in sizes
and original_image
, it's easy to locate them with dirname
.
The argument against was that renaming the original image may introduce another (very small) possibility of errors while post-processing images, while saving another image is proven to work properly at that time.
I guess this is the only real concern with renaming the original rather than full/scaled image.
Also thinking that appending only
-original
or perhaps-wp-original
to the file name should be sufficient. There are cases where it can reach length limits.
This concern with file name length possibly hitting a limit raises another good reason for renaming the original and not the scaled image.
If a large image.jpg
is uploaded with the current RC2 method, and that image is not only renamed for uniqueness but then edited, we could have something like the following...
full
: image-1-scaled-2560-e1572283267932.jpg
large
: image-1-scaled-2560-e1572283267932-1024x768.jpg
original_image
: image-1.jpg
For that large
version, we're looking at an extra 38 characters at least.
However, if instead the original gets renamed we have...
full
: image-1-e1572283267932.jpg
large
: image-1-e1572283267932-1024x768.jpg
original_image
: image-1-wp-original.jpg
That means large
is 12 characters less in length, and original_image
has still gained far less characters than large
gained for the edit and size suffixes and so must be within bounds.
#5
in reply to:
↑ 3
@
5 years ago
Replying to Iulia Cazan:
Did I get this right?
I was a bit confused by your comment, it didn't quite fit with what I'm seeing in RC2. But I hope my previous comment in reply to @azaozz helps, particularly with how easy it is to find the original_image
if you already know the attachment ID.
I personally can't speak to your comments regarding the quality difference between using the original_image
verses a scaled version for resizing, but I suspect someone in the know can help, and will likely start with "It depends ..."! 😉
#6
follow-up:
↓ 7
@
5 years ago
I'm finding it difficult to articulate my exact thoughts, but I am increasingly concerned about these changes drifting further and further from the original construction of the feature as we get so late into the RC cycle.
Further, I think it's important to clarify that the "implied contract" is not only implied, it is assumed, and could be argued that this has always been a case of "doing it wrong". While I'm sympathetic to the problems with page builders and third party plugins, I'd actually argue that as image management and subsize generation gets more complex, there is a clear case to firmly break from this implied contract, and reinforce that WordPress should be fully trusted to handle filenames, and that third party plugins should stop assuming that image URLs are a constant.
#7
in reply to:
↑ 6
@
5 years ago
Replying to chrisvanpatten:
and that third party plugins should stop assuming that image URLs are a constant.
I am not sure that the third-party plugins are a problem in this discussion. I think that the discussed changes in the naming of files are affecting a lot of areas of WordPress, SEO, etc., that's why it is concerning.
#8
@
5 years ago
I am not sure that the third-party plugins are a problem in this discussion. I think that the discussed changes in the naming of files are affecting a lot of areas of WordPress, SEO, etc., that's why it is concerning.
The entire point of the original ticket is about the compatibility problems experienced by third party plugins.
I don't think SEO should at all be a concern here. WordPress has long appended things to non-original image URLs, and that practice is not changing. If you insert a "medium" or "large" sized image (as most likely do because loading a "full" sized image directly in a block post is going to dramatically slow down your page), it will now — as ever — append dimensions. I don't see the addition of -scaled-2560
being materially different from how it works today with the 1234x1234
style suffixes.
#9
in reply to:
↑ 3
@
5 years ago
Replying to Iulia Cazan:
From my perspective, moving the "original" would only complicate things,
Yes, think so too. Another good/important reason is that WordPress always tries its best to keep content added by users unchanged. This includes uploaded files.
Also, what is the point to keep the "original" when the threshold was applied?
Same reason as above. Also it lets WP create better quality image sub-sizes in the future and also lets it be used as "media storage".
#10
@
5 years ago
After quite a bit of testing thinking that the problem reported here would be fixed best by updating/extending the DB query when looking at _wp_attached_file
meta values.
Trying to guess the attachment ID from an image URL has always been a hit-and-miss process. Ideally there should be no need for that, but... some themes and plugins are not using the WP functions and "doing-it-wrong". Adding a (somewhat standardized) function to handle this in core seems to make most sense (yeah, it means that yet again core will be fixing the "doing-it-wrong" practices in themes and plugins...).
For an image named landscape.jpg
currently the value in _wp_attached_file
may be landscape.jpg
, landscape-scaled-####.jpg
or landscape-rotated.jpg
. Thinking that removing the variable -####
bit from the file name for scaled images, and updating the DB query to do
WHERE ( meta_value='landscape.jpg' OR meta_value='landscape-scaled.jpg' OR meta_value='landscape-rotated.jpg' )
would be the most straightforward fix here.
This is fully backwards compatible and the DB query will be almost as fast as before (assuming it has LIMIT 1
).
#11
follow-up:
↓ 12
@
5 years ago
@azaozz What’s the context where this would be used in core? I’m concerned because we’ve had a ton of problems with performance on plugins that attempt to do these sorts of “URL to ID” lookups, and had to either disable the behaviour or find different plugins.
Admittedly as a large media brand we are an edge case (~15 million postmeta rows and up on our largest sites) but these types of queries always seemed to be super problematic and introduced all kinds of performance hits on uncached requests (primarily in REST API calls).
#12
in reply to:
↑ 11
@
5 years ago
Replying to chrisvanpatten:
What’s the context where this would be used in core?
This is not used in core, and hopefully never will be. But there are plugins that need it, so they implement similar code/solutions. Was thinking that having something like that in core will "standardize" the way it's done and let core fix it when needed, like in this case.
Of course this is still just an idea :) Ideally all themes and plugins would fix their code and make these cases non-existent.
At the same time if core can change something to help plugins work better, especially when adding new functionality, I'm thinking it would be good to do the change. As far as I see the only change needed in this case is to remove the variable number form scaled image file names. That would still make these file names easily recognizable, and as this is new feature, it has no back-compat issues.
This ticket was mentioned in Slack in #core by david.baumwald. View the logs.
5 years ago
#15
@
5 years ago
- Keywords has-patch dev-reviewed added; needs-patch removed
48453.diff looks good to me.
#16
@
5 years ago
- Owner set to azaozz
- Resolution set to fixed
- Status changed from new to closed
In 46658:
Moving to 5.3 for consideration.
Yes, it's always been "problematic" to find the attachment ID from an image URL, and the new functionality in 5.3 makes it even harder. It is, as mentioned, not the proper way to do things, but it was a more or less viable fallback.
Been wondering if there should be a better
image_url_to_attachment_id()
function in core to help with these cases. Not having the expectedwp-image-####
class name also means these images don't havesrcset
andsizes
attributes, or at least not the values generated in core.At first look at the proposed solutions thinking that moving/renaming of the originally uploaded file is the better one. This was also considered in the big image ticket. The argument against was that renaming the original image may introduce another (very small) possibility of errors while post-processing images, while saving another image is proven to work properly at that time. Also thinking that appending only
-original
or perhaps-wp-original
to the file name should be sufficient. There are cases where it can reach length limits.Any other thoughts/ideas?