WordPress.org

Make WordPress Core

Opened 19 months ago

Last modified 17 months ago

#36822 accepted defect (bug)

srcset is added to <img /> even if media file is not present in media library

Reported by: temmokan Owned by: joemcgill
Milestone: Future Release Priority: normal
Severity: normal Version: 4.4.1
Component: Media Keywords: close
Focuses: Cc:

Description

Summary: if link to image ends with string like

"YYYY/MM/filename.ext"

and there are image files in MEdia library residing in

/wp-content/upload/YYYY/MM

then some of those files will be used to generate "responsive image" feature, adding srcset/sizes attributes

Example 1: On Wordpress site http://example.com, document contains markup like

<img src="/wp-content/uploads/YYYY/MM/filename.png">

where the above file is *not* part of Media library (it just resides in that folder). srcset is added with link to (arbitrary selected?) image file within that folder.

Expected behaviour: no srcset/sizes should be added.

Example 2: On Wordpress site http://example.com, document contains markup like

<img src="http://cdn.example.com/wp-content/uploads/YYYY/MM/filename.png">

where cdn.example.com is *not* an alias for example.com, but altogether external resource

Once again, srcset is added with link to (arbitrary selected?) image file within folder YYYY/MM in Media library.

Expected behaviour: no srcset/sizes should be added.

Attachments (1)

36822.diff (947 bytes) - added by joemcgill 18 months ago.
Match full file paths of image src attributes

Download all attachments as: .zip

Change History (13)

#1 @joemcgill
19 months ago

  • Keywords changed from responsive, media to responsive media
  • Owner set to joemcgill
  • Status changed from new to reviewing

#2 @temmokan
19 months ago

Sample URL to illustrate the bug:

http://app-sandbox.com/wordpress/2016/05/13/image-to-post.html

I can give any one from developers access to the above test site, so they could check the contents of the document source. Wordpress 4.5.2, vanilla, fresh install

Thanks.

#3 @joemcgill
18 months ago

HI @temmokan,

Thanks for the report. The srcset and sizes attributes are added on the front end if the image contains a wp-image-{{id}} class, based on the attachment metadata. It's not immediately clear to me based on your example what might be happening here. Could you outline the steps you've going through to reproduce this issue? In other words, how are those images being added to the posts if they're not in the media library, and how are you changing the src attribute to point to an external source?

Thanks.

#4 @temmokan
18 months ago

The lowest image has been added from Media library; its HTML is

<img src="http://app-sandbox.com/wordpress/wp-content/uploads/2016/05/pic321.png" alt="pic321" width="300" height="225" class="alignnone size-medium wp-image-4" />

Other two have been added by selecting/copy-pasting the above, with "src" attribute replaced with the one for other two images. Note: I usually use HTML mode of editing instead of rich editor.

As far as I see, the code processing media files should detect whether it's allowed to insert source set attributes, to avoid the results such as shown.

#5 @joemcgill
18 months ago

  • Keywords media removed
  • Milestone changed from Awaiting Review to 4.6
  • Status changed from reviewing to accepted
  • Version changed from 4.5.2 to 4.4.1

Thanks for the explanation @temmokan. I've been able to confirm this behavior on my end. Looks like this is a result of the src matching pattern we added in r36121 being too permissive. The relevant code is here:

// If the file name is part of the `src`, we've confirmed a match. 
if ( ! $src_matched && false !== strpos( $image_src, $dirname . $image['file'] ) ) { 
	$src_matched = true; 
}

This is matching a src with the metadata associated with the id found in the markup as long as the filenames match and the uploads directory structure is found somewhere in the path. We may need to check the full file path and not just the uploads directory path (i.e. the year/month part) in order to prevent false positives when the file has been moved out of the normal uploads directory location. However, we also need to be mindful about the need to support CDNs as outlined in #35102.

@joemcgill
18 months ago

Match full file paths of image src attributes

#6 @joemcgill
18 months ago

  • Keywords has-patch needs-unit-tests added

36822.diff tests whether the full file path of the src attribute matches the path of the sources that would be returned in a srcset, which will prevent false positives when the src has been manually moved outside the uploads directory, but will retain the ability for the hostname to have been changed (e.g. CDN offloads).

@temmokan this should fix your first example, but your second example would still return a srcset so we don't break responsive images for people who have offloaded their images to a CDN while retaining the file structure.

#7 @temmokan
18 months ago

@joemcgill Thanks for the patch code.

Regarding CDN compatibility: is file structure expected to be

<CDN URL part>/uploads/YYYY/MM/filename.ext

or

<CDN URL part>/YYYY/MM/filename.ext

If the former, there will be less false srcset's added. If the second, more general, there will be more. Perhaps */uploads/* should be expected in CDN URLs?

Note that if "pull" type of CDN is used, then uploads/*, or even wp-content/uploads/* will be present, most probably.

Update: perhaps the mentioned image files tags processing should be explicitly documented somewhere, to make it feature, not bug, in certain circumstances (such as when using CDN retaining entire wp-content/uploads/* structure)

Last edited 18 months ago by temmokan (previous) (diff)

This ticket was mentioned in Slack in #core-images by joemcgill. View the logs.


18 months ago

#9 @azaozz
18 months ago

There are three requirements for srcset and sizes to be added to an image tag:

  • There should be a class name with the attachment post id in it. Example: wp-image-4 means attachment id 4.
  • That id should be of an image attachment post (not page, not revision, etc.).
  • There should be image meta for that attachment post containing information about the sub-sizes that were generated when the image was uploaded.

The wp-image-### class is only added when images are inserted from the media library. If the image tag is copied and then the src is changed, the class(es) on it would probably need changing or removing too.

I also don't understand:

<img src="/wp-content/uploads/YYYY/MM/filename.png"> where the above file is *not* part of Media library

What exactly is "not part of the media library"? There is no attachment post for it? Or there is no image meta? Or sub-sizes? Or all three? And why is that file in the media library directory?

If any of the above, why the filename matches a file that is a part of the media library?

Confusing :)

In any case don't think this is a typical case that needs to be supported. The patch can break some CDNs that are working now.

Perhaps we can add a filter on the filename matching so such "extreme" edge cases can be dealt with by a simple plugin? If I remember correctly we were discussing a filter there. At the end we decided against it as there weren't any reported problems that would have been solved by adding such filter.

#10 @temmokan
18 months ago

What exactly is "not part of the media library"?

File has not been added to media library via corresponding Wordpress admin interface.

In any case don't think this is a typical case that needs to be supported.

In other words, there will be code that will generate invalid markup in case something "unusual" is encountered.

The patch can break some CDNs that are working now.

Checkbox settings like "CDN is used to cache images" could be used, checked by default, to allow stricter behaviour (by unchecking it).

IMHO, "do not assume anything" should also be applied to Wordpress programming. Any assumption not explicitly documented can cause many hard to track/handle problems.

That is, please document the exact behaviour as feature in case you do not wish to handle all "unusual" cases correctly.

Thanks.

Last edited 18 months ago by temmokan (previous) (diff)

#11 @A5hleyRich
17 months ago

I have to agree with @azaozz, this seems very edge case.

@temmokan we can't assume that the file path will follow the same format. For example, it's common for CDNs to add an object prefix, resulting in paths like so:

wp-content/uploads/2016/07/14140907/image.jpeg

#12 @joemcgill
17 months ago

  • Keywords close added; responsive has-patch needs-unit-tests removed
  • Milestone changed from 4.6 to Future Release

After confirming that the current approach in 36822.diff would indeed break some CDN implementations by being more strict about path matching, I'm leaning towards closing this as a 'wontfix', unless we can come up with a better plan for detecting and avoiding false positives like the ones reported.

The problem here is that there is a valid image in the media library with the same filename as the attachment id referenced in the class attribute of the image. We could probably make the assumptions more clear with a note in the documentation, but it's not immediately clear what else could be done here.

Note: See TracTickets for help on using tickets.