Make WordPress Core

Opened 8 years ago

Closed 5 years ago

#39768 closed defect (bug) (fixed)

Incorrect image returned with attachment_url_to_postid()

Reported by: bengreeley's profile ben.greeley Owned by: sergeybiryukov's profile SergeyBiryukov
Milestone: 5.3.3 Priority: normal
Severity: normal Version: 4.7.2
Component: Media Keywords: has-patch commit fixed-major
Focuses: Cc:

Description

When two files are uploaded to the media library during the same month that have the same name and differ in capitalization, the function

attachment_url_to_postid()

can return the incorrect image.

For example, if two images were uploaded named untitled.jpg and Untitled.jpg, the SQL query that is being run in

attachment_url_to_postid()

isn't case-sensitive so it will use the first image that is returned, which may not be correct if the desired image is Untitled.jpg, but the image information that is returned is for untitled.jpg. The meta_value for each _wp_attached_file is saved as 2017/02/untitled.jpg and 2017/02/Untitled-1.jpg, so a query to match http://url.dev/wp-content/uploads/sites/9/2017/02/Untitled.jpg returns 2017/02/untitled.jpg, which is the incorrect image.

I created a suggested workaround, which grabs matches in case there was a duplicate filename and returns the closest case-sensitive result. https://gist.github.com/bengreeley/29a6edaca5c370091dd2cafc3369e400

Attachments (5)

39768.jpg (136.5 KB) - added by vsamoletov 5 years ago.
Issue #39768 demo
39768-binary.jpg (143.6 KB) - added by vsamoletov 5 years ago.
Issue #39768 fixed demo
39768.diff (2.0 KB) - added by SergeyBiryukov 5 years ago.
39768.2.diff (2.6 KB) - added by SergeyBiryukov 5 years ago.
39768.3.diff (2.8 KB) - added by SergeyBiryukov 5 years ago.

Download all attachments as: .zip

Change History (23)

#1 @tristangemus
8 years ago

Welcome to core @ben.greeley

From what I can see Wordpress is handling this correctly. If a duplicate is found using the same case or not, that image is appended with 1, 2, and so on. If you uploaded an image as untitled.jpg twice, then the second time around the URL you need to pass should have the be appended with -1.

What's the use case where you're passing the original image URL even though Wordpress changes it before uploading?

#2 @ben.greeley
8 years ago

@tristangemus thanks for the prompt response. The use case is one image was named Unknown.jpg that was uploaded to a site. Later, unknown.jpg was added to the media library, specifically these two images: http://www.lamag.com/wp-content/uploads/sites/9/2017/01/Unknown.jpg and http://www.lamag.com/wp-content/uploads/sites/9/2017/01/unknown.jpg . When the full image is embedded onto a post, we have a link like <img class="aligncenter wp-image-671724 size-full" src="http://www.lamag.com/wp-content/uploads/sites/9/2017/01/unknown.jpg" />

There is some code that runs attachment_url_to_postid( 'http://www.lamag.com/wp-content/uploads/sites/9/2017/01/unknown.jpg' ), but the image that it returns is a reference to http://www.lamag.com/wp-content/uploads/sites/9/2017/01/Unknown.jpg . It appears this is happening because the table already has one value of 2017/01/Unknown.jpg, which matches its SQL query trying to find 2017/01/unknown.jpg since mySQL queries aren't case-sensitive.

I'm able to reproduce in my local environment.

#3 @tristangemus
8 years ago

Right but if you use Wordpress to insert the image, it should be referencing the image as Unknown-1.jpg. The only way I see this being an issue is if the image is manually being typed into the editor/codebase. In that case, you will get the incorrect image, but Wordpress is selecting the proper one based on intended logic.

If you are able to take a quick screen recording and upload it to Youtube, that would be helpful so we can see if this is a bug or not.

Thanks!

#4 follow-up: @archon810
5 years ago

I'm not sure how, but we're also running into a situation when two images named, say, Home.png and home.png, are uploaded days apart, and WP doesn't assign -1 to one of them.

The issue is then that attachment_url_to_postid() runs a query without checking the case, 2 rows are returned from the db, and it always picks the first one, which is wrong.

A simple solution seems to be forcing the case check in this function's query using BINARY https://stackoverflow.com/a/5629121/47680.

I'm really not sure why -1 isn't appended, and in my test just now it did, but we have several examples in the db where mixed case files exist in the same dir.

We've worked around it by duplicating and fixing attachment_url_to_postid() ourselves, but would love to see the fix in the core.

@vsamoletov
5 years ago

Issue #39768 demo

@vsamoletov
5 years ago

Issue #39768 fixed demo

#5 @SergeyBiryukov
5 years ago

  • Component changed from General to Media

#6 in reply to: ↑ 4 ; follow-up: @SergeyBiryukov
5 years ago

  • Milestone changed from Awaiting Review to 5.4

Replying to archon810:

I'm really not sure why -1 isn't appended, and in my test just now it did

Yes, it appears to be fixed with the introduction of _wp_check_existing_file_names() in 5.3.1, [46822] / #42437, so should not be an issue going forward.

I'm up for updating the query in attachment_url_to_postid() to be case-sensitive and handle already existing images, if that does not reduce performance.

#7 in reply to: ↑ 6 ; follow-up: @SergeyBiryukov
5 years ago

Replying to SergeyBiryukov:

I'm up for updating the query in attachment_url_to_postid() to be case-sensitive and handle already existing images, if that does not reduce performance.

On a site with ~32000 attachments, seeing no noticeable difference in performance:

SELECT post_id FROM wp_postmeta WHERE meta_key = '_wp_attached_file' AND meta_value = '2009/03/myimage.jpg'
Query took 0.0292 seconds.

SELECT post_id FROM wp_postmeta WHERE meta_key = '_wp_attached_file' AND BINARY meta_value = '2009/03/myimage.jpg'
Query took 0.0294 seconds

Both queries use the index:

idselect_typetabletypepossible_keyskeykey_lenrefrowsExtra
1SIMPLEwp_postmetarefmeta_keymeta_key768const30557Using index condition; Using where

Should be good to go.

Last edited 5 years ago by SergeyBiryukov (previous) (diff)

#8 in reply to: ↑ 7 ; follow-up: @vsamoletov
5 years ago

I'm sure you saw this https://stackoverflow.com/a/56283818, but just in case.

Replying to SergeyBiryukov:

Replying to SergeyBiryukov:

I'm up for updating the query in attachment_url_to_postid() to be case-sensitive and handle already existing images, if that does not reduce performance.

On a site with ~32000 attachments, seeing no noticeable difference in performance

SELECT post_id FROM wp_postmeta WHERE meta_key = '_wp_attached_file' AND meta_value = '2009/03/myimage.jpg'
Query took 0.0292 seconds.

SELECT post_id FROM wp_postmeta WHERE meta_key = '_wp_attached_file' AND BINARY meta_value = '2009/03/myimage.jpg'
Query took 0.0294 seconds

Both queries use the index:

idselect_typetabletypepossible_keyskeykey_lenrefrowsExtra
1SIMPLEwp_postmetarefmeta_keymeta_key768const30557Using index condition; Using where

Should be good to go.

@SergeyBiryukov
5 years ago

#9 in reply to: ↑ 8 @SergeyBiryukov
5 years ago

  • Keywords has-patch added

Replying to vsamoletov:

I'm sure you saw this https://stackoverflow.com/a/56283818, but just in case.

I missed that, thanks for bringing it to my attention.

39768.diff uses the BINARY approach.

It looks like that could introduce some unintended regressions when dealing with accented characters, or if the column charset doesn't match the session charset, e.g. latin1 vs. utf8mb4.

Using CONVERT() appears to be a better option. While utf8mb4 should be available on any modern install, that may not always be the case, so we have to check for it specifically, and fall back to the current query otherwise. This option also has a minor performance difference from the queries above in my testing:

SELECT post_id FROM wp_postmeta WHERE meta_key = '_wp_attached_file' AND meta_value = CONVERT( '2009/03/myimage.jpg' USING utf8mb4 ) COLLATE utf8mb4_bin
Query took 0.0315 seconds

39768.2.diff uses the CONVERT() approach.

Given that the main issue of wp_unique_filename() being case-sensitive is fixed as of 5.3.1, I'm now a bit less convinced attachment_url_to_postid() needs this change. That said, if 39768.2.diff looks sensible, I guess we could give it a go.

Last edited 5 years ago by SergeyBiryukov (previous) (diff)

#10 follow-up: @archon810
5 years ago

@vsamoletov has a good point regarding the warning.

But @SergeyBiryukov please consider the fact that while new cases may be resolved in 5.3.1, existing cases are still in the database and should be resolved properly.

An alternate solution, and one we went with, is to get all the rows in PHP and if >1, go through them and do a case sensitive match in PHP.

#11 in reply to: ↑ 10 @SergeyBiryukov
5 years ago

  • Keywords commit added

Replying to archon810:

An alternate solution, and one we went with, is to get all the rows in PHP and if >1, go through them and do a case sensitive match in PHP.

Thanks, that actually seems the most straightforward option here, without any side effects that I can think of.

Implemented in 39768.3.diff.

#12 @archon810
5 years ago

Looking forward to it in 5.4.

#13 @SergeyBiryukov
5 years ago

  • Resolution set to fixed
  • Status changed from new to closed

In 47010:

Media: Make sure attachment_url_to_postid() performs a case-sensitive search for the uploaded file name.

Previously, the first available match was returned, regardless of the case, which was not always the expected result.

Props archon810, ben.greeley, tristangemus, vsamoletov, SergeyBiryukov.
Fixes #39768.

#14 @SergeyBiryukov
5 years ago

  • Keywords fixed-major added
  • Milestone changed from 5.4 to 5.3.3
  • Resolution fixed deleted
  • Status changed from closed to reopened

Reopening for 5.3.3 consideration.

Last edited 5 years ago by SergeyBiryukov (previous) (diff)

#15 @SergeyBiryukov
5 years ago

  • Owner set to SergeyBiryukov
  • Status changed from reopened to accepted

#16 @audrasjb
5 years ago

  • Milestone changed from 5.3.3 to 5.4

Moving all unfixed tickets from 5.3.3 to milestone 5.4, as there is no plan for a 5.3.3 maintenance release for now.

#17 @SergeyBiryukov
5 years ago

  • Milestone changed from 5.4 to 5.3.3

This was just waiting for a backport, I think it still makes sense to include this in 5.3.3 when it's eventually released.

#18 @SergeyBiryukov
5 years ago

  • Resolution set to fixed
  • Status changed from accepted to closed

In 47132:

Media: Make sure attachment_url_to_postid() performs a case-sensitive search for the uploaded file name.

Previously, the first available match was returned, regardless of the case, which was not always the expected result.

Props archon810, ben.greeley, tristangemus, vsamoletov, SergeyBiryukov.
Merges [47010] to the 5.3 branch.
Fixes #39768.

Note: See TracTickets for help on using tickets.