Make WordPress Core

Opened 2 years ago

Closed 2 years ago

#56991 closed defect (bug) (fixed)

Update to get_page_by_title in 6.1 changes WHERE clause

Reported by: bjorn2404's profile Bjorn2404 Owned by: peterwilsoncc's profile peterwilsoncc
Milestone: 6.1.1 Priority: normal
Severity: normal Version: 6.1
Component: General Keywords: has-patch has-unit-tests commit fixed-major dev-reviewed
Focuses: performance Cc:

Description

The recent update to the get_page_by_title() function in WordPress v6.1 appears to have changed the order in which posts are queried. Steps to replicate as follows:

  • Start from WP 6.0
  • Create a page called "test"
  • Call get_page_by_title()
  • Receive expected post ID
  • Create another page called "test"
  • Call get_page_by_title()
  • Still receive the first one (expected)
  • Delete first page
  • Call get_page_by_title() again
  • Receive the latest page (second) ID
  • Update to WP 6.1
  • Call get_page_by_title() again
  • Receive first page ID in the trash

The underlying cause looks to be the addition of get_post_stati(), which is resulting in all post statuses being included in the WHERE clause.

Bug was discovered after a production site was updated to 6.1 and the home page led to a 404.

Change History (38)

#1 @10upsimon
2 years ago

Also adding that the changelog was not updated as part of the last release, this can be validated by visiting https://developer.wordpress.org/reference/functions/get_page_by_title/#changelog as well as via the docblock of the existing function in core.

Consideration should be made to add the relevant item to the function docblock, i.e:

@since 6.1.0 Using WP_Query to employ the use of caching.

#2 @Bjorn2404
2 years ago

  • Summary changed from Update to get_page_by_title in 6.1 changes WHERE clause order to Update to get_page_by_title in 6.1 changes WHERE clause

#3 @peterwilsoncc
2 years ago

  • Milestone changed from Awaiting Review to 6.1.1

Comparing the old and the new queries:

WordPress 6.0 and earlier

SELECT ID
FROM wp_posts
WHERE post_title = 'Page with title'
AND post_type = 'page'

WordPress 6.1

SELECT wp_posts.ID
FROM wp_posts 
WHERE 1=1
        AND wp_posts.post_title = 'Page with title' 
        AND wp_posts.post_type = 'page' 
        AND (
                (
                        wp_posts.post_status = 'publish' OR 
                        wp_posts.post_status = 'future' OR 
                        wp_posts.post_status = 'draft' OR 
                        wp_posts.post_status = 'pending' OR 
                        wp_posts.post_status = 'trash' OR 
                        wp_posts.post_status = 'auto-draft' OR 
                        wp_posts.post_status = 'inherit' OR 
                        wp_posts.post_status = 'request-pending' OR 
                        wp_posts.post_status = 'request-confirmed' OR 
                        wp_posts.post_status = 'request-failed' OR 
                        wp_posts.post_status = 'request-completed' OR 
                        wp_posts.post_status = 'private'
                )
        )                               
ORDER BY wp_posts.post_date ASC, wp_posts.ID ASC
LIMIT 0, 1

As get_page_by_title() is now a wrapper for WP_Query, defining all the post statuses was required to match the older query (which did not specify post_status, thus querying all of them).

As the previous query didn't specify an order, MySQL would optimize however it wanted to, which means the query is inconsistent depending on the engine, version or any number of conditions the optimizer took in to consideration. This suggest some amount of hope in the docblock the orderby clause was based on:

If more than one post uses the same title, the post with the smallest ID will be returned. Be careful: in case of more than one post having the same title, it will check the oldest publication date, not the smallest ID.

I'm able to reproduce the issue described, I'm just trying to figure out the ideal solution.

Thanks for the ticket, @Bjorn2404, and welcome to trac! I appreciate your hard work here and elsewhere trying to figure out the changes between 6.0 and 6.1.

I've moved this on to the 6.1.1 milestone for visibility.

This ticket was mentioned in PR #3562 on WordPress/wordpress-develop by @dilipbheda.


2 years ago
#4

  • Keywords has-patch added

#5 @dilipbheda
2 years ago

I've reviewed this ticket and fixed the issue in this PR https://github.com/WordPress/wordpress-develop/pull/3562

@peterwilsoncc Can you please review the mentioned PR?

Thanks

10upsimon commented on PR #3562:


2 years ago
#6

@dilipbheda Any chance we can get the change log notes added to the function doc block as part of this fix?

@dilipbheda commented on PR #3562:


2 years ago
#7

@10upsimon added in the latest commit.

10upsimon commented on PR #3562:


2 years ago
#8

@10upsimon added in the latest commit.

Thank you @dilipbheda

@mukesh27 commented on PR #3562:


2 years ago
#9

Thanks @dilipbheda for the PR. can you please show SQL query after the changes? how can this be tested in Unit test?

@dilipbheda commented on PR #3562:


2 years ago
#10

@mukeshpanchal27 Sure!, I've exclude the trash, auto-draft post status from the query.

SELECT 
  wp_posts.ID 
FROM 
  wp_posts 
WHERE 
  1 = 1 
  AND wp_posts.post_title = 'Test' 
  AND wp_posts.post_type = 'page' 
  AND (
    (
      wp_posts.post_status = 'publish' 
      OR wp_posts.post_status = 'future' 
      OR wp_posts.post_status = 'draft' 
      OR wp_posts.post_status = 'pending' 
      OR wp_posts.post_status = 'inherit' 
      OR wp_posts.post_status = 'request-pending' 
      OR wp_posts.post_status = 'request-confirmed' 
      OR wp_posts.post_status = 'request-failed' 
      OR wp_posts.post_status = 'request-completed' 
      OR wp_posts.post_status = 'private'
    )
  ) 
ORDER BY 
  wp_posts.post_date ASC, 
  wp_posts.ID ASC 
LIMIT 
  0, 1

This ticket was mentioned in Slack in #core by jeffpaul. View the logs.


2 years ago

#12 @spacedmonkey
2 years ago

Am I missing something here? The old querying in 6.0 did not limit the querying by if the post status was public. Instead of not limit by post status, we are querying by all post statuses. Which is functionally the same. If we change this to only get no trashed posts, that is a breaking change. What am I missing here?

#13 @dilipbheda
2 years ago

When I changed the post status to "any," the mentioned problem seemed to be resolved.

'post_status' => 'any',

SQL Query:

SELECT 
  wp_posts.ID 
FROM 
  wp_posts 
WHERE 
  1 = 1 
  AND wp_posts.post_title = 'Test' 
  AND wp_posts.post_type = 'page' 
  AND (
    (
      wp_posts.post_status <> 'trash' 
      AND wp_posts.post_status <> 'auto-draft'
    )
  ) 
ORDER BY 
  wp_posts.post_date ASC, 
  wp_posts.ID ASC 
LIMIT 
  0, 1

#14 follow-up: @peterwilsoncc
2 years ago

Having considered this for the last few days, I remain unsure of what is the best course of action here. There are no good choices.

The problem:

Without an ordering clause in 6.0 and earlier versions, it's basically been chance as to which post object is returned. The docblock describing the order has historically been inaccurate and different database engines or versions will optimize and order the query differently.

Whether the function returns a public post has historically been left to chance.

Even thought the new query specifies all post statuses (matching the previous behavior), some -- or possibly all -- database engines are optimizing the query differently and now returning a different post in some circumstances.

Possible solutions:

  • close as wontfix and recommend developers use WP_Query when wishing to specify particular post statuses
  • break back-compat by querying only public post statuses (per pull request)
  • add a post_status parameter, either retain or change the default statuses searched
  • revert to previous wpdb query but add a cache to retain the performance gain by switching to WP_Query. Come back to this in 6.2 so a dev-note can be published with any changes in behaviour.

With 6.1.1 scheduled for next week, at the time of writing, my inclination is to revert with a caching wrapper so WordPress Core developers can consider the best course for 6.2

There are no optimal solutions so I think targeting the least-worst solution is the best that can be done on this ticket. In my view, maintaining backward compatibility is the least worst choice.

I'll set up a PR.

#16 follow-up: @spacedmonkey
2 years ago

Related #56399 #20350

Seems to be a known issue before.

I would recommend adding a new param to the function signature, that allows to pass post statuses. This means no bc break and developers have to opt in.

#17 in reply to: ↑ 16 ; follow-up: @peterwilsoncc
2 years ago

Replying to spacedmonkey:

I would recommend adding a new param to the function signature, that allows to pass post statuses. This means no bc break and developers have to opt in.

The problem is that the back-compat break is retained due to the WHERE clause changing how the mysql optimizer processes the query.

I tried setting orderby => none, posts_per_page => -1 locally by the WHERE clause was still changing the results between 6.0 and 6.1

In the linked pull request:

#18 in reply to: ↑ 17 @spacedmonkey
2 years ago

Replying to peterwilsoncc:

Replying to spacedmonkey:

I would recommend adding a new param to the function signature, that allows to pass post statuses. This means no bc break and developers have to opt in.

The problem is that the back-compat break is retained due to the WHERE clause changing how the mysql optimizer processes the query.

I tried setting orderby => none, posts_per_page => -1 locally by the WHERE clause was still changing the results between 6.0 and 6.1

In the linked pull request:

How about adding an option to WP_Query to pass any or all to post status? If all, then do not add post status to the query.

#19 @peterwilsoncc
2 years ago

post_status => 'any' already exists but limits the query to those readable by the current user (it uses exclude_from_search for logged out users).

I think adding a second option, all, would create confusion and developers could accidentally expose data they don't intend to.

Something like this may be a useful solution in the fullness of time but I am reluctant to add it to a point release as it would require substantial modifications to WP_Query::parse_query(). The post type and status clauses are intertwined and would require a tonne of tests before changes could be considered.

#21 follow-up: @spacedmonkey
2 years ago

After some more testing, I discovered the issue. I have a PR here.

The issue is the order by. Instead of defining order date / id, simple pass 'none' as orderby field. Resulting query.

SELECT wp_posts.ID
FROM wp_posts
WHERE 1=1
AND wp_posts.post_title = 'test'
AND wp_posts.post_type = 'page'
AND ((wp_posts.post_status = 'publish'
OR wp_posts.post_status = 'future'
OR wp_posts.post_status = 'draft'
OR wp_posts.post_status = 'pending'
OR wp_posts.post_status = 'trash'
OR wp_posts.post_status = 'auto-draft'
OR wp_posts.post_status = 'inherit'
OR wp_posts.post_status = 'request-pending'
OR wp_posts.post_status = 'request-confirmed'
OR wp_posts.post_status = 'request-failed'
OR wp_posts.post_status = 'request-completed'
OR wp_posts.post_status = 'private'))
LIMIT 0, 1

Compared to 6.0

SELECT ID
FROM wp_posts
WHERE post_title = 'test'
AND post_type = 'page'

In my manual testing, I get the same result now for both queries.

#22 in reply to: ↑ 21 @peterwilsoncc
2 years ago

Replying to spacedmonkey:

In my manual testing, I get the same result now for both queries.

I'd tried this last week but was seeing different results.

I've pushed a pull request with your changes and some tests that pass on the GitHub workflow but do not pass on my local. Without your changes the tests fail both on GitHub and locally but the number of failures differ (one locally, two on GitHub).

This highlights the inherent problem, without an orderby the order in which results are returned is undefined.

I reached out to a mate who has contributed to MySQL in the past and without an order specified, MySQL uses the optimal index based on the WHERE clause as judged by the optimizer of that particular version and that particular engine.

The problem is, as demonstrated by different results on my local, that different version/engine combinations use different algorithms for determining the optimal index to use.

#23 in reply to: ↑ 14 @TimothyBlynJacobs
2 years ago

Replying to peterwilsoncc:

close as wontfix and recommend developers use WP_Query when wishing to specify particular post statuses

This makes the most sense to me in my opinion.

break back-compat by querying only public post statuses (per pull request)

I agree this would be a BC break.

add a post_status parameter, either retain or change the default statuses searched

I'm generally not a fan of these one of functions to begin with. The DUX of WP_Query is sufficient to point users to it if they want to do a more customized query instead of adding multiple function parameters to customize behavior.

revert to previous wpdb query but add a cache to retain the performance gain by switching to WP_Query. Come back to this in 6.2 so a dev-note can be published with any changes in behaviour.

If we revert, I think we should just do a clean revert. Not try and add in separate caching behavior in a minor release.


We could also potentially not set a limit of 1 result, and instead try to return the first post with a publicly queryable post status. If there are none, then we'd just return the first result. This should keep the desired behavior for the reporter, but also allow private posts to be returned if that is the only match.

#24 @spacedmonkey
2 years ago

  • Keywords has-unit-tests added
  • Owner set to spacedmonkey
  • Status changed from new to assigned

I have updated my PR here. I added unit tests that prove this my fix works.

#25 @spacedmonkey
2 years ago

  • Focuses performance added

#26 @rjasdfiii
2 years ago

Re long list of ORs. Consider using this syntax:

wp_posts.post_status IN ( 'publish', 'future', ... )

The performance will be the same since the optimizer turns the ORs into IN, but it is more concise.
As you say, this may be cleaner:

          wp_posts.post_status <> 'trash' 
      AND wp_posts.post_status <> 'auto-draft'

Or, similarly

      AND wp_posts.post_status NOT IN ('trash', 'auto-draft')

None of the variants should have any speed advantage.

Please provide EXPLAIN SELECT ... for each variant that you try. While I don't expect any performance difference, I would like to see the EXPLAIN in hopes that it says what is going on differently.

There may be better indexes.

Elsewhere I saw a similar query that AND'd to taxomony; this may need a slightly different treatment.

#27 @spacedmonkey
2 years ago

The database queries here are generated by WP_Query. If there is improves to be made to those queries then it should be done in another ticket.

#28 follow-up: @peterwilsoncc
2 years ago

Between this and #57039, let's revert the changes to get_page_by_title() and either redo or deprecate the function in 6.2.

#29 @peterwilsoncc
2 years ago

  • Owner changed from spacedmonkey to peterwilsoncc

#30 in reply to: ↑ 28 @spacedmonkey
2 years ago

Replying to peterwilsoncc:

Between this and #57039, let's revert the changes to get_page_by_title() and either redo or deprecate the function in 6.2.

I don’t think it is a good idea to revert this. It goes from be cached to not being cached. Which could result in performance problems that have relied on this function.

#31 @peterwilsoncc
2 years ago

As noted via Slack, while the tests in this pull request are passing on GitHub and some locals, they don't pass in all environments with different MySQL versions/engines.

These are the results in my local, I presume we'd see further inconsistent results on the various hosting providers running tests.

vagrant@wp-dev:/vagrant/wordpress-develop$ git rev-parse --abbrev-ref HEAD
fix/order-by-none
vagrant@wp-dev:/vagrant/wordpress-develop$ phpunit --group 56991
Installing...
Running as single site... To run multisite, use -c tests/phpunit/multisite.xml
Not running ajax tests. To execute these, use --group ajax.
Not running ms-files tests. To execute these, use --group ms-files.
Not running external-http tests. To execute these, use --group external-http.
PHPUnit 7.5.20 by Sebastian Bergmann and contributors.

.F                                                                  2 / 2 (100%)

Time: 14.9 seconds, Memory: 242.05 MB

There was 1 failure:

1) Tests_Post_GetPageByTitle::test_return_same_as_old_query_2
Failed asserting that 15 is identical to 16.

/vagrant/wordpress-develop/tests/phpunit/tests/post/getPageByTitle.php:365

I've opened #57041 to deprecate the function in 6.2

While the previous version of the function was flawed, I don't think differently flawed is an improvement.

#32 @peterwilsoncc
2 years ago

I've updated my earlier pull request to:

  • revert changes to get_page_by_title()
  • remove tests added in the 6.0 cycle: as the DB query is unpredictable due to the lack of orderby, any tests relying on consistent results across different database versions and engines are flakey.

#33 @TimothyBlynJacobs
2 years ago

  • Keywords commit added

PR looks good to me.

#34 @peterwilsoncc
2 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 54782:

Posts, Post Types: Revert get_page_by_title()'s use of WP_Query.

Revert to legacy database query in get_pages_by_title(). Due to the lack of orderby clause in the previous database query, it is not possible to gain consistent results by converting the function to a WP_Query wrapper.

Reverts [54271, 54242, 54234].

Props Bjorn2404, 10upsimon, dilipbheda, mukesh27, spacedmonkey, TimothyBlynJacobs, rjasdfiii, stentibbing, pbiron, pento.
Fixes #57039, #56991.
See #57041.

#35 @peterwilsoncc
2 years ago

  • Keywords fixed-major added
  • Resolution fixed deleted
  • Status changed from closed to reopened

Reopening for merging in to the 6.1 branch.

@peterwilsoncc commented on PR #3573:


2 years ago
#36

Merged https://core.trac.wordpress.org/changeset/54782 / 19d5bd0787c9daf1d8a67bf5c0a3e997abf1f89a

#37 @TimothyBlynJacobs
2 years ago

  • Keywords dev-reviewed added

Looks good to backport.

#38 @peterwilsoncc
2 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 54783:

Posts, Post Types: Revert get_page_by_title()'s use of WP_Query.

Revert to legacy database query in get_pages_by_title(). Due to the lack of orderby clause in the previous database query, it is not possible to gain consistent results by converting the function to a WP_Query wrapper.

Reverts [54271, 54242, 54234].

Props Bjorn2404, 10upsimon, dilipbheda, mukesh27, spacedmonkey, TimothyBlynJacobs, rjasdfiii, stentibbing, pbiron, pento.
Merges [54782] to the 6.1 branch.
Fixes #57039, #56991.
See #57041.

Note: See TracTickets for help on using tickets.