Opened 2 years ago
Closed 2 years ago
#56991 closed defect (bug) (fixed)
Update to get_page_by_title in 6.1 changes WHERE clause
Reported by: | Bjorn2404 | Owned by: | peterwilsoncc |
---|---|---|---|
Milestone: | 6.1.1 | Priority: | normal |
Severity: | normal | Version: | 6.1 |
Component: | General | Keywords: | has-patch has-unit-tests commit fixed-major dev-reviewed |
Focuses: | performance | Cc: |
Description
The recent update to the get_page_by_title()
function in WordPress v6.1 appears to have changed the order in which posts are queried. Steps to replicate as follows:
- Start from WP 6.0
- Create a page called "test"
- Call get_page_by_title()
- Receive expected post ID
- Create another page called "test"
- Call get_page_by_title()
- Still receive the first one (expected)
- Delete first page
- Call get_page_by_title() again
- Receive the latest page (second) ID
- Update to WP 6.1
- Call get_page_by_title() again
- Receive first page ID in the trash
The underlying cause looks to be the addition of get_post_stati()
, which is resulting in all post statuses being included in the WHERE clause.
Bug was discovered after a production site was updated to 6.1 and the home page led to a 404.
Change History (38)
#2
@
2 years ago
- Summary changed from Update to get_page_by_title in 6.1 changes WHERE clause order to Update to get_page_by_title in 6.1 changes WHERE clause
#3
@
2 years ago
- Milestone changed from Awaiting Review to 6.1.1
Comparing the old and the new queries:
WordPress 6.0 and earlier
SELECT ID FROM wp_posts WHERE post_title = 'Page with title' AND post_type = 'page'
WordPress 6.1
SELECT wp_posts.ID FROM wp_posts WHERE 1=1 AND wp_posts.post_title = 'Page with title' AND wp_posts.post_type = 'page' AND ( ( wp_posts.post_status = 'publish' OR wp_posts.post_status = 'future' OR wp_posts.post_status = 'draft' OR wp_posts.post_status = 'pending' OR wp_posts.post_status = 'trash' OR wp_posts.post_status = 'auto-draft' OR wp_posts.post_status = 'inherit' OR wp_posts.post_status = 'request-pending' OR wp_posts.post_status = 'request-confirmed' OR wp_posts.post_status = 'request-failed' OR wp_posts.post_status = 'request-completed' OR wp_posts.post_status = 'private' ) ) ORDER BY wp_posts.post_date ASC, wp_posts.ID ASC LIMIT 0, 1
As get_page_by_title()
is now a wrapper for WP_Query
, defining all the post statuses was required to match the older query (which did not specify post_status, thus querying all of them).
As the previous query didn't specify an order, MySQL would optimize however it wanted to, which means the query is inconsistent depending on the engine, version or any number of conditions the optimizer took in to consideration. This suggest some amount of hope in the docblock the orderby clause was based on:
If more than one post uses the same title, the post with the smallest ID will be returned. Be careful: in case of more than one post having the same title, it will check the oldest publication date, not the smallest ID.
I'm able to reproduce the issue described, I'm just trying to figure out the ideal solution.
Thanks for the ticket, @Bjorn2404, and welcome to trac! I appreciate your hard work here and elsewhere trying to figure out the changes between 6.0 and 6.1.
I've moved this on to the 6.1.1 milestone for visibility.
This ticket was mentioned in PR #3562 on WordPress/wordpress-develop by @dilipbheda.
2 years ago
#4
- Keywords has-patch added
Trac ticket: https://core.trac.wordpress.org/ticket/56991
#5
@
2 years ago
I've reviewed this ticket and fixed the issue in this PR https://github.com/WordPress/wordpress-develop/pull/3562
@peterwilsoncc Can you please review the mentioned PR?
Thanks
10upsimon commented on PR #3562:
2 years ago
#6
@dilipbheda Any chance we can get the change log notes added to the function doc block as part of this fix?
@dilipbheda commented on PR #3562:
2 years ago
#7
@10upsimon added in the latest commit.
10upsimon commented on PR #3562:
2 years ago
#8
@10upsimon added in the latest commit.
Thank you @dilipbheda
@mukesh27 commented on PR #3562:
2 years ago
#9
Thanks @dilipbheda for the PR. can you please show SQL query after the changes? how can this be tested in Unit test?
@dilipbheda commented on PR #3562:
2 years ago
#10
@mukeshpanchal27 Sure!, I've exclude the trash, auto-draft
post status from the query.
SELECT wp_posts.ID FROM wp_posts WHERE 1 = 1 AND wp_posts.post_title = 'Test' AND wp_posts.post_type = 'page' AND ( ( wp_posts.post_status = 'publish' OR wp_posts.post_status = 'future' OR wp_posts.post_status = 'draft' OR wp_posts.post_status = 'pending' OR wp_posts.post_status = 'inherit' OR wp_posts.post_status = 'request-pending' OR wp_posts.post_status = 'request-confirmed' OR wp_posts.post_status = 'request-failed' OR wp_posts.post_status = 'request-completed' OR wp_posts.post_status = 'private' ) ) ORDER BY wp_posts.post_date ASC, wp_posts.ID ASC LIMIT 0, 1
This ticket was mentioned in Slack in #core by jeffpaul. View the logs.
2 years ago
#12
@
2 years ago
Am I missing something here? The old querying in 6.0 did not limit the querying by if the post status was public. Instead of not limit by post status, we are querying by all post statuses. Which is functionally the same. If we change this to only get no trashed posts, that is a breaking change. What am I missing here?
#13
@
2 years ago
When I changed the post status to "any," the mentioned problem seemed to be resolved.
'post_status' => 'any',
SQL Query:
SELECT wp_posts.ID FROM wp_posts WHERE 1 = 1 AND wp_posts.post_title = 'Test' AND wp_posts.post_type = 'page' AND ( ( wp_posts.post_status <> 'trash' AND wp_posts.post_status <> 'auto-draft' ) ) ORDER BY wp_posts.post_date ASC, wp_posts.ID ASC LIMIT 0, 1
#14
follow-up:
↓ 23
@
2 years ago
Having considered this for the last few days, I remain unsure of what is the best course of action here. There are no good choices.
The problem:
Without an ordering clause in 6.0 and earlier versions, it's basically been chance as to which post object is returned. The docblock describing the order has historically been inaccurate and different database engines or versions will optimize and order the query differently.
Whether the function returns a public post has historically been left to chance.
Even thought the new query specifies all post statuses (matching the previous behavior), some -- or possibly all -- database engines are optimizing the query differently and now returning a different post in some circumstances.
Possible solutions:
- close as
wontfix
and recommend developers useWP_Query
when wishing to specify particular post statuses - break back-compat by querying only public post statuses (per pull request)
- add a
post_status
parameter, either retain or change the default statuses searched - revert to previous wpdb query but add a cache to retain the performance gain by switching to
WP_Query
. Come back to this in 6.2 so a dev-note can be published with any changes in behaviour.
With 6.1.1 scheduled for next week, at the time of writing, my inclination is to revert with a caching wrapper so WordPress Core developers can consider the best course for 6.2
There are no optimal solutions so I think targeting the least-worst solution is the best that can be done on this ticket. In my view, maintaining backward compatibility is the least worst choice.
I'll set up a PR.
This ticket was mentioned in PR #3573 on WordPress/wordpress-develop by @peterwilsoncc.
2 years ago
#15
#17
in reply to:
↑ 16
;
follow-up:
↓ 18
@
2 years ago
Replying to spacedmonkey:
I would recommend adding a new param to the function signature, that allows to pass post statuses. This means no bc break and developers have to opt in.
The problem is that the back-compat break is retained due to the WHERE clause changing how the mysql optimizer processes the query.
I tried setting orderby => none, posts_per_page => -1
locally by the WHERE clause was still changing the results between 6.0 and 6.1
In the linked pull request:
#18
in reply to:
↑ 17
@
2 years ago
Replying to peterwilsoncc:
Replying to spacedmonkey:
I would recommend adding a new param to the function signature, that allows to pass post statuses. This means no bc break and developers have to opt in.
The problem is that the back-compat break is retained due to the WHERE clause changing how the mysql optimizer processes the query.
I tried setting
orderby => none, posts_per_page => -1
locally by the WHERE clause was still changing the results between 6.0 and 6.1
In the linked pull request:
How about adding an option to WP_Query to pass any or all to post status? If all, then do not add post status to the query.
#19
@
2 years ago
post_status => 'any'
already exists but limits the query to those readable by the current user (it uses exclude_from_search
for logged out users).
I think adding a second option, all
, would create confusion and developers could accidentally expose data they don't intend to.
Something like this may be a useful solution in the fullness of time but I am reluctant to add it to a point release as it would require substantial modifications to WP_Query::parse_query()
. The post type and status clauses are intertwined and would require a tonne of tests before changes could be considered.
This ticket was mentioned in PR #3575 on WordPress/wordpress-develop by @spacedmonkey.
2 years ago
#20
Trac ticket: https://core.trac.wordpress.org/ticket/56991
#21
follow-up:
↓ 22
@
2 years ago
After some more testing, I discovered the issue. I have a PR here.
The issue is the order by. Instead of defining order date / id, simple pass 'none' as orderby field. Resulting query.
SELECT wp_posts.ID FROM wp_posts WHERE 1=1 AND wp_posts.post_title = 'test' AND wp_posts.post_type = 'page' AND ((wp_posts.post_status = 'publish' OR wp_posts.post_status = 'future' OR wp_posts.post_status = 'draft' OR wp_posts.post_status = 'pending' OR wp_posts.post_status = 'trash' OR wp_posts.post_status = 'auto-draft' OR wp_posts.post_status = 'inherit' OR wp_posts.post_status = 'request-pending' OR wp_posts.post_status = 'request-confirmed' OR wp_posts.post_status = 'request-failed' OR wp_posts.post_status = 'request-completed' OR wp_posts.post_status = 'private')) LIMIT 0, 1
Compared to 6.0
SELECT ID FROM wp_posts WHERE post_title = 'test' AND post_type = 'page'
In my manual testing, I get the same result now for both queries.
#22
in reply to:
↑ 21
@
2 years ago
Replying to spacedmonkey:
In my manual testing, I get the same result now for both queries.
I'd tried this last week but was seeing different results.
I've pushed a pull request with your changes and some tests that pass on the GitHub workflow but do not pass on my local. Without your changes the tests fail both on GitHub and locally but the number of failures differ (one locally, two on GitHub).
This highlights the inherent problem, without an orderby the order in which results are returned is undefined.
I reached out to a mate who has contributed to MySQL in the past and without an order specified, MySQL uses the optimal index based on the WHERE clause as judged by the optimizer of that particular version and that particular engine.
The problem is, as demonstrated by different results on my local, that different version/engine combinations use different algorithms for determining the optimal index to use.
#23
in reply to:
↑ 14
@
2 years ago
Replying to peterwilsoncc:
close as wontfix and recommend developers use
WP_Query
when wishing to specify particular post statuses
This makes the most sense to me in my opinion.
break back-compat by querying only public post statuses (per pull request)
I agree this would be a BC break.
add a post_status parameter, either retain or change the default statuses searched
I'm generally not a fan of these one of functions to begin with. The DUX of WP_Query
is sufficient to point users to it if they want to do a more customized query instead of adding multiple function parameters to customize behavior.
revert to previous wpdb query but add a cache to retain the performance gain by switching to WP_Query. Come back to this in 6.2 so a dev-note can be published with any changes in behaviour.
If we revert, I think we should just do a clean revert. Not try and add in separate caching behavior in a minor release.
We could also potentially not set a limit
of 1
result, and instead try to return the first post with a publicly queryable post status. If there are none, then we'd just return the first result. This should keep the desired behavior for the reporter, but also allow private posts to be returned if that is the only match.
#24
@
2 years ago
- Keywords has-unit-tests added
- Owner set to spacedmonkey
- Status changed from new to assigned
I have updated my PR here. I added unit tests that prove this my fix works.
#26
@
2 years ago
Re long list of ORs. Consider using this syntax:
wp_posts.post_status IN ( 'publish', 'future', ... )
The performance will be the same since the optimizer turns the ORs into IN, but it is more concise.
As you say, this may be cleaner:
wp_posts.post_status <> 'trash' AND wp_posts.post_status <> 'auto-draft'
Or, similarly
AND wp_posts.post_status NOT IN ('trash', 'auto-draft')
None of the variants should have any speed advantage.
Please provide EXPLAIN SELECT ...
for each variant that you try. While I don't expect any performance difference, I would like to see the EXPLAIN in hopes that it says what is going on differently.
There may be better indexes.
Elsewhere I saw a similar query that AND'd to taxomony; this may need a slightly different treatment.
#27
@
2 years ago
The database queries here are generated by WP_Query. If there is improves to be made to those queries then it should be done in another ticket.
#28
follow-up:
↓ 30
@
2 years ago
Between this and #57039, let's revert the changes to get_page_by_title()
and either redo or deprecate the function in 6.2.
#30
in reply to:
↑ 28
@
2 years ago
Replying to peterwilsoncc:
Between this and #57039, let's revert the changes to
get_page_by_title()
and either redo or deprecate the function in 6.2.
I don’t think it is a good idea to revert this. It goes from be cached to not being cached. Which could result in performance problems that have relied on this function.
#31
@
2 years ago
As noted via Slack, while the tests in this pull request are passing on GitHub and some locals, they don't pass in all environments with different MySQL versions/engines.
These are the results in my local, I presume we'd see further inconsistent results on the various hosting providers running tests.
vagrant@wp-dev:/vagrant/wordpress-develop$ git rev-parse --abbrev-ref HEAD fix/order-by-none vagrant@wp-dev:/vagrant/wordpress-develop$ phpunit --group 56991 Installing... Running as single site... To run multisite, use -c tests/phpunit/multisite.xml Not running ajax tests. To execute these, use --group ajax. Not running ms-files tests. To execute these, use --group ms-files. Not running external-http tests. To execute these, use --group external-http. PHPUnit 7.5.20 by Sebastian Bergmann and contributors. .F 2 / 2 (100%) Time: 14.9 seconds, Memory: 242.05 MB There was 1 failure: 1) Tests_Post_GetPageByTitle::test_return_same_as_old_query_2 Failed asserting that 15 is identical to 16. /vagrant/wordpress-develop/tests/phpunit/tests/post/getPageByTitle.php:365
I've opened #57041 to deprecate the function in 6.2
While the previous version of the function was flawed, I don't think differently flawed is an improvement.
#32
@
2 years ago
I've updated my earlier pull request to:
- revert changes to
get_page_by_title()
- remove tests added in the 6.0 cycle: as the DB query is unpredictable due to the lack of orderby, any tests relying on consistent results across different database versions and engines are flakey.
#35
@
2 years ago
- Keywords fixed-major added
- Resolution fixed deleted
- Status changed from closed to reopened
Reopening for merging in to the 6.1 branch.
@peterwilsoncc commented on PR #3573:
2 years ago
#36
Merged https://core.trac.wordpress.org/changeset/54782 / 19d5bd0787c9daf1d8a67bf5c0a3e997abf1f89a
Also adding that the changelog was not updated as part of the last release, this can be validated by visiting https://developer.wordpress.org/reference/functions/get_page_by_title/#changelog as well as via the docblock of the existing function in core.
Consideration should be made to add the relevant item to the function docblock, i.e: