Opened 6 years ago
Last modified 3 years ago
#44349 new defect (bug)
Posts show up multiple times in backend when they have the exact same date
Reported by: | wzshop | Owned by: | |
---|---|---|---|
Milestone: | Awaiting Review | Priority: | normal |
Severity: | normal | Version: | 4.9.6 |
Component: | Query | Keywords: | reporter-feedback 2nd-opinion |
Focuses: | Cc: |
Description
When I import for example 200 posts with the Import WordPress tool and then go to wp-admin>posts, the posts show up in the query multiple times (for instance on page 1 and page 2). Seems like others are left out.
Attachments (1)
Change History (32)
#1
follow-up:
↓ 2
@
6 years ago
- Component changed from General to Import
- Keywords reporter-feedback added
#2
in reply to:
↑ 1
@
6 years ago
Replying to pbiron:
Welcome to trac!
Can you please let us know which version of the
WordPress Importer
plugin you are using?
Version 0.6.4
#3
follow-up:
↓ 4
@
6 years ago
Thank you for the version # and for attaching a sample export to test with.
I'm not seeing the behavior you report when I import post export demo.xml
into a fresh install (no plugins other than WordPress Importer v 0.6.4
and twentyseventeen
theme).
Also, post export demo.xml
only has 149 posts (confirmed by evaluating the xpath expression //input
on it, as well as the number of posts that get imported in my test site) and not the 200 you mention in this ticket's description. Could that be why "Seems like others are left out"?
#4
in reply to:
↑ 3
@
6 years ago
Sorry, meant 149 posts.
See the vid here: https://youtu.be/IK84U3ERCi4
#5
@
6 years ago
- Component changed from Import to Query
- Keywords 2nd-opinion added
Thanx for the video! Now that I understand what you are reporting, I can confirm that I'm seeing what you are seeing. At first sight this appears to be a bug in the importer, but it is not. The explanation is a little complicated, but I'll do my best.
First, you can confirm that "Post 129" was only imported once by searching for "Post 129" and seeing that it only appears once in the search results.
The reason it appears on page 1 & page 2 is the result of 2 factors:
- more than one post in the WXR file being imported have the same value in the
<wp:post_date>
element, thus more than one of the imported posts have the same value in thepost_date
field. - the pagination on
/wp-admin/edit.php
is accomplished viaLIMIT
andORDER BY
clauses in the SQL request. MySQL performs certain optimizations on queries that include aLIMIT
and those optimizations are what is producing the seemingly buggy behavior you're seeing.
As explained in MySQL's LIMIT Query Optimization,
If you combine LIMIT row_count with ORDER BY, MySQL stops sorting as soon as it has found the first row_count rows of the sorted result, rather than sorting the entire result. If ordering is done by using an index, this is very fast. If a filesort must be done, all rows that match the query without the LIMIT clause are selected, and most or all of them are sorted, before the first row_count are found. After the initial rows have been found, MySQL does not sort any remainder of the result set.
One manifestation of this behavior is that an ORDER BY query with and without LIMIT may return rows in different order, as described later in this section.
and more importantly
If multiple rows have identical values in the ORDER BY columns, the server is free to return those rows in any order, and may do so differently depending on the overall execution plan. In other words, the sort order of those rows is nondeterministic with respect to the nonordered columns.
So, with the posts imported from post export demo.xml
, the 1st page of posts when Number of items per page = 20
(in Screen Options
) contains some posts with post_date = 2018-06-11 15:51:54
and some with post_date = 2018-06-11 15:51:53
(the later including "Post 129") with the order of posts within each "group" being "random". The 2nd page contains some posts with post_date = 2018-06-11 15:51:53
(including "Post 129") and some with post_date = 2018-06-11 15:51:52
.
If you change Number of items per page = 30
all of the posts with post_date = 2018-06-11 15:51:53
will appear on the 1st page (again in a "random" order) and thus "Post 129" will only appear on that 1st page.
Alternatively, if the posts in the site you exported (to produce post export demo.xml
) all had unique post_date
's, then when you import them into another site you will not see this behavior.
Yes, this is behavior is confusing, but it is a result of MySQL's "Limit Query Optimization" and is not a bug in the Importer nor in WP_Query
.
Thus, I believe this ticket should be closed as "invalid", but I will leave it to one of the SQL gurus on the team to confirm my analysis and close it accordingly.
This ticket was mentioned in Slack in #core by pbiron. View the logs.
6 years ago
#7
@
6 years ago
Thank you very much for the broad explanation.
Is there any way to 'fix' this seemingly buggy behavior or a workaround, since it also has its effect on the frontend, when browsing the posts per category for instance. This is not something I would like to present to my visitors.
I can however comfirm that this problem won't occur when setting different dates.
Thanks again.
#8
@
6 years ago
Isn't non-determinisum fun :-)
Is there any way to 'fix' this seemingly buggy behavior
As trac is not the place for support questions, I'd suggest you ask that that question on https://wordpress.stackexchange.com/ (feel free to reference this ticket when you do).
#9
@
6 years ago
Ok, 1 more question to understand the MySQL's LIMIT Query Optimization;
It seems like posts are showing up multiple times, because of the posts are randomly ordered when browsing the pagination, but they don't actually show up multiple times?
Thanks
#10
@
6 years ago
In pure-SQL terms, any query of the form:
SELECT column FROM table ORDER BY order_by_column LIMIT x,y
where
order_by_column
is indexed, and- there are more than
y
rows with the same value inorder_by_column
that span the "pages"x-1
andx
could result in some rows from "page" x-1
also appearing on "page" x
.
In WP_Query
terms, the above translates into any query of the form:
<?php $args = array( // the following are the indexed columns in the wp_posts table 'orderby' => any of 'type', 'date', 'id', 'parent', 'author' or 'name' // each of the below "pagination" args results in a LIMIT clause 'posts_per_page|showposts|posts_per_archive_page' => any value other than -1 ); $query = new WP_Query( $args );// including get_posts( $args )
There are other WP_Query
corner cases that will exhibit the non-deterministic behavior as well, including but probably not limited to, hooking into any of posts_orderby
, posts_clauses
, posts_orderby_request
, posts_clauses_request
, posts_request
to add any of the above (plus wp_posts.post_status
, which is not directly legal in the args to new WP_Query()
) to the "ORDER BY" clause of the SQL query that WP_Query
ultimately executes.
#13
@
5 years ago
Another vote for adding a second default "order by" for example the post ID.
My problem: we migrated data from an old osCommerce store over into WP. During the migration the timestamp for all media files defaulted to "1970-01-01 00:00:00". This meant we had over a thousand attachments with the same timestamp.
Result: In the media library, it was impossible to scroll through the images as the pages kept showing the same content.
Our solution: we looped through the posts & added 1 second to the post_date and post_date_gmt to make it unique. The media library can now be navigated with prev / next without any issues.
Preferred solution: The assumption that post_date is unique is flawed as data may have been imported externally. To avoid issues with paginated scrolling the default "order by" should be supplemented by "id" which is always unique.
#14
@
4 years ago
If it helps, here's a little script I wrote to increment all of the post times by a second to "fix" the issue:
add_action( 'init', function() { if ( isset( $_GET['dbx_increment_dates'] ) ) { $query = new WP_Query( [ 'post_type' => $_GET['dbx_increment_dates'], 'posts_per_page' => -1, 'post_status' => 'publish', ] ); $seconds = 1; foreach( $query->posts as $post ) { // Time is already non-standard, so we can assume it's fine. if ( false === stripos( $post->post_date, ' 00:00:00' ) ) { continue; } $new_date = gmdate( 'Y-m-d H:i:s', strtotime( $post->post_date ) + $seconds ); wp_update_post( [ 'ID' => $post->ID, 'post_date' => $new_date, 'post_date_gmt' => $new_date, ] ); $seconds++; } echo 'process complete'; exit; } } );
#15
@
4 years ago
- Summary changed from Posts show up multiple times in backend, when imported with Import WordPress tool to Posts show up multiple times in backend when they have the exact same date
#16
@
4 years ago
I've seen this indeterminate sort order bug appear in several places in core when working on applications that ingest or generate a large amount data in bulk, therefore resulting in multiple posts with the exact same date.
The correct solution is to add secondary field to the orderby clause, as suggested above, eg post_date, ID
, but this would need careful consideration.
#20
follow-up:
↓ 21
@
4 years ago
I saw this also in the Theme Unit Test data, when I had duplicated some posts, they ended up with the exact same time (which could happen on a multi-author blog). But the output was always different between the list of latest posts and the links to Next/Prev once you were on a single post. So make sure the fix is not just for pagination, but also that the navigation links on single posts get the same order that the archive page has.
Would there be a case when the orderby should not have ID
?
#21
in reply to:
↑ 20
@
4 years ago
Replying to joyously:
Would there be a case when the orderby should not have
ID
?
I don't believe so, no. This change does have potential to break custom filters that are doing string replacements on the resulting SQL though.
This ticket was mentioned in Slack in #core by pbiron. View the logs.
4 years ago
#23
@
4 years ago
also see https://core.trac.wordpress.org/ticket/47642#comment:16
thanks @helen
This ticket was mentioned in Slack in #core by helen. View the logs.
4 years ago
#26
@
4 years ago
Patch added in https://core.trac.wordpress.org/ticket/47642#comment:20
Welcome to trac!
Can you please let us know which version of the
WordPress Importer
plugin you are using?