Opened 7 years ago
Last modified 4 weeks ago
#44349 new defect (bug)
Posts show up multiple times in backend when they have the exact same date
| Reported by: |
|
Owned by: | |
|---|---|---|---|
| Milestone: | Awaiting Review | Priority: | normal |
| Severity: | normal | Version: | 4.9.6 |
| Component: | Query | Keywords: | reporter-feedback 2nd-opinion |
| Focuses: | Cc: |
Description
When I import for example 200 posts with the Import WordPress tool and then go to wp-admin>posts, the posts show up in the query multiple times (for instance on page 1 and page 2). Seems like others are left out.
Attachments (1)
Change History (34)
#1
follow-up:
↓ 2
@
7 years ago
- Component changed from General to Import
- Keywords reporter-feedback added
#2
in reply to:
↑ 1
@
7 years ago
Replying to pbiron:
Welcome to trac!
Can you please let us know which version of the
WordPress Importerplugin you are using?
Version 0.6.4
#3
follow-up:
↓ 4
@
7 years ago
Thank you for the version # and for attaching a sample export to test with.
I'm not seeing the behavior you report when I import post export demo.xml into a fresh install (no plugins other than WordPress Importer v 0.6.4 and twentyseventeen theme).
Also, post export demo.xml only has 149 posts (confirmed by evaluating the xpath expression //input on it, as well as the number of posts that get imported in my test site) and not the 200 you mention in this ticket's description. Could that be why "Seems like others are left out"?
#4
in reply to:
↑ 3
@
7 years ago
Sorry, meant 149 posts.
See the vid here: https://youtu.be/IK84U3ERCi4
#5
@
7 years ago
- Component changed from Import to Query
- Keywords 2nd-opinion added
Thanx for the video! Now that I understand what you are reporting, I can confirm that I'm seeing what you are seeing. At first sight this appears to be a bug in the importer, but it is not. The explanation is a little complicated, but I'll do my best.
First, you can confirm that "Post 129" was only imported once by searching for "Post 129" and seeing that it only appears once in the search results.
The reason it appears on page 1 & page 2 is the result of 2 factors:
- more than one post in the WXR file being imported have the same value in the
<wp:post_date>element, thus more than one of the imported posts have the same value in thepost_datefield. - the pagination on
/wp-admin/edit.phpis accomplished viaLIMITandORDER BYclauses in the SQL request. MySQL performs certain optimizations on queries that include aLIMITand those optimizations are what is producing the seemingly buggy behavior you're seeing.
As explained in MySQL's LIMIT Query Optimization,
If you combine LIMIT row_count with ORDER BY, MySQL stops sorting as soon as it has found the first row_count rows of the sorted result, rather than sorting the entire result. If ordering is done by using an index, this is very fast. If a filesort must be done, all rows that match the query without the LIMIT clause are selected, and most or all of them are sorted, before the first row_count are found. After the initial rows have been found, MySQL does not sort any remainder of the result set.
One manifestation of this behavior is that an ORDER BY query with and without LIMIT may return rows in different order, as described later in this section.
and more importantly
If multiple rows have identical values in the ORDER BY columns, the server is free to return those rows in any order, and may do so differently depending on the overall execution plan. In other words, the sort order of those rows is nondeterministic with respect to the nonordered columns.
So, with the posts imported from post export demo.xml, the 1st page of posts when Number of items per page = 20 (in Screen Options) contains some posts with post_date = 2018-06-11 15:51:54 and some with post_date = 2018-06-11 15:51:53 (the later including "Post 129") with the order of posts within each "group" being "random". The 2nd page contains some posts with post_date = 2018-06-11 15:51:53 (including "Post 129") and some with post_date = 2018-06-11 15:51:52.
If you change Number of items per page = 30 all of the posts with post_date = 2018-06-11 15:51:53 will appear on the 1st page (again in a "random" order) and thus "Post 129" will only appear on that 1st page.
Alternatively, if the posts in the site you exported (to produce post export demo.xml) all had unique post_date's, then when you import them into another site you will not see this behavior.
Yes, this is behavior is confusing, but it is a result of MySQL's "Limit Query Optimization" and is not a bug in the Importer nor in WP_Query.
Thus, I believe this ticket should be closed as "invalid", but I will leave it to one of the SQL gurus on the team to confirm my analysis and close it accordingly.
This ticket was mentioned in Slack in #core by pbiron. View the logs.
7 years ago
#7
@
7 years ago
Thank you very much for the broad explanation.
Is there any way to 'fix' this seemingly buggy behavior or a workaround, since it also has its effect on the frontend, when browsing the posts per category for instance. This is not something I would like to present to my visitors.
I can however comfirm that this problem won't occur when setting different dates.
Thanks again.
#8
@
7 years ago
Isn't non-determinisum fun :-)
Is there any way to 'fix' this seemingly buggy behavior
As trac is not the place for support questions, I'd suggest you ask that that question on https://wordpress.stackexchange.com/ (feel free to reference this ticket when you do).
#9
@
7 years ago
Ok, 1 more question to understand the MySQL's LIMIT Query Optimization;
It seems like posts are showing up multiple times, because of the posts are randomly ordered when browsing the pagination, but they don't actually show up multiple times?
Thanks
#10
@
7 years ago
In pure-SQL terms, any query of the form:
SELECT column FROM table ORDER BY order_by_column LIMIT x,y
where
order_by_columnis indexed, and- there are more than
yrows with the same value inorder_by_columnthat span the "pages"x-1andx
could result in some rows from "page" x-1 also appearing on "page" x.
In WP_Query terms, the above translates into any query of the form:
<?php $args = array( // the following are the indexed columns in the wp_posts table 'orderby' => any of 'type', 'date', 'id', 'parent', 'author' or 'name' // each of the below "pagination" args results in a LIMIT clause 'posts_per_page|showposts|posts_per_archive_page' => any value other than -1 ); $query = new WP_Query( $args );// including get_posts( $args )
There are other WP_Query corner cases that will exhibit the non-deterministic behavior as well, including but probably not limited to, hooking into any of posts_orderby, posts_clauses, posts_orderby_request, posts_clauses_request, posts_request to add any of the above (plus wp_posts.post_status, which is not directly legal in the args to new WP_Query()) to the "ORDER BY" clause of the SQL query that WP_Query ultimately executes.
#13
@
6 years ago
Another vote for adding a second default "order by" for example the post ID.
My problem: we migrated data from an old osCommerce store over into WP. During the migration the timestamp for all media files defaulted to "1970-01-01 00:00:00". This meant we had over a thousand attachments with the same timestamp.
Result: In the media library, it was impossible to scroll through the images as the pages kept showing the same content.
Our solution: we looped through the posts & added 1 second to the post_date and post_date_gmt to make it unique. The media library can now be navigated with prev / next without any issues.
Preferred solution: The assumption that post_date is unique is flawed as data may have been imported externally. To avoid issues with paginated scrolling the default "order by" should be supplemented by "id" which is always unique.
#14
@
6 years ago
If it helps, here's a little script I wrote to increment all of the post times by a second to "fix" the issue:
add_action(
'init',
function() {
if ( isset( $_GET['dbx_increment_dates'] ) ) {
$query = new WP_Query(
[
'post_type' => $_GET['dbx_increment_dates'],
'posts_per_page' => -1,
'post_status' => 'publish',
]
);
$seconds = 1;
foreach( $query->posts as $post ) {
// Time is already non-standard, so we can assume it's fine.
if ( false === stripos( $post->post_date, ' 00:00:00' ) ) {
continue;
}
$new_date = gmdate( 'Y-m-d H:i:s', strtotime( $post->post_date ) + $seconds );
wp_update_post(
[
'ID' => $post->ID,
'post_date' => $new_date,
'post_date_gmt' => $new_date,
]
);
$seconds++;
}
echo 'process complete';
exit;
}
}
);
#15
@
6 years ago
- Summary changed from Posts show up multiple times in backend, when imported with Import WordPress tool to Posts show up multiple times in backend when they have the exact same date
#16
@
6 years ago
I've seen this indeterminate sort order bug appear in several places in core when working on applications that ingest or generate a large amount data in bulk, therefore resulting in multiple posts with the exact same date.
The correct solution is to add secondary field to the orderby clause, as suggested above, eg post_date, ID, but this would need careful consideration.
#20
follow-up:
↓ 21
@
5 years ago
I saw this also in the Theme Unit Test data, when I had duplicated some posts, they ended up with the exact same time (which could happen on a multi-author blog). But the output was always different between the list of latest posts and the links to Next/Prev once you were on a single post. So make sure the fix is not just for pagination, but also that the navigation links on single posts get the same order that the archive page has.
Would there be a case when the orderby should not have ID?
#21
in reply to:
↑ 20
@
5 years ago
Replying to joyously:
Would there be a case when the orderby should not have
ID?
I don't believe so, no. This change does have potential to break custom filters that are doing string replacements on the resulting SQL though.
This ticket was mentioned in Slack in #core by pbiron. View the logs.
5 years ago
#23
@
5 years ago
also see https://core.trac.wordpress.org/ticket/47642#comment:16
thanks @helen
This ticket was mentioned in Slack in #core by helen. View the logs.
5 years ago
#26
@
5 years ago
Patch added in https://core.trac.wordpress.org/ticket/47642#comment:20
This ticket was mentioned in Slack in #core-restapi by timothybjacobs. View the logs.
5 years ago
This ticket was mentioned in Slack in #core by peterwilsoncc. View the logs.
4 years ago
#32
@
6 weeks ago
This bug came up when I was trying to filter attachment posts by media_type
https://core.trac.wordpress.org/ticket/64042
My workaround was, when multiple attachments have the same post_date, to add ID as secondary sort to guarantee consistent ordering.
$query_args['orderby'] = array( 'date' => $query_args['order'] ?? 'DESC', 'ID' => 'DESC', );
#33
@
4 weeks ago
I've taken stab at fixing this over in https://github.com/WordPress/wordpress-develop/pull/10262
It's the first attempt, and probably the worst attempt, but it builds on ideas already tested by @ramon-fincken and @pbearne in https://core.trac.wordpress.org/ticket/47642
My main objective is to re-ignite this bug as it's very obvious when paginating in DataViews.
For example, bulk uploading media will give that media (usually) the same post_date. Paginating across pages in the media library, you will see the same attachment id on multiple pages.
Plus all the other examples mentioned in related tickets.
Welcome to trac!
Can you please let us know which version of the
WordPress Importerplugin you are using?