Make WordPress Core

Opened 6 years ago

Last modified 2 years ago

#44349 new defect (bug)

Posts show up multiple times in backend when they have the exact same date

Reported by: wzshop's profile wzshop Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version: 4.9.6
Component: Query Keywords: reporter-feedback 2nd-opinion
Focuses: Cc:

Description

When I import for example 200 posts with the Import WordPress tool and then go to wp-admin>posts, the posts show up in the query multiple times (for instance on page 1 and page 2). Seems like others are left out.

Attachments (1)

post export demo.xml (194.4 KB) - added by wzshop 6 years ago.

Download all attachments as: .zip

Change History (32)

#1 follow-up: @pbiron
6 years ago

  • Component changed from General to Import
  • Keywords reporter-feedback added

Welcome to trac!

Can you please let us know which version of the WordPress Importer plugin you are using?

#2 in reply to: ↑ 1 @wzshop
6 years ago

Replying to pbiron:

Welcome to trac!

Can you please let us know which version of the WordPress Importer plugin you are using?

Version 0.6.4

#3 follow-up: @pbiron
6 years ago

Thank you for the version # and for attaching a sample export to test with.

I'm not seeing the behavior you report when I import post export demo.xml into a fresh install (no plugins other than WordPress Importer v 0.6.4 and twentyseventeen theme).

Also, post export demo.xml only has 149 posts (confirmed by evaluating the xpath expression //input on it, as well as the number of posts that get imported in my test site) and not the 200 you mention in this ticket's description. Could that be why "Seems like others are left out"?

#4 in reply to: ↑ 3 @wzshop
6 years ago

Sorry, meant 149 posts.
See the vid here: https://youtu.be/IK84U3ERCi4

#5 @pbiron
6 years ago

  • Component changed from Import to Query
  • Keywords 2nd-opinion added

Thanx for the video! Now that I understand what you are reporting, I can confirm that I'm seeing what you are seeing. At first sight this appears to be a bug in the importer, but it is not. The explanation is a little complicated, but I'll do my best.

First, you can confirm that "Post 129" was only imported once by searching for "Post 129" and seeing that it only appears once in the search results.

The reason it appears on page 1 & page 2 is the result of 2 factors:

  1. more than one post in the WXR file being imported have the same value in the <wp:post_date> element, thus more than one of the imported posts have the same value in the post_date field.
  2. the pagination on /wp-admin/edit.php is accomplished via LIMIT and ORDER BY clauses in the SQL request. MySQL performs certain optimizations on queries that include a LIMIT and those optimizations are what is producing the seemingly buggy behavior you're seeing.

As explained in MySQL's LIMIT Query Optimization,

If you combine LIMIT row_count with ORDER BY, MySQL stops sorting as soon as it has found the first row_count rows of the sorted result, rather than sorting the entire result. If ordering is done by using an index, this is very fast. If a filesort must be done, all rows that match the query without the LIMIT clause are selected, and most or all of them are sorted, before the first row_count are found. After the initial rows have been found, MySQL does not sort any remainder of the result set.

One manifestation of this behavior is that an ORDER BY query with and without LIMIT may return rows in different order, as described later in this section.

and more importantly

If multiple rows have identical values in the ORDER BY columns, the server is free to return those rows in any order, and may do so differently depending on the overall execution plan. In other words, the sort order of those rows is nondeterministic with respect to the nonordered columns.

So, with the posts imported from post export demo.xml, the 1st page of posts when Number of items per page = 20 (in Screen Options) contains some posts with post_date = 2018-06-11 15:51:54 and some with post_date = 2018-06-11 15:51:53 (the later including "Post 129") with the order of posts within each "group" being "random". The 2nd page contains some posts with post_date = 2018-06-11 15:51:53 (including "Post 129") and some with post_date = 2018-06-11 15:51:52.

If you change Number of items per page = 30 all of the posts with post_date = 2018-06-11 15:51:53 will appear on the 1st page (again in a "random" order) and thus "Post 129" will only appear on that 1st page.

Alternatively, if the posts in the site you exported (to produce post export demo.xml) all had unique post_date's, then when you import them into another site you will not see this behavior.

Yes, this is behavior is confusing, but it is a result of MySQL's "Limit Query Optimization" and is not a bug in the Importer nor in WP_Query.

Thus, I believe this ticket should be closed as "invalid", but I will leave it to one of the SQL gurus on the team to confirm my analysis and close it accordingly.

This ticket was mentioned in Slack in #core by pbiron. View the logs.


6 years ago

#7 @wzshop
6 years ago

Thank you very much for the broad explanation.
Is there any way to 'fix' this seemingly buggy behavior or a workaround, since it also has its effect on the frontend, when browsing the posts per category for instance. This is not something I would like to present to my visitors.

I can however comfirm that this problem won't occur when setting different dates.
Thanks again.

#8 @pbiron
6 years ago

Isn't non-determinisum fun :-)

Is there any way to 'fix' this seemingly buggy behavior

As trac is not the place for support questions, I'd suggest you ask that that question on https://wordpress.stackexchange.com/ (feel free to reference this ticket when you do).

#9 @wzshop
6 years ago

Ok, 1 more question to understand the MySQL's LIMIT Query Optimization;
It seems like posts are showing up multiple times, because of the posts are randomly ordered when browsing the pagination, but they don't actually show up multiple times?

Thanks

#10 @pbiron
6 years ago

In pure-SQL terms, any query of the form:

SELECT column FROM table ORDER BY order_by_column LIMIT x,y

where

  1. order_by_columnis indexed, and
  2. there are more than y rows with the same value in order_by_column that span the "pages" x-1 and x

could result in some rows from "page" x-1 also appearing on "page" x.

In WP_Query terms, the above translates into any query of the form:

<?php
$args = array(
    // the following are the indexed columns in the wp_posts table
    'orderby' => any of 'type', 'date', 'id', 'parent', 'author' or 'name'
    // each of the below "pagination" args results in a LIMIT clause
    'posts_per_page|showposts|posts_per_archive_page' => any value other than -1
);
$query = new WP_Query( $args );// including get_posts( $args )


There are other WP_Query corner cases that will exhibit the non-deterministic behavior as well, including but probably not limited to, hooking into any of posts_orderby, posts_clauses, posts_orderby_request, posts_clauses_request, posts_request to add any of the above (plus wp_posts.post_status, which is not directly legal in the args to new WP_Query()) to the "ORDER BY" clause of the SQL query that WP_Query ultimately executes.

#11 @SergeyBiryukov
5 years ago

#46929 was marked as a duplicate.

#12 @SergeyBiryukov
5 years ago

#39818 was marked as a duplicate.

#13 @terraGirl
4 years ago

Another vote for adding a second default "order by" for example the post ID.

My problem: we migrated data from an old osCommerce store over into WP. During the migration the timestamp for all media files defaulted to "1970-01-01 00:00:00". This meant we had over a thousand attachments with the same timestamp.

Result: In the media library, it was impossible to scroll through the images as the pages kept showing the same content.

Our solution: we looped through the posts & added 1 second to the post_date and post_date_gmt to make it unique. The media library can now be navigated with prev / next without any issues.

Preferred solution: The assumption that post_date is unique is flawed as data may have been imported externally. To avoid issues with paginated scrolling the default "order by" should be supplemented by "id" which is always unique.

#14 @danielbachhuber
4 years ago

If it helps, here's a little script I wrote to increment all of the post times by a second to "fix" the issue:

add_action(
	'init',
	function() {
		if ( isset( $_GET['dbx_increment_dates'] ) ) {
			$query = new WP_Query(
				[
					'post_type'      => $_GET['dbx_increment_dates'],
					'posts_per_page' => -1,
					'post_status'    => 'publish',
				]
			);
			$seconds = 1;
			foreach( $query->posts as $post ) {
				// Time is already non-standard, so we can assume it's fine.
				if ( false === stripos( $post->post_date, ' 00:00:00' ) ) {
					continue;
				}
				$new_date = gmdate( 'Y-m-d H:i:s', strtotime( $post->post_date ) + $seconds );
				wp_update_post(
					[
						'ID'            => $post->ID,
						'post_date'     => $new_date,
						'post_date_gmt' => $new_date,
					]
				);
				$seconds++;
			}
			echo 'process complete';
			exit;
		}
	}
);

#15 @danielbachhuber
4 years ago

  • Summary changed from Posts show up multiple times in backend, when imported with Import WordPress tool to Posts show up multiple times in backend when they have the exact same date

#16 @johnbillion
4 years ago

I've seen this indeterminate sort order bug appear in several places in core when working on applications that ingest or generate a large amount data in bulk, therefore resulting in multiple posts with the exact same date.

The correct solution is to add secondary field to the orderby clause, as suggested above, eg post_date, ID, but this would need careful consideration.

#17 @TimothyBlynJacobs
4 years ago

I think this is related to #46294 as well?

#18 @johnbillion
4 years ago

Yep looks like it's the same underlying issue.

#19 @ocean90
4 years ago

#40376 was marked as a duplicate.

#20 follow-up: @joyously
4 years ago

I saw this also in the Theme Unit Test data, when I had duplicated some posts, they ended up with the exact same time (which could happen on a multi-author blog). But the output was always different between the list of latest posts and the links to Next/Prev once you were on a single post. So make sure the fix is not just for pagination, but also that the navigation links on single posts get the same order that the archive page has.
Would there be a case when the orderby should not have ID?

#21 in reply to: ↑ 20 @johnbillion
4 years ago

Replying to joyously:

Would there be a case when the orderby should not have ID?

I don't believe so, no. This change does have potential to break custom filters that are doing string replacements on the resulting SQL though.

This ticket was mentioned in Slack in #core by pbiron. View the logs.


3 years ago

This ticket was mentioned in Slack in #core by helen. View the logs.


3 years ago

#25 @hellofromTonya
3 years ago

Per core scrub today, this ticket is related to #47642 and #42936. Helen proposed:

Should probably lump those together and see if somebody is interested in targeting a 5.7-early fix

#27 @SergeyBiryukov
3 years ago

#52907 was marked as a duplicate.

This ticket was mentioned in Slack in #core-restapi by timothybjacobs. View the logs.


3 years ago

#29 @pbiron
3 years ago

#53201 was marked as a duplicate.

This ticket was mentioned in Slack in #core by peterwilsoncc. View the logs.


3 years ago

#31 @ramon fincken
2 years ago

Added new patch in #47642

Note: See TracTickets for help on using tickets.