Opened 7 years ago

Closed 7 years ago

#8460 closed defect (bug) (fixed)

Imports produce duplicate posts in some cases

Reported by: tott
Owned by: tott
Milestone: 2.8
Priority: normal
Severity: normal
Version:
Component: Import
Keywords: import, duplicate, needs-testing, has-patch
Focuses:
Cc:

Description

  • Posts without a title are duplicated when the import is re-run.
  • Comments by authors whose names contain an apostrophe, like O'Tool, are duplicated as well.

From what I see, this is because the post_exists/comment_exists functions apply different sanitization rules than the import itself (wp_insert_post / wp_insert_comment). This was reported for the Movable Type importer, and as far as I can see the problem also exists in other importers. I was able to reproduce it with the WordPress importer as well.

post_exists needs to check against values run through sanitize_post_field(). The simple stripslashes that post_exists uses right now does not give the correct result for escaped data.

comment_exists seems to be used only within importers, and there all comment_author values are already escaped before they are passed to comment_exists. Running them through $wpdb->prepare() again causes comment_exists to fail for escaped data.
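To illustrate the mismatch, here is a minimal sketch in Python rather than the actual WordPress code. add_slashes and strip_slashes are rough, hypothetical stand-ins for PHP's addslashes()/stripslashes(), and stored_authors is an assumed in-memory stand-in for the comments table. It only shows why escaping an already-escaped author name a second time stops the lookup from matching the stored value.

```python
def add_slashes(value: str) -> str:
    """Rough stand-in for PHP addslashes(): escape backslashes and quotes."""
    return value.replace("\\", "\\\\").replace("'", "\\'").replace('"', '\\"')


def strip_slashes(value: str) -> str:
    """Rough stand-in for PHP stripslashes(): remove one level of escaping."""
    out = []
    chars = iter(value)
    for ch in chars:
        if ch == "\\":
            # Keep the escaped character, drop the backslash itself.
            out.append(next(chars, ""))
        else:
            out.append(ch)
    return "".join(out)


# Assumption: the first import stored the unescaped author name.
stored_authors = {"O'Tool"}


def exists_escaping_again(author: str) -> bool:
    # Escapes a value that is already escaped; after one level is removed
    # again, a stray backslash remains and nothing matches.
    looked_up = strip_slashes(add_slashes(author))
    return looked_up in stored_authors


def exists_normalised(author: str) -> bool:
    # Removes the importer's escaping once before comparing.
    return strip_slashes(author) in stored_authors


pre_escaped = add_slashes("O'Tool")        # what the importer passes in
print(exists_escaping_again(pre_escaped))  # False: comment gets re-imported
print(exists_normalised(pre_escaped))      # True: duplicate is detected
```

With the double-escaped lookup the existing comment is never found, so every re-run of the import inserts it again.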

There are two possible ways to fix this problem:

  • Fix the *_exists functions so that they match correctly, which might cause trouble in other places and needs to be tested really well.
  • Fix the import functions so that the sanitization/conversion is applied to the values before they are passed to the *_exists functions.

I included a patch that applies to the post_exists and comment_exists functions, so no further changes in importers and other functions should be needed.

The patch also makes sure that titles/content need to be unique per date and not within the whole blog. In addition, so far only one of the submitted values (content or title) was checked in combination with the date. Combinations of title and content were not handled correctly.

I tested this with a small WXR export, which I attached to this ticket. I also did some manual testing, but it will need testing against other importers as well.

Attachments (2)

duplicate-posts-comments-import.diff (2.7 KB) - added by tott 7 years ago.
patch for comment/post_exists functions against rev 10008
export-testing-duplicate-imports.xml (8.5 KB) - added by tott 7 years ago.
WordPress export file to reproduce/test the issue


Change History (5)

@tott 7 years ago

patch for comment/post_exists functions against rev 10008

@tott 7 years ago

WordPress export file to reproduce/test the issue

comment:1 @tott 7 years ago

While working on this I ran into another bug that makes the comment->post relation disappear for existing posts. I filed this as #8458 and attached a patch for that ticket.

comment:2 @tott 7 years ago

I was thinking of splitting the sanitization and test logic into separate tickets, but I am reconsidering this now, as the current test logic within post_exists prevents empty titles and/or contents from being checked correctly. The input values of the post_exists function should all be combined with a logical AND, instead of only one of content/title being ANDed with post_date as it is right now.

In this patch I am making an AND connection between all non-empty values that are passed on to post_exists.
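The AND connection can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the code from the patch: build_exists_query is a hypothetical helper, the table name posts is assumed, and the %s placeholders stand for whatever escaping the database layer applies.

```python
def build_exists_query(title="", content="", date=""):
    """Combine every non-empty value into a single AND condition."""
    clauses, args = [], []
    for column, value in (
        ("post_date", date),
        ("post_title", title),
        ("post_content", content),
    ):
        if value:  # empty values are skipped, as described above
            clauses.append(f"{column} = %s")
            args.append(value)
    if not clauses:
        return None, []  # nothing to match on
    return "SELECT ID FROM posts WHERE " + " AND ".join(clauses), args


# A post without a title is still matched on date and content.
query, args = build_exists_query(content="Body text", date="2008-12-01 10:00:00")
print(query)  # SELECT ID FROM posts WHERE post_date = %s AND post_content = %s
print(args)   # ['2008-12-01 10:00:00', 'Body text']
```

In this sketch a post with an empty title is identified by its date and content together, so it is not matched against unrelated posts and not skipped either.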

It might even make sense to remove the check for empty values in order to get a correct result.

Please give me some feedback on how to proceed here. Putting only the sanitization fix into the post_exists function will not fix the ticket, as empty titles/content will still not be recognized correctly.

comment:3 @ryan 7 years ago

  • Resolution set to fixed
  • Status changed from new to closed

(In [10722]) post_exists() and comment_exists() fixes. Fixes post duplication during import. Props tott. fixes #8460
