Imports produce duplicate posts in some cases
Reported by: tott | Owned by: tott
Component: Import | Keywords: import, duplicate, needs-testing, has-patch
- Posts without a title are duplicated when the same import is run more than once.
- Comments by authors whose names contain an apostrophe, such as O'Tool, are duplicated.
From what I can see, this is because post_exists() and comment_exists() apply different sanitization rules than the functions used during import (wp_insert_post() / wp_insert_comment()). It was reported for the Movable Type importer, but the problem also seems to exist in other importers, and I was able to reproduce it with the WordPress importer as well.
post_exists() needs to compare against values run through sanitize_post_field(). The plain stripslashes() it currently applies does not give the correct result for escaped data.
comment_exists() appears to be used only by the importers, and they all pass comment_author values that are already escaped. Running these through $wpdb->prepare() escapes them a second time, so comment_exists() fails to find matches for escaped data.
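The comment_exists() failure can be modeled outside WordPress. This is a simplified Python sketch, not WordPress code: add_slashes(), strip_slashes() and sql_match() are hypothetical helpers, and the prepare step is assumed to behave like a single addslashes-style escape that the SQL parser removes again before comparing. Under those assumptions, a pre-escaped author name never matches the stored one, while unslashing it first does.

```python
import re

def add_slashes(s: str) -> str:
    """Simplified PHP addslashes(): backslash-escape backslashes and quotes."""
    return re.sub(r"""([\\'"])""", r"\\\1", s)

def strip_slashes(s: str) -> str:
    """Simplified PHP stripslashes(): drop one backslash before any character."""
    return re.sub(r"\\(.)", r"\1", s, flags=re.DOTALL)

# Assumption: the comment row holds the unescaped author name.
stored_authors = {"O'Tool"}

def sql_match(value: str) -> bool:
    """Hypothetical model of `WHERE comment_author = %s` after a prepare step:
    the value is escaped once into an SQL literal, and the database removes
    that one layer again before comparing."""
    literal = add_slashes(value)      # what the prepare step writes into the query
    compared = strip_slashes(literal)  # what the database actually compares
    return compared in stored_authors

raw_author = "O'Tool"
pre_escaped = add_slashes(raw_author)  # what the importers hand to comment_exists()

print(sql_match(pre_escaped))                 # False: compared as O\'Tool, so a duplicate is inserted
print(sql_match(strip_slashes(pre_escaped)))  # True: unslash first, then the match succeeds
```

The same reasoning applies to any value that reaches an *_exists() check with one more layer of escaping than the value that was stored.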
There are two possible ways to fix this:
- Fix the *_exists() functions so they match correctly. This might cause problems elsewhere and needs to be tested thoroughly.
- Fix the importers so that the values passed to the *_exists() functions are sanitized/converted first.
The attached patch takes the first approach and changes post_exists() and comment_exists(), so no further changes to the importers or other functions should be needed.
The patch also makes titles/content unique per date rather than across the whole blog. Previously, only one of the submitted values (title or content) was checked together with the date; combinations of title and content were not handled correctly.
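The matching rule described above can be sketched as follows. This is a hypothetical Python model of the intended behavior, not the patch itself: Post and find_existing_post() are illustrative names, empty arguments are assumed to be ignored, and every supplied field (title, content, date) must match, so an untitled post is matched on its content and date.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Post:
    id: int
    title: str
    content: str
    date: str  # e.g. "2008-05-01 12:00:00"

def find_existing_post(posts: List[Post], title: str = "", content: str = "",
                       date: str = "") -> Optional[int]:
    """Return the id of the first post matching every non-empty argument, else None."""
    supplied = {k: v for k, v in
                {"title": title, "content": content, "date": date}.items() if v}
    if not supplied:
        return None  # nothing to match on
    for post in posts:
        # All supplied fields must match together, not just one of them.
        if all(getattr(post, field) == value for field, value in supplied.items()):
            return post.id
    return None

posts = [
    Post(1, "", "Hello world", "2008-05-01 12:00:00"),
    Post(2, "Same title", "A", "2008-05-01 12:00:00"),
]

print(find_existing_post(posts, content="Hello world", date="2008-05-01 12:00:00"))  # 1: untitled post found again
print(find_existing_post(posts, title="Same title", content="A", date="2008-05-02 09:00:00"))  # None: other date
print(find_existing_post(posts, title="Same title", content="B", date="2008-05-01 12:00:00"))  # None: content differs
```

Requiring all supplied fields to match is what scopes title/content uniqueness to a single date: two posts with the same title on different dates are not treated as duplicates.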
I tested this with a small WXR export, which is attached to this ticket, and also did some manual testing. It still needs testing against the other importers.