Opened 6 years ago

Last modified 6 weeks ago

#6369 new defect (bug)

Blogger importer inefficient handling of data

Reported by: barry
Owned by:
Milestone: WordPress.org
Priority: normal
Severity: normal
Version:
Component: Import
Keywords:
Focuses:
Cc:

Description

If the import dataset is large, the Blogger importer can store huge amounts of data in the blogger_importer option. It then updates this data over and over throughout the import. If MySQL logging (binary or query) is enabled, this can result in a large amount of data being written to disk, potentially filling up the partition rather quickly. On WordPress.com, I have seen an import write 100MB of binary logs every 2 minutes. Andy's suggestion is that we split up the data from the import rather than storing it in one option. This would allow us to manipulate it more granularly and prevent the huge updates from happening.
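To illustrate why one ever-growing option is costly, here is a minimal Python sketch (a model, not WordPress code). It compares the total bytes serialized when the whole state is rewritten after every imported post against writing one small record per post. The function names, the use of JSON as a stand-in for PHP serialization, and the key shape are all assumptions made for illustration.

```python
import json


def make_key(i):
    # Hypothetical key, shaped like a Blogger feed path.
    return f"/feeds/0/posts/default/{i:019d}"


def bytes_single_option(num_posts):
    """Model one big option: the whole state is rewritten after each post."""
    state = {"posts": {}}
    total = 0
    for i in range(num_posts):
        state["posts"][make_key(i)] = i
        total += len(json.dumps(state))  # the entire blob is written again
    return total


def bytes_per_item(num_posts):
    """Model split storage: one small record is written per post."""
    total = 0
    for i in range(num_posts):
        total += len(json.dumps({make_key(i): i}))
    return total
```

Under this model the single-option total grows roughly with the square of the number of posts, while the per-item total grows linearly, which is consistent with the log volume described above.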

Change History (11)

comment:1 ryan 6 years ago

  • Milestone changed from 2.7 to 2.8

comment:2 ryan 5 years ago

  • Component changed from General to Import
  • Owner anonymous deleted

comment:3 ryan 5 years ago

  • Milestone changed from 2.8 to Future Release

comment:4 Denis-de-Bernardy 5 years ago

  • Milestone changed from Future Release to 2.9

comment:5 ryan 4 years ago

  • Milestone changed from 2.9 to Future Release

comment:6 SergeyBiryukov 3 years ago

  • Milestone changed from Future Release to WordPress.org

comment:7 Otto42 22 months ago

I do not believe this to be an issue anymore, as the Blogger importer only stores current status information (basically the list of blogs, the position of the import, etc.) in the blogger_importer option as of 0.5 (and probably as of 0.4 too).

The data imported from the blog itself is not stored in the option, although the URL of each post imported *is* stored there temporarily. It would be possible to change this to be stored as post_meta instead (probably already is, actually), and to be referenced there in order to avoid duplicates, at the cost of extra SQL queries.

Can a large import be done to get an idea of what the current damage level on the importer is?

comment:8 Workshopshed 13 months ago

Users have reported that large blog imports (e.g. 1000+ posts, 5000+ comments) stop or slow down. This could be caused by memory starvation and/or database performance.

See http://core.trac.wordpress.org/ticket/4010#comment:27

Yes, the importer is storing keys (partial URL) for each post and comment.
The key is something like this '/feeds/417730729915399755/posts/default/8397846992898424746'

As Otto mentions, it should be possible to stop storing these keys in an array and store them as meta data instead.

The posts already add a meta data entry, "blogger_permalink". This would also need to be added to the comments to support nesting.

Given that the comments are loaded sequentially by post rather than in random order, having them look up the post ID via the DB should not add significant overhead, and the performance advantage of not storing the comments and posts arrays in the option may compensate for it.
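The lookup described above can be modelled with a minimal Python sketch that uses an in-memory SQLite table as a stand-in for the post meta table. This is not the importer's code: the table layout, the index on (meta_key, meta_value), and the helper names make_db, find_post_id and import_post are assumptions for illustration. Only the "blogger_permalink" meta key comes from this ticket.

```python
import sqlite3


def make_db():
    # Stand-in for a post meta table; the schema and index are assumed.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE postmeta (post_id INTEGER, meta_key TEXT, meta_value TEXT)"
    )
    db.execute("CREATE INDEX idx_meta ON postmeta (meta_key, meta_value)")
    return db


def find_post_id(db, permalink):
    """Return the post ID already imported for this key, or None."""
    row = db.execute(
        "SELECT post_id FROM postmeta "
        "WHERE meta_key = ? AND meta_value = ? LIMIT 1",
        ("blogger_permalink", permalink),
    ).fetchone()
    return row[0] if row else None


def import_post(db, post_id, permalink):
    """Record the post unless its key is already present (duplicate check)."""
    if find_post_id(db, permalink) is not None:
        return False  # already imported, skip it
    db.execute(
        "INSERT INTO postmeta VALUES (?, ?, ?)",
        (post_id, "blogger_permalink", permalink),
    )
    return True
```

Each duplicate check costs one extra query, but nothing accumulates in a single option, which is the trade-off discussed in the comments above.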

The CommentEntry class in comment-entry would need to be updated to include the meta data.

The import_comments and import_posts functions in blogger-importer.php would need to be updated to remove the use of the arrays.

The set authors form also needed changing, as it was referencing the post array.

Last edited 12 months ago by Workshopshed

comment:9 Workshopshed 12 months ago

I've stripped out the storing of post and comment keys in an array. For my small and medium blogs, it neither affects performance adversely nor makes a massive improvement.

http://core.trac.wordpress.org/attachment/ticket/4010/blogger-importer.zip

Would be interested to know if it works better with very large blogs.

comment:10 Workshopshed 12 months ago

There has been one report of success. Obviously it would be good to have a few more to confirm that large blogs now process OK.

http://core.trac.wordpress.org/ticket/4010#comment:39

comment:11 Workshopshed 6 weeks ago

There's one report of a huge database following the import, which I don't quite understand:

http://wordpress.org/support/topic/huge-database-2?replies=2

The options should be a lot smaller now. There is a lot of meta data being stored, but in comparison to the actual posts this should really be quite small. See http://wordpress.org/plugins/blogger-importer/screenshots/ for details.
