Make WordPress Core

Opened 17 years ago

Closed 9 years ago

#6369 closed defect (bug) (invalid)

Blogger importer inefficient handling of data

Reported by: barry
Owned by:
Milestone:
Priority: normal
Severity: normal
Version:
Component: Import
Keywords:
Focuses:
Cc:

Description

If the import dataset is large, the Blogger importer can store huge amounts of data in the blogger_importer option, and it then rewrites this data repeatedly throughout the import. If MySQL logging (binary or query) is enabled, this can write a large amount of data to disk and fill up the partition rather quickly. On WordPress.com, I have seen an import write 100 MB of binary logs every 2 minutes. Andy's suggestion is to split up the import data rather than store it in one option. This would let us manipulate it more granularly and avoid the huge updates.
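To illustrate why one growing option is so costly, here is a minimal Python sketch (not the importer's code, which is PHP). It assumes, hypothetically, that every imported item adds a fixed-size entry and that the whole option is serialized and written again after each item. The function names and sizes are made up for illustration only.

```python
def bytes_written_single_option(num_items: int, bytes_per_item: int) -> int:
    """Total bytes written if the whole option is rewritten after every item."""
    total = 0
    option_size = 0
    for _ in range(num_items):
        option_size += bytes_per_item  # the option grows by one entry
        total += option_size           # and the full blob is written again
    return total


def bytes_written_per_item_rows(num_items: int, bytes_per_item: int) -> int:
    """Total bytes written if each item is stored once as its own small row."""
    return num_items * bytes_per_item


if __name__ == "__main__":
    # Hypothetical import: 6000 items with 60-byte keys.
    items, key_size = 6000, 60
    print("single option:", bytes_written_single_option(items, key_size))
    print("per-item rows:", bytes_written_per_item_rows(items, key_size))
```

Under these assumptions the single-option total grows with the square of the item count, while per-item storage grows linearly. With query or binary logging enabled, each of those writes would also be logged.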

Change History (12)

#1 @ryan
16 years ago

  • Milestone changed from 2.7 to 2.8

#2 @ryan
15 years ago

  • Component changed from General to Import
  • Owner anonymous deleted

#3 @ryan
15 years ago

  • Milestone changed from 2.8 to Future Release

#4 @Denis-de-Bernardy
15 years ago

  • Milestone changed from Future Release to 2.9

#5 @ryan
15 years ago

  • Milestone changed from 2.9 to Future Release

#6 @SergeyBiryukov
13 years ago

  • Milestone changed from Future Release to WordPress.org

#7 @Otto42
12 years ago

I do not believe this is an issue anymore: as of 0.5 (probably as of 0.4 too), the Blogger importer stores only current status information (basically the list of blogs, the position of the import, etc.) in the blogger_importer option.

The data imported from the blog itself is not stored in the option, although the URL of each post imported *is* stored there temporarily. It would be possible to change this to be stored as post_meta instead (probably already is, actually), and to be referenced there in order to avoid duplicates, at the cost of extra SQL queries.

Can a large import be done to get an idea of what the current damage level on the importer is?

#8 @Workshopshed
12 years ago

Users have reported that large blog imports (e.g. 1000+ posts, 5000+ comments) stop or slow down. This could be caused by memory starvation and/or database performance.

See http://core.trac.wordpress.org/ticket/4010#comment:27

Yes, the importer is storing a key (a partial URL) for each post and comment.
The key looks something like this: '/feeds/417730729915399755/posts/default/8397846992898424746'

As Otto mentions, it should be possible to store these keys as meta data instead of in an array.

The posts already add a meta data entry, "blogger_permalink". The same would need to be added to the comments to support nesting.

Given that comments are loaded sequentially by post rather than randomly, looking up the post ID via the DB should not add significant overhead, and the performance advantage of not storing the comments and posts arrays in the option may compensate for it.

The CommentEntry class in comment-entry would need to be updated to include the meta data.

The import_comments and import_posts functions in blogger-importer.php would need to be updated to remove the use of the arrays.

The set authors form also needed changing, as it was referencing the post array.
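As a rough sketch of the lookup-by-meta idea, the Python below uses an in-memory SQLite table as a stand-in for a post meta table. This is an illustration under stated assumptions, not the importer's actual PHP: the table layout, the function names, and the choice to store the Blogger key as the "blogger_permalink" value are all hypothetical.

```python
import sqlite3
from typing import Optional

META_KEY = "blogger_permalink"  # meta key named in this ticket


def open_store() -> sqlite3.Connection:
    """Create a hypothetical stand-in for a post meta table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE postmeta (post_id INTEGER, meta_key TEXT, meta_value TEXT)"
    )
    # An index keeps the duplicate check cheap as the table grows.
    conn.execute("CREATE INDEX idx_meta ON postmeta (meta_key, meta_value)")
    return conn


def find_post_id(conn: sqlite3.Connection, blogger_key: str) -> Optional[int]:
    """Look up an already imported post by its Blogger key, one query per check."""
    row = conn.execute(
        "SELECT post_id FROM postmeta"
        " WHERE meta_key = ? AND meta_value = ? LIMIT 1",
        (META_KEY, blogger_key),
    ).fetchone()
    return row[0] if row else None


def import_post(conn: sqlite3.Connection, post_id: int, blogger_key: str) -> bool:
    """Record the post unless its key is already present; True if it was new."""
    if find_post_id(conn, blogger_key) is not None:
        return False  # duplicate, skip it
    conn.execute(
        "INSERT INTO postmeta VALUES (?, ?, ?)", (post_id, META_KEY, blogger_key)
    )
    return True
```

The trade-off is the one described above: each duplicate check costs an extra query, but no ever-growing array of keys has to be kept in memory or rewritten into the option.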

Last edited 12 years ago by Workshopshed

#9 @Workshopshed
12 years ago

I've stripped out the storing of post and comment keys in an array. For my small and medium blogs, it neither hurts performance nor makes a massive improvement.

http://core.trac.wordpress.org/attachment/ticket/4010/blogger-importer.zip

Would be interested to know if it works better with very large blogs.

#10 @Workshopshed
12 years ago

There is one report of success. Obviously it would be good to have a few more to confirm that large blogs now process OK.

http://core.trac.wordpress.org/ticket/4010#comment:39

#11 @Workshopshed
11 years ago

There's one report of a huge database following the import, which I don't quite understand.

http://wordpress.org/support/topic/huge-database-2?replies=2

The options should be a lot smaller now. There is a lot of meta data being stored, but compared to the actual posts this should really be quite small. See http://wordpress.org/plugins/blogger-importer/screenshots/ for details.

#12 @Otto42
9 years ago

  • Milestone WordPress.org deleted
  • Resolution set to invalid
  • Status changed from new to closed

Large imports are still a problem regarding memory usage, but the importer no longer stores this information in this manner.
