Make WordPress Core

Opened 13 years ago

Closed 10 years ago

Last modified 10 years ago

#19225 closed enhancement (worksforme)

WordPress-Importer: Perform fewer database queries when backfilling attachment URLs

Reported by: tott
Owned by: duck_
Milestone: (none)
Priority: normal
Severity: normal
Version: (none)
Component: Import
Keywords: (none)
Focuses: (none)
Cc: (none)


The current backfill_attachment_urls() approach issues a REPLACE query for each attachment URL. This can invalidate a lot of data and, in some setups, cause MySQL replication lag.

The attached patch bundles these requests and performs at most one database update per post.
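For context, the pre-patch behaviour can be sketched roughly like this (a hypothetical simplification, not the actual importer code; `$this->url_remap` is assumed to map old attachment URLs to their new locations):

```php
// Rough sketch of the behaviour being criticised: one string-replace
// UPDATE per remapped URL, each of which may scan and rewrite many rows.
foreach ( $this->url_remap as $from_url => $to_url ) {
	$wpdb->query( $wpdb->prepare(
		"UPDATE {$wpdb->posts} SET post_content = REPLACE( post_content, %s, %s )",
		$from_url,
		$to_url
	) );
}
```

With N remapped URLs this produces N full-table UPDATE statements, and a single post can be rewritten once per URL it contains.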

Attachments (3)

better-attachment-url-backfilling.diff (1.5 KB) - added by tott 13 years ago.
better-attachment-url-backfilling.2.diff (1.6 KB) - added by tott 13 years ago.
better-attachment-url-backfilling.3.diff (1.7 KB) - added by tott 13 years ago.


Change History (12)

#1 @westi
13 years ago

  • Owner set to duck_
  • Status changed from new to assigned

#2 @SergeyBiryukov
13 years ago

  • Milestone changed from Awaiting Review to

#3 @duck_
13 years ago

The idea sounds like a good one. However, I notice right away that the current patch removes updating of enclosure meta.

#4 @duck_
13 years ago

A couple of other thoughts:

  • $from_url should be run through like_escape() before being used in the query
  • Can we retrieve post_content in the first query rather than using get_post() as well? Would this be beneficial?
  • $wpdb->update() can be used instead of the query() method
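The three suggestions above could be combined along these lines (a hedged sketch, not the actual patch; `$from_url` and `$to_url` stand in for one entry of the importer's URL map, and `like_escape()` was the escaping helper of the era):

```php
// Escape LIKE wildcards in the URL before interpolating it into the query.
$like = '%' . like_escape( $from_url ) . '%';

// Fetch post_content together with the IDs in the first query,
// so no later get_post() round trip is needed.
$posts = $wpdb->get_results( $wpdb->prepare(
	"SELECT ID, post_content FROM {$wpdb->posts} WHERE post_content LIKE %s",
	$like
) );

foreach ( $posts as $post ) {
	$content = str_replace( $from_url, $to_url, $post->post_content );
	// $wpdb->update() builds the UPDATE statement and handles escaping.
	$wpdb->update(
		$wpdb->posts,
		array( 'post_content' => $content ),
		array( 'ID' => $post->ID )
	);
}
```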

#5 @tott
13 years ago

All valid points. I changed the patch accordingly. I left the REPLACE call for the postmeta data in place, since the amount of data being invalidated there is rather low and unlikely to cause issues.

#6 @duck_
13 years ago

better-attachment-url-backfilling.2.diff includes:

count( $post_ids ) > 0

but $post_ids isn't being declared in this version of the patch. Should be $posts?

Could you explain what you mean by "invalidation of a lot of data" and "data that's being invalidated"? Just want to make sure I fully understand the why :)

Another minor thing, but I think the uksort() before the first loop is unnecessary now.

#7 @tott
13 years ago

Third time's a charm. Note to self: don't code when ill and in a hurry to go to bed.

I implemented the suggestions. Let me explain a bit more about the background of this and why this is currently sub-optimal.

When you run a replicated MySQL setup, for example with a master and a slave server, every change on the master has to be transferred to the slave. Now imagine a REPLACE query that rewrites post content once for each URL.

For each replaced URL, MySQL has to find all matching posts, read them, change them, and write them back to disk. So not only can a single query rewrite the content of multiple posts at once, the same post may also be rewritten several times, depending on how many remapped URLs it contains.

That's why I changed this to a per-post approach: first figure out which URLs appear in the post, then update the post once. This way, the worst case is one update per post, which should have a much lower disk I/O impact than the original approach.
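The per-post approach described above can be sketched like this (hypothetical, not the literal .3.diff; `$post_ids` and `$this->url_remap` are assumed from the importer's context):

```php
// Per-post sketch: apply every URL replacement to the content in
// memory, then write the post back at most once.
foreach ( $post_ids as $post_id ) {
	$original = get_post_field( 'post_content', $post_id );
	$content  = str_replace(
		array_keys( $this->url_remap ),
		array_values( $this->url_remap ),
		$original
	);
	if ( $content !== $original ) {
		// Worst case: exactly one UPDATE per post, instead of one per URL.
		$wpdb->update(
			$wpdb->posts,
			array( 'post_content' => $content ),
			array( 'ID' => $post_id )
		);
	}
}
```

Skipping unchanged posts keeps the replication stream limited to rows that actually differ.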

Last edited 13 years ago by tott (previous) (diff)

#8 @chriscct7
10 years ago

  • Keywords has-patch needs-testing removed
  • Resolution set to worksforme
  • Status changed from assigned to closed

The importer was removed from core, and at some point since then the REPLACE approach was replaced. Closing as worksforme.

#9 @johnbillion
10 years ago

  • Milestone deleted