#19225 closed enhancement (worksforme)
WordPress-Importer: Perform fewer database queries when backfilling attachment URLs
Reported by: | tott | Owned by: | duck_
---|---|---|---
Milestone: | | Priority: | normal
Severity: | normal | Version: |
Component: | Import | Keywords: |
Focuses: | | Cc: |
Description
The current backfill_attachment_urls() approach issues a REPLACE query for each attachment URL. This can invalidate a lot of data and, in some setups, cause MySQL replication lag.
The attached patch bundles these changes so that at most one database update is performed per post.
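For context, the behaviour described above amounts to roughly the following (a minimal sketch, not the exact importer source; $this->url_remap is the importer's old-URL => new-URL map and the query text is assumed):

```php
// One REPLACE() query per remapped URL; each query rewrites every row of
// wp_posts whose content contains that URL.
foreach ( $this->url_remap as $from_url => $to_url ) {
    $wpdb->query( $wpdb->prepare(
        "UPDATE {$wpdb->posts} SET post_content = REPLACE(post_content, %s, %s)",
        $from_url,
        $to_url
    ) );
}
```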
Attachments (3)
Change History (12)
#4 @ 13 years ago
A couple of other thoughts:
- $from_url should be run through like_escape() before being put in the query
- Can we retrieve post_content in the first query rather than using get_post() as well? Would this be beneficial?
- $wpdb->update() can be used instead of the query() method (see the sketch after this list)
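A rough sketch of what those three suggestions could look like together (illustrative only; $from_url, $to_url and the column names come from the discussion above, everything else is assumed):

```php
// Escape LIKE wildcards in the source URL before it goes into the query.
$like = '%' . like_escape( $from_url ) . '%';

// Fetch post_content together with the ID so get_post() isn't needed later.
$rows = $wpdb->get_results( $wpdb->prepare(
    "SELECT ID, post_content FROM {$wpdb->posts} WHERE post_content LIKE %s",
    $like
) );

foreach ( $rows as $row ) {
    $content = str_replace( $from_url, $to_url, $row->post_content );
    // $wpdb->update() builds and escapes the UPDATE statement itself.
    $wpdb->update(
        $wpdb->posts,
        array( 'post_content' => $content ),
        array( 'ID' => $row->ID )
    );
}
```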
#5 @ 13 years ago
All valid points. I changed this accordingly. I left the REPLACE call for the postmeta data (see the sketch below), as this is unlikely to cause any issues: the amount of data being invalidated there is rather low.
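For reference, the retained postmeta handling is a single REPLACE() per URL along these lines (the exact query text is an assumption; only rows with the 'enclosure' meta key are touched, so little data is rewritten):

```php
// Remap enclosure URLs in postmeta with one REPLACE() query per URL.
$wpdb->query( $wpdb->prepare(
    "UPDATE {$wpdb->postmeta} SET meta_value = REPLACE(meta_value, %s, %s) WHERE meta_key = 'enclosure'",
    $from_url,
    $to_url
) );
```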
#6 @ 13 years ago
better-attachment-url-backfilling.2.diff includes count( $post_ids ) > 0, but $post_ids isn't declared in this version of the patch. Should it be $posts?
Could you explain what you mean by "invalidation of a lot of data" and "data that's being invalidated"? Just want to make sure I fully understand the why :)
Another minor thing, but I think the uksort() before the first loop is unnecessary now.
#7 @ 13 years ago
Third time's a charm. Note to self: don't code when ill and in a hurry to get to bed.
I implemented the suggestions. Let me explain a bit more about the background of this change and why the current approach is sub-optimal.
When you run a replicated MySQL setup with, for example, a Master and a Slave server, every change on the Master has to be transferred to the Slave after it happens. Now imagine a REPLACE query that changes something in the post content for each URL.
For each replaced URL this has to find all matching posts, read them, change them and write them back to disk. So not only can a single pass rewrite the content of multiple posts at once, the same post can also be written several times, depending on how many URLs it contains.
That's why I changed this to a per-post approach: first figure out which URLs appear in a post, then update that post once. This way the worst case is one UPDATE per post, which should have a much lower disk I/O impact than the original approach (sketched below).
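A rough sketch of that per-post flow (not the actual patch; the queries and variable names are assumptions, and it folds in the earlier suggestion of fetching post_content in the first query):

```php
// 1) Figure out which remapped URLs appear in which posts, pulling the
//    content along so it doesn't need a second lookup.
$posts   = array(); // post ID => list of old URLs found in its content
$content = array(); // post ID => current post_content
foreach ( $this->url_remap as $from_url => $to_url ) {
    $rows = $wpdb->get_results( $wpdb->prepare(
        "SELECT ID, post_content FROM {$wpdb->posts} WHERE post_content LIKE %s",
        '%' . like_escape( $from_url ) . '%'
    ) );
    foreach ( $rows as $row ) {
        $posts[ $row->ID ][] = $from_url;
        $content[ $row->ID ] = $row->post_content;
    }
}

// 2) Rewrite each affected post exactly once, no matter how many URLs it holds.
foreach ( $posts as $id => $from_urls ) {
    $new = $content[ $id ];
    foreach ( $from_urls as $from_url ) {
        $new = str_replace( $from_url, $this->url_remap[ $from_url ], $new );
    }
    $wpdb->update( $wpdb->posts, array( 'post_content' => $new ), array( 'ID' => $id ) );
}
```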
The idea sounds like a good one. However, I notice right away that the current patch removes updating of enclosure meta.