Make WordPress Core

Opened 9 years ago

Closed 9 years ago

Last modified 9 years ago

#32264 closed enhancement (wontfix)

wp_enqueue_media() is slow on large sites

Reported by: philipjohn's profile philipjohn Owned by: rhurling's profile rhurling
Milestone: Priority: normal
Severity: normal Version: 3.9
Component: Posts, Post Types Keywords:
Focuses: performance Cc:

Description

Following on from #27985:

Despite the improvements in #27985, it looks like wp_enqueue_media() is still slow on large sites.

On WordPress.com VIP we're seeing multiple sites reporting very slow post saving. I've identified that the queries in [28194] are taking longer than 3,500ms on saving posts.

Here's debug bar output:

SELECT ID
FROM wp_x_posts
WHERE post_type = 'attachment'
AND post_mime_type LIKE 'audio%'
LIMIT 1
include, wp_enqueue_media, wpdb->get_var, hyperdb->query #17 (3,599.6ms @ 6032.12ms)

and EXPLAIN on that query:

$wpdb->get_row("EXPLAIN SELECT ID FROM wp_x_posts WHERE post_type = 'attachment' AND post_mime_type LIKE 'audio%' LIMIT 1", ARRAY_A);
array(10) {
  'id' =>
  string(1) "1"
  'select_type' =>
  string(6) "SIMPLE"
  'table' =>
  string(17) "wp_x_posts"
  'type' =>
  string(3) "ref"
  'possible_keys' =>
  string(16) "type_status_date"
  'key' =>
  string(16) "type_status_date"
  'key_len' =>
  string(2) "22"
  'ref' =>
  string(5) "const"
  'rows' =>
  string(6) "253750"
  'Extra' =>
  string(11) "Using where"
}

Attachments (9)

32264.patch (1.2 KB) - added by rhurling 9 years ago.
Initial patch per johnbillion's suggestion
32264.1.patch (2.0 KB) - added by rhurling 9 years ago.
Fix previous patch which ignored transient converting booleans
32264.2.patch (1.6 KB) - added by rhurling 9 years ago.
Remove test wp_enqueue_media from Hello Dolly Plugin
32264.3.patch (2.2 KB) - added by philipjohn 9 years ago.
32264.4.patch (2.9 KB) - added by rhurling 9 years ago.
New patch including deletion of transients in wp_insert_post
32264.5.patch (3.6 KB) - added by rhurling 9 years ago.
Add deletion of transients in wp_delete_attachment
32264.6.patch (3.7 KB) - added by philipjohn 9 years ago.
Addresses variable assignment feedback
32264.7.patch (5.2 KB) - added by philipjohn 9 years ago.
32264.8.patch (6.1 KB) - added by philipjohn 9 years ago.
Adds media_month query caching

Download all attachments as: .zip

Change History (29)

#1 @mintindeed
9 years ago

@wonderboymusic helped do some debugging on this issue. The problem is on a site with 1.7 million attachments and 0 audio or video attachments, this query has to do a ginormous table scan using a query that is not able to take advantage of existing indexes. His suggestion was to add an index to post_mime_type which reduces this query from taking several seconds, down to milliseconds.

#2 @johnbillion
9 years ago

  • Keywords needs-patch good-first-bug added
  • Type changed from defect (bug) to enhancement

I think as per the discussion on #27985, we should cache the result of each of these queries in a transient in much the same way we do for is_multi_author().

@rhurling
9 years ago

Initial patch per johnbillion's suggestion

#3 @rhurling
9 years ago

  • Keywords has-patch added; needs-patch removed

I'm thinking about adding another transient for the query that was added in [29271], but I'm not so sure about that.
Furthermore I'm not sure if I should prefix the transient name or not.

#4 @philipjohn
9 years ago

@rhuruling: Has that made a difference to the queries in your testing? I tried a similar change but the value of the transient was false, causing the query to run every time still.

@rhurling
9 years ago

Fix previous patch which ignored transient converting booleans

#5 @rhurling
9 years ago

Wait... the patch before worked...
Well, now both should work and it's more preference. And now I'm not sure why i rewrote it a bit.

@philipjohn: In my testing both patches prevent the query from being run more than once, since the transient value is an empty string for me, because I think boolean's get converted.
Did you test it with an object cache? Maybe that's interfering, because the boolean isn't converted to a string or something like that (which would mean that only the 2nd patch would work and I didn't write it for nothing).

Version 0, edited 9 years ago by rhurling (next)

@rhurling
9 years ago

Remove test wp_enqueue_media from Hello Dolly Plugin

#6 @philipjohn
9 years ago

Did you test it with an object cache?

Ah yes I did. I will test this patch again today and report back.

#7 @philipjohn
9 years ago

WFM! I'm adding a modified patch that also caches the months query, which is also slow on large sites.

@philipjohn
9 years ago

#8 @johnbillion
9 years ago

  • Keywords needs-patch added; has-patch removed

None of the above patches work because they use a persistent transient which is never flushed. When an audio/video file is uploaded, the transient will contain a stale value.

These two transients should be flushed whenever an audio/video attachment is uploaded or deleted.

Same goes for the month query which should probably be flushed whenever any attachment is uploaded or deleted.

Last edited 9 years ago by johnbillion (previous) (diff)

@rhurling
9 years ago

New patch including deletion of transients in wp_insert_post

#9 @rhurling
9 years ago

  • Keywords has-patch added; needs-patch removed

I created a new patch per @johnbillion's suggestion, which deletes the transient in wp_insert_post.

Edit: I forgot to delete the transients on attachment deletion, so I added that to the patch.

Last edited 9 years ago by rhurling (previous) (diff)

@rhurling
9 years ago

Add deletion of transients in wp_delete_attachment

#10 @johnbillion
9 years ago

  • Keywords needs-testing added

#11 @wonderboymusic
9 years ago

Related: #31071 - transients here (for post_mime_type) just mask the fact that the schema doesn't scale in these circumstances

Also, we shouldn't do false === ( $var = whatever() ). Variable assignment shouldn't happen in the middle of checking conditions.

@philipjohn
9 years ago

Addresses variable assignment feedback

#12 follow-up: @westi
9 years ago

Is there a strong reason to use transients for this instead of just having a object cached result that is keyed on something that includes the last_changed value for posts?

I guess the only reason would be to try and leverage this for sites without a persistent cache backend, I'm not sure I'd want to run without one for this many posts however ;)

#13 @obenland
9 years ago

  • Owner set to rhurling
  • Status changed from new to assigned

#14 in reply to: ↑ 12 @mintindeed
9 years ago

Replying to westi:

Is there a strong reason to use transients for this instead of just having a object cached result that is keyed on something that includes the last_changed value for posts?

I guess the only reason would be to try and leverage this for sites without a persistent cache backend, I'm not sure I'd want to run without one for this many posts however ;)

My biggest fear with using the caching backend is that we'd be putting something into the system that's not clear or visible. A landmine; magic functionality that takes place if your infrastructure is set up in a certain way. Unless you also add in an admin notice that says "hey, your site is pretty big, it's going to run slow unless you implement an object cache", then I think it be implemented in a way that any site can take advantage of.

A smaller reason is that dev and QA environments don't always have caching (on purpose) and that's where our dev team and product managers spend the greatest amount of their time. So we'd have to implement object caching, which would change the way our team works and the resources required for our dev environments, or deal with a really really slow dev environment. Neither one is the end of the world, but enough reason for me to agree with using a transient to store this info.

#15 @coreygilmore
9 years ago

Is there a reason to not add an index (as suggested in comment:1), in addition to caching the query? It's great to cache the result, but we're not fixing the core issue of having a slow query that is improved by (and needs) an index.

#16 @mboynes
9 years ago

On 32264.6.patch:

  1. There is some repeated code. I would recommend making a function to purge the media cache, which takes a post ID, and hooking into add_attachment and delete_attachment. This becomes a little more challenging when considering the next two notes...
  2. It's only necessary to delete the has_audio, has_video transients on add_attachment if the transient value is 0. For a site that adds a lot of audio or video, this can be a noticeable performance improvement, since that query would only have to be run the first time, and then only after an audio or video file is deleted.
  3. It's only necessary to delete the has_audio, has_video transients on delete_attachment if the transient value is 1.
  4. It's only necessary to delete the media_months transient on add_attachment if the month and year of the attachment are not already included in the transient. Furthermore, if they aren't included, we could just add them to the transient instead of deleting it and forcing that query to run again.

@philipjohn: I'm happy to make these changes if you don't have time.

In general, I'm in favor of these being transients over using the object cache. I don't see a reason to run these queries on every single pageload of the new/edit post screen, regardless of the site's size or object cache preference.

@philipjohn
9 years ago

#17 @philipjohn
9 years ago

32264.7.patch makes a start on that @mboynes

It introduces four new functions:

media_has_audio() is the new home for the transient setting and DB query
media_has_video() as above but for videos
media_months() as above but for the months query
wp_check_has_media() hooks into add_attachment and delete_attachment, checking the current values of the relevant transient and deleting it where neccessary and thus causing the calls at wp-includes/media.php:2921-2922 to re-create the transient only when needed.

The transient deletion in wp-includes/post.php has been removed.

I'm not confident the new functions and add_action() calls are really in the right place, so feedback welcome there.

The media_months transient still needs to be deleted. I'm thinking an additional hook for that, rather than rolling into wp_check_has_media()

@philipjohn
9 years ago

Adds media_month query caching

#18 @birgire
9 years ago

I wonder if we should introduce the has_audio and has_video counters (in the options table) instead of using these potential slow SQL queries at all?

These counters could be updated when we upload or delete audio/video files.

This could of course introduce potential sync problems, but then we could adjust the options value if needed.

Last edited 9 years ago by birgire (previous) (diff)

#19 @philipjohn
9 years ago

  • Keywords good-first-bug has-patch needs-testing removed
  • Resolution set to invalid
  • Status changed from assigned to closed

I'm going to close this in favour of #31071 after chatting to @wonderboymusic and @pento - we need to fix the queries really, which I've attempted to do in the latest patch on #31071. Adding caching here doesn't really solve the problem.

#20 @johnbillion
9 years ago

  • Milestone Awaiting Review deleted
  • Resolution changed from invalid to wontfix
Note: See TracTickets for help on using tickets.