WordPress.org

Make WordPress Core

Opened 2 years ago

Closed 5 weeks ago

Last modified 4 weeks ago

#24498 closed enhancement (fixed)

Improving performance of meta_form()

Reported by: lumaraf Owned by: pento
Milestone: 4.3 Priority: normal
Severity: normal Version: 3.5
Component: Administration Keywords: has-patch
Focuses: performance Cc:

Description (last modified by SergeyBiryukov)

The query executed in meta_form() in wp-admin/includes/template.php took over 5 seconds to execute on my largest wordpress installation. I changed it to use a NOT BETWEEN to enable MySQL to execute the query by just using a small part of the meta_key index. My new query executes in less than 10 milliseconds.

Here is the improved query:

SELECT DISTINCT meta_key FROM wp_postmeta
WHERE meta_key NOT BETWEEN '_' AND '_z'
HAVING meta_key NOT LIKE '\_%'
ORDER BY meta_key

Attachments (9)

24498.patch (594 bytes) - added by swissspidy 8 weeks ago.
Optimize the query
24498-alternative.patch (2.5 KB) - added by swissspidy 8 weeks ago.
This alternative patch removes the query entirely and only allows entering meta keys through the text input
24498-test.php (2.8 KB) - added by pento 7 weeks ago.
myisam-100000.csv (2.4 KB) - added by pento 7 weeks ago.
myisam-1000000.csv (2.4 KB) - added by pento 7 weeks ago.
innodb-100000.csv (2.4 KB) - added by pento 7 weeks ago.
innodb-1000000.csv (2.4 KB) - added by pento 7 weeks ago.
24498rd2.1.patch (1.7 KB) - added by chriscct7 5 weeks ago.
24498rd2.2.patch (1.6 KB) - added by chriscct7 5 weeks ago.
if a post id is passed in not the object, try for the optimization as well

Download all attachments as: .zip

Change History (43)

comment:1 @SergeyBiryukov2 years ago

  • Description modified (diff)

comment:2 @nacin20 months ago

  • Component changed from Performance to Administration
  • Focuses performance added

comment:3 @johnbillion4 months ago

#32361 was marked as a duplicate.

comment:4 @chriscct74 months ago

  • Keywords needs-patch added
  • Version changed from 3.5.1 to 3.5

comment:5 @swissspidy8 weeks ago

I tested different queries on a local environment and also on some production systems with real data.

Although the original query was never really slow, I've come to the conclusion that the proposed query is at least 40% faster in general.

However, maybe we don't really need this query at all. It's not that useful. I suggest either removing the select box completely (in favor of the text input) or adding an ajax autocomplete to the text input.

Heck, we could even remove the whole meta box for new installs. Who uses that anyway?

@swissspidy8 weeks ago

Optimize the query

comment:6 @swissspidy8 weeks ago

  • Keywords has-patch added; needs-patch removed

Btw, this issue got a bit of traction thanks to a CSS-Tricks article yesterday: https://css-tricks.com/swapping-a-wordpress-core-meta-box-to-speed-up-editing/

comment:7 @slackbot8 weeks ago

This ticket was mentioned in Slack in #core by dd32. View the logs.

comment:8 @dd328 weeks ago

There's also #32449 which is a duplicate (one or the other)

comment:9 @swissspidy8 weeks ago

#32449 was marked as a duplicate.

@swissspidy8 weeks ago

This alternative patch removes the query entirely and only allows entering meta keys through the text input

comment:10 @pross7 weeks ago

Just to chime in with testing results on a site with 2.5M postmeta rows...

The default query takes 6.85s

This takes 1.58s:

SELECT DISTINCT meta_key 
FROM wp_postmeta
WHERE meta_key NOT LIKE '\\_%'
ORDER BY meta_key
limit 30

And the proposed patch 24498.patch takes 2.5s

comment:11 @rarylson7 weeks ago

Hi,

Recently, I sent an email to the Wordpress dev mailing list and Chris Christoff told me about this bug report.

So, I'm copy/pasting the my email here:

---

The problem was at line 674, where the SQL query was defined.

I'm putting the raw (already processed) SQL query here:

SELECT meta_key
    FROM wp_postmeta
    GROUP BY meta_key
    HAVING meta_key NOT LIKE '\_%'
    ORDER BY meta_key
    LIMIT 30

To run this query, MySQL will sort all meta_key lines, including all custom fields and all the other fields (starting with '_'), unify them, select the desired fields, and them return them.

The problem with its query is that it will run a sort over all of the entries (due to the GROUP BY part), and select the desired entries after the group by (HAVING part). A more efficient solution should consider excluding unnecessary entries before sorting/unifying them.

A better query could be:

SELECT DISTINCT meta_key
    FROM wp_postmeta
    WHERE meta_key NOT LIKE '\_%'
    ORDER BY meta_key
    LIMIT 30

In my case, this simply change gave us a performance improvement of 4x (from 8s to less then 2s).

So, I'm proposing this patch to this file, and I hope that it can be useful to make the Wordpress project better.

--- wp-admin/includes/template.php.orig    2015-07-16 00:22:28.000000000 -0300
+++ wp-admin/includes/template.php    2015-07-16 00:23:06.000000000 -0300
@@ -671,10 +671,9 @@
      * @param int $limit Number of custom fields to retrieve. Default 30.
      */
     $limit = apply_filters( 'postmeta_form_limit', 30 );
-    $sql = "SELECT meta_key
+    $sql = "SELECT DISTINCT meta_key
         FROM $wpdb->postmeta
-        GROUP BY meta_key
-        HAVING meta_key NOT LIKE %s
+        WHERE meta_key NOT LIKE %s
         ORDER BY meta_key
         LIMIT %d";
     $keys = $wpdb->get_col( $wpdb->prepare( $sql, $wpdb->esc_like( '_' ) . '%', $limit ) );

It's interesting to see the @pross (comment 10) proposed the same patch that I did :).

Last edited 7 weeks ago by rarylson (previous) (diff)

@pento7 weeks ago

@pento7 weeks ago

@pento7 weeks ago

@pento7 weeks ago

@pento7 weeks ago

comment:12 @pento7 weeks ago

  • Milestone changed from Awaiting Review to 4.3
  • Owner set to pento
  • Status changed from new to assigned

24498-test.php is a performance test I ran on these queries, the results are attached as CSVs.

For a quick explanation of the variables:

$connections - I originally planned on running these tests on multiple versions of MySQL, but decided that it wasn't really necessary, as these are fairly simple queries.
$engines - obvious.
$row_counts - how many rows to insert into the test table.
$percent_underscores - the percentage of rows to have starting with a _, which affects the comparison part of the query.
$unique_keys_count - the number of unique keys to insert into the table. This affects the index cardinality.

Summary: The original query is clearly slower, but there's very little difference between the two proposed queries, with the query in 24498.patch being slight quicker in most circumstances.

Unless anyone can show consistent data that one or the other is significantly quicker, I'm happy for 24498.patch to go into 4.3.

comment:13 @rarylson7 weeks ago

Hi @pento,

The real optimization is using the WHERE (exclusions before sorting) instead of only using HAVING (exclusions after sorting).

About the query proposed in the patch:

SELECT DISTINCT meta_key FROM wp_postmeta
WHERE meta_key NOT BETWEEN '_' AND '_z'
HAVING meta_key NOT LIKE '\_%'
ORDER BY meta_key
LIMIT 30

This patch runs an exclusion, after a sort, and after it runs an other exclusion.

The other one (comments 10 and 11) runs an exclusion and after a sort (there is no need of running another exclusion after the sort, because there is no HAVING clause).

SELECT DISTINCT meta_key FROM wp_postmeta
WHERE meta_key NOT LIKE '\_%'
ORDER BY meta_key
LIMIT 30

The execution plan is very similar in both cases. Both of them are practically equivalent in terms of efficient.

It's possible that excluding using a 'not in range' logic be a bit quicker that excluding using a 'not like' logic. But it will only make difference if there is no meta_key entry in the form _X, where X is a character greater than z. For example, '_Á' or '_ß'. In these cases, the second query would run in the same performance or even a little faster then the first one. More over, the NOT BETWEEN performance depends on the used CHARSET and its results depend on the COLLATION. It's possible that for some cases the first query will be better and for others the second query will be better.

So, in this specific case, I prefer the simpler solution (comments 10/11). But I remember that both of them are correct and much more efficient than the original one.

Final note:

The first query could be rewritten also as:

SELECT DISTINCT meta_key FROM wp_postmeta
WHERE meta_key NOT BETWEEN '_' AND '_z'
    AND meta_key NOT LIKE '\_%'
ORDER BY meta_key
LIMIT 30

This moves the final exclude (after the sort) to before the sort. Technically, it will be a little better.

It's also worth to notice that using the next query is wrong (because the '_Á' and '_ß' bug and the problem of changing the COLATION). So, the NOT LIKE clause (after or before the sort) is still required.

SELECT DISTINCT meta_key FROM wp_postmeta
WHERE meta_key NOT BETWEEN '_' AND '_z'
ORDER BY meta_key
LIMIT 30

comment:14 @dbwpe6 weeks ago

Heya folks,

I just wanted to chime in real quick to say that @pento's 24498.patch was a very significant improvement for one of our customers here with ~3.8 million rows in _postmeta (InnoDB).

The single affected query went from ~17 seconds to ~0.03 seconds.

Original query:

SELECT meta_key 
    FROM wp_postmeta 
    GROUP BY meta_key 
    HAVING meta_key NOT LIKE '\\_%' 
    ORDER BY meta_key 
    LIMIT 30;
30 rows in set (16.83 sec)

Initial improvement suggestion:

SELECT DISTINCT meta_key
    FROM wp_postmeta
    WHERE meta_key NOT LIKE '\_%'
    ORDER BY meta_key
    LIMIT 30;
30 rows in set (4.03 sec)

@pento version

SELECT DISTINCT meta_key FROM wp_postmeta
    WHERE meta_key NOT BETWEEN '_' AND '_z'
        AND meta_key NOT LIKE '\_%'
    ORDER BY meta_key
    LIMIT 30;
30 rows in set (0.03 sec)

It's worth noting that most of this site's _postmeta are from WooCommerce, which mostly begin with underscores, so excluding those more efficiently really has a big impact.

Last edited 6 weeks ago by dbwpe (previous) (diff)

comment:15 follow-up: @obenland6 weeks ago

@pento, can you make a decision for 4.3 on this before beta4?

comment:16 follow-up: @chriscct76 weeks ago

@pento, the "range" one on your test, how does that perform when the meta starts with or as suggested in comment:13?

comment:17 in reply to: ↑ 16 @dd326 weeks ago

Replying to chriscct7:

@pento, the "range" one on your test, how does that perform when the meta starts with or as suggested in comment:13?

I feel like this is a edgecase that we don't necessarily need to optimize for heavily. The further regexp/having clauses would filter out those, and they're certainly going to be less common.

The other option is that it's written as meta_key NOT BETWEEN '_' AND '`' which should filter out all which start with _ I think - but i'm not sure what performance implications that would have (` is the next character up from _)

comment:18 in reply to: ↑ 15 @pento6 weeks ago

Replying to obenland:

@pento, can you make a decision for 4.3 on this before beta4?

Given the wide variety of behaviour seen here for the different query options, which I couldn't reproduce in testing, we'll probably need to punt this one.

I'm currently busy with 4.2.3, so if someone feels like having a go at this, please do. Beta4 is due out on the 22nd - if we can have something solid by then, I'm happy to include it.

comment:19 @dd326 weeks ago

Given the wide variety of behaviour seen here for the different query options, which I couldn't reproduce in testing, we'll probably need to punt this one.

In your opinion @pento, do you expect either of the proposed queries here to be slower than what we currently have?
If it's faster in 80% of cases, we should get this in, and iterate/improve it in 4.4 (I'll happily do both of these)

comment:20 @pento6 weeks ago

Either will be faster than the original - as @rarylson noted in comment:11, the aim is to exclude the full scan caused by the GROUP BY in the original query. The worst case is that the original query and both of the proposed queries are about the same speed when the percentage of meta_keys beginning with _ is very low.

comment:21 @chriscct76 weeks ago

I concur with @dd32. Let's get one of the two in and then we can improve upon it next release.

comment:22 @slackbot6 weeks ago

This ticket was mentioned in Slack in #core by obenland. View the logs.

comment:23 @dd326 weeks ago

One main thing to keep in mind while testing this, is that (although you would think it wouldn't make that much of a difference) latin1 vs utf8 vs utf8mb4 seems to make a significant different here in performance, however the original being slow stands out under them all.

24498.patch seems to be safe to me, so lets go for it for 4.3.

If this still has performance issues for larger sites, I'd suggest that it should perhaps be cached in a transient (and cleared when an non-underscored field is set). This might be worth looking at in 4.4.

The other option is that it's written as meta_key NOT BETWEEN '_' AND '`' which should filter out all which start with _ I think - but i'm not sure what performance implications that would have (` is the next character up from _)

Just to loop back to this, the performance isn't great, MySQL doens't optimize it the same :)

comment:24 @dd326 weeks ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 33390:

Switch to a more performant query in meta_form().

Props lumaraf, swissspidy, rarylson, pento
Fixes #24498

comment:25 @jeichorn5 weeks ago

I have a site with 4.5 million wp_postmeta rows and this query is still slow (20 seconds).

It looks like the where clause is used but in my case that still means creating a 4.1 million row temp table because of the distinct. I can write a patch with a transient if that's the solution that will be accepted, but at least in my case the 20 second delay every time its empty for a feature that isn't useful isn't great (even if they used the meta_key drop down, showing 30 of the 2k keys isn't helpful).

Should I start a new ticket or reopen this one?

comment:26 @dd325 weeks ago

@jeichorn Lets start a new ticket :) "Cache the list of keys in meta_form()" or something.

Curious though, if this takes 20s, how long did the previous one take? Unfortunately there isn't really much we can do about the temporary table here I don't think.

comment:27 @chriscct75 weeks ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

There's a couple other low hanging optimizations it looks like we can do here.

First, I think we can get rid of the orderby, since we already run the result through natcasesort.

Also, we can limit the meta it searches through by limiting to the post_id in the where clause.

Attaching something I just wrote on a phone (so if you want to test it make sure to sanity check it for autocorrect gremlins before trying to apply it)

@chriscct75 weeks ago

@chriscct75 weeks ago

if a post id is passed in not the object, try for the optimization as well

comment:28 @chriscct75 weeks ago

  • Resolution set to fixed
  • Status changed from reopened to closed

Ah whoops, I'm thinking of the wrong form here. Both optimizations are invalid.
Stealing from DD32's feedback:
Limiting to $post’s post_id makes the function fast, but also useless, it’s designed for adding other meta to the current post that you’ve used elsewhere.. doing this just means that it’ll only show what you can already see. Ordering is done so that it gets the first 30(by default) results, ie. A, B, C, D instead of A, Z, G, R

comment:29 follow-up: @dd325 weeks ago

After chatting with @chriscct7 however, there is another angle which we could try here.

We could perform a query like this:

SELECT DISTINCT meta_key
FROM wp_postmeta pm
    LEFT JOIN wp_posts p ON pm.post_id = p.id
WHERE p.post_type IN ( 'post', 'page', 'current-post-type-of-the-post')
   AND meta_key NOT BETWEEN '_' AND '_z'
HAVING meta_key NOT LIKE '\_%'
ORDER BY meta_key DESC

I've not benchmarked it yet, but the idea is that plugins which add lots of meta_keys probably do it on "hidden" data post_types rather than the post_type that a user is actually editing. Since the user hasn't consciously added the meta keys to those posts, they shouldn't be shown here anyway.

If someone has a LOT of posts however (or only posts), this could potentially be slower under the conditions.

If anyone has a site that fits either of those extremes (thousands of meta_keys on non-post post_types - a large WooCommerce site for example I believe) or a site with many thousands of posts and can test that query out.. it'd be appreciated :)

Last edited 5 weeks ago by dd32 (previous) (diff)

comment:30 in reply to: ↑ 29 @jeichorn5 weeks ago

Replying to dd32:

After chatting with @chriscct7 however, there is another angle which we could try here.

We could perform a query like this:

SELECT DISTINCT meta_key
FROM wp_postmeta pm
    LEFT JOIN wp_posts p ON pm.post_id = p.id
WHERE p.post_type IN ( 'post', 'page', 'current-post-type-of-the-post')
   AND meta_key NOT BETWEEN '_' AND '_z'
HAVING meta_key NOT LIKE '\_%'
ORDER BY meta_key DESC

I've not benchmarked it yet, but the idea is that plugins which add lots of meta_keys probably do it on "hidden" data post_types rather than the post_type that a user is actually editing. Since the user hasn't consciously added the meta keys to those posts, they shouldn't be shown here anyway.

If someone has a LOT of posts however (or only posts), this could potentially be slower under the conditions.

If anyone has a site that fits either of those extremes (thousands of meta_keys on non-post post_types - a large WooCommerce site for example I believe) or a site with many thousands of posts and can test that query out.. it'd be appreciated :)

That last query fixes the problem until you try to add a post of big type (This site has a location database with a couple 100k posts). The problem seems pretty intractable once you do any operation that requires a temp table with a couple 100k rows performance collapses. This effect seems to be made worse by utf8mb4.

At the end of the day this is pretty frustrating. Fighting a performance problems for a feature that spends most of its time hidden from view. Maybe just a add a hook for manually setting the meta_key list, and not running the query in that case would be a good solution. Its still a landmine waiting to blow up people's sites, but at least people can fix it.

comment:31 @chriscct75 weeks ago

#18979 was marked as a duplicate.

comment:32 @chriscct75 weeks ago

#18979 which was closed as duplicate, has it's own patch on the idea of post type limitation

comment:33 @slackbot4 weeks ago

This ticket was mentioned in Slack in #core by antpb. View the logs.

comment:34 @kitchin4 weeks ago

Follow-up bug: #33296
As another follow-up, I re-opened #18979.

Note: See TracTickets for help on using tickets.