WordPress.org

Make WordPress Core

Opened 3 years ago

Closed 9 months ago

Last modified 7 months ago

#24498 closed enhancement (fixed)

Improving performance of meta_form()

Reported by: lumaraf Owned by: pento
Milestone: 4.3 Priority: normal
Severity: normal Version: 3.5
Component: Administration Keywords: has-patch
Focuses: performance Cc:

Description (last modified by SergeyBiryukov)

The query executed in meta_form() in wp-admin/includes/template.php took over 5 seconds to execute on my largest wordpress installation. I changed it to use a NOT BETWEEN to enable MySQL to execute the query by just using a small part of the meta_key index. My new query executes in less than 10 milliseconds.

Here is the improved query:

SELECT DISTINCT meta_key FROM wp_postmeta
WHERE meta_key NOT BETWEEN '_' AND '_z'
HAVING meta_key NOT LIKE '\_%'
ORDER BY meta_key

Attachments (9)

24498.patch (594 bytes) - added by swissspidy 10 months ago.
Optimize the query
24498-alternative.patch (2.5 KB) - added by swissspidy 10 months ago.
This alternative patch removes the query entirely and only allows entering meta keys through the text input
24498-test.php (2.8 KB) - added by pento 10 months ago.
myisam-100000.csv (2.4 KB) - added by pento 10 months ago.
myisam-1000000.csv (2.4 KB) - added by pento 10 months ago.
innodb-100000.csv (2.4 KB) - added by pento 10 months ago.
innodb-1000000.csv (2.4 KB) - added by pento 10 months ago.
24498rd2.1.patch (1.7 KB) - added by chriscct7 9 months ago.
24498rd2.2.patch (1.6 KB) - added by chriscct7 9 months ago.
if a post id is passed in not the object, try for the optimization as well

Download all attachments as: .zip

Change History (50)

#1 @SergeyBiryukov
3 years ago

  • Description modified (diff)

#2 @nacin
2 years ago

  • Component changed from Performance to Administration
  • Focuses performance added

#3 @johnbillion
12 months ago

#32361 was marked as a duplicate.

#4 @chriscct7
12 months ago

  • Keywords needs-patch added
  • Version changed from 3.5.1 to 3.5

#5 @swissspidy
10 months ago

I tested different queries on a local environment and also on some production systems with real data.

Although the original query was never really slow, I've come to the conclusion that the proposed query is at least 40% faster in general.

However, maybe we don't really need this query at all. It's not that useful. I suggest either removing the select box completely (in favor of the text input) or adding an ajax autocomplete to the text input.

Heck, we could even remove the whole meta box for new installs. Who uses that anyway?

@swissspidy
10 months ago

Optimize the query

#6 @swissspidy
10 months ago

  • Keywords has-patch added; needs-patch removed

Btw, this issue got a bit of traction thanks to a CSS-Tricks article yesterday: https://css-tricks.com/swapping-a-wordpress-core-meta-box-to-speed-up-editing/

This ticket was mentioned in Slack in #core by dd32. View the logs.


10 months ago

#8 @dd32
10 months ago

There's also #32449 which is a duplicate (one or the other)

#9 @swissspidy
10 months ago

#32449 was marked as a duplicate.

@swissspidy
10 months ago

This alternative patch removes the query entirely and only allows entering meta keys through the text input

#10 @pross
10 months ago

Just to chime in with testing results on a site with 2.5M postmeta rows...

The default query takes 6.85s

This takes 1.58s:

SELECT DISTINCT meta_key 
FROM wp_postmeta
WHERE meta_key NOT LIKE '\\_%'
ORDER BY meta_key
limit 30

And the proposed patch 24498.patch takes 2.5s

#11 @rarylson
10 months ago

Hi,

Recently, I sent an email to the Wordpress dev mailing list and Chris Christoff told me about this bug report.

So, I'm copy/pasting the my email here:

---

The problem was at line 674, where the SQL query was defined.

I'm putting the raw (already processed) SQL query here:

SELECT meta_key
    FROM wp_postmeta
    GROUP BY meta_key
    HAVING meta_key NOT LIKE '\_%'
    ORDER BY meta_key
    LIMIT 30

To run this query, MySQL will sort all meta_key lines, including all custom fields and all the other fields (starting with '_'), unify them, select the desired fields, and them return them.

The problem with its query is that it will run a sort over all of the entries (due to the GROUP BY part), and select the desired entries after the group by (HAVING part). A more efficient solution should consider excluding unnecessary entries before sorting/unifying them.

A better query could be:

SELECT DISTINCT meta_key
    FROM wp_postmeta
    WHERE meta_key NOT LIKE '\_%'
    ORDER BY meta_key
    LIMIT 30

In my case, this simply change gave us a performance improvement of 4x (from 8s to less then 2s).

So, I'm proposing this patch to this file, and I hope that it can be useful to make the Wordpress project better.

--- wp-admin/includes/template.php.orig    2015-07-16 00:22:28.000000000 -0300
+++ wp-admin/includes/template.php    2015-07-16 00:23:06.000000000 -0300
@@ -671,10 +671,9 @@
      * @param int $limit Number of custom fields to retrieve. Default 30.
      */
     $limit = apply_filters( 'postmeta_form_limit', 30 );
-    $sql = "SELECT meta_key
+    $sql = "SELECT DISTINCT meta_key
         FROM $wpdb->postmeta
-        GROUP BY meta_key
-        HAVING meta_key NOT LIKE %s
+        WHERE meta_key NOT LIKE %s
         ORDER BY meta_key
         LIMIT %d";
     $keys = $wpdb->get_col( $wpdb->prepare( $sql, $wpdb->esc_like( '_' ) . '%', $limit ) );

It's interesting to see the @pross (comment 10) proposed the same patch that I did :).

Last edited 10 months ago by rarylson (previous) (diff)

@pento
10 months ago

@pento
10 months ago

@pento
10 months ago

#12 @pento
10 months ago

  • Milestone changed from Awaiting Review to 4.3
  • Owner set to pento
  • Status changed from new to assigned

24498-test.php is a performance test I ran on these queries, the results are attached as CSVs.

For a quick explanation of the variables:

$connections - I originally planned on running these tests on multiple versions of MySQL, but decided that it wasn't really necessary, as these are fairly simple queries.
$engines - obvious.
$row_counts - how many rows to insert into the test table.
$percent_underscores - the percentage of rows to have starting with a _, which affects the comparison part of the query.
$unique_keys_count - the number of unique keys to insert into the table. This affects the index cardinality.

Summary: The original query is clearly slower, but there's very little difference between the two proposed queries, with the query in 24498.patch being slight quicker in most circumstances.

Unless anyone can show consistent data that one or the other is significantly quicker, I'm happy for 24498.patch to go into 4.3.

#13 @rarylson
10 months ago

Hi @pento,

The real optimization is using the WHERE (exclusions before sorting) instead of only using HAVING (exclusions after sorting).

About the query proposed in the patch:

SELECT DISTINCT meta_key FROM wp_postmeta
WHERE meta_key NOT BETWEEN '_' AND '_z'
HAVING meta_key NOT LIKE '\_%'
ORDER BY meta_key
LIMIT 30

This patch runs an exclusion, after a sort, and after it runs an other exclusion.

The other one (comments 10 and 11) runs an exclusion and after a sort (there is no need of running another exclusion after the sort, because there is no HAVING clause).

SELECT DISTINCT meta_key FROM wp_postmeta
WHERE meta_key NOT LIKE '\_%'
ORDER BY meta_key
LIMIT 30

The execution plan is very similar in both cases. Both of them are practically equivalent in terms of efficient.

It's possible that excluding using a 'not in range' logic be a bit quicker that excluding using a 'not like' logic. But it will only make difference if there is no meta_key entry in the form _X, where X is a character greater than z. For example, '_Á' or '_ß'. In these cases, the second query would run in the same performance or even a little faster then the first one. More over, the NOT BETWEEN performance depends on the used CHARSET and its results depend on the COLLATION. It's possible that for some cases the first query will be better and for others the second query will be better.

So, in this specific case, I prefer the simpler solution (comments 10/11). But I remember that both of them are correct and much more efficient than the original one.

Final note:

The first query could be rewritten also as:

SELECT DISTINCT meta_key FROM wp_postmeta
WHERE meta_key NOT BETWEEN '_' AND '_z'
    AND meta_key NOT LIKE '\_%'
ORDER BY meta_key
LIMIT 30

This moves the final exclude (after the sort) to before the sort. Technically, it will be a little better.

It's also worth to notice that using the next query is wrong (because the '_Á' and '_ß' bug and the problem of changing the COLATION). So, the NOT LIKE clause (after or before the sort) is still required.

SELECT DISTINCT meta_key FROM wp_postmeta
WHERE meta_key NOT BETWEEN '_' AND '_z'
ORDER BY meta_key
LIMIT 30

#14 @dbwpe
10 months ago

Heya folks,

I just wanted to chime in real quick to say that @pento's 24498.patch was a very significant improvement for one of our customers here with ~3.8 million rows in _postmeta (InnoDB).

The single affected query went from ~17 seconds to ~0.03 seconds.

Original query:

SELECT meta_key 
    FROM wp_postmeta 
    GROUP BY meta_key 
    HAVING meta_key NOT LIKE '\\_%' 
    ORDER BY meta_key 
    LIMIT 30;
30 rows in set (16.83 sec)

Initial improvement suggestion:

SELECT DISTINCT meta_key
    FROM wp_postmeta
    WHERE meta_key NOT LIKE '\_%'
    ORDER BY meta_key
    LIMIT 30;
30 rows in set (4.03 sec)

@pento version

SELECT DISTINCT meta_key FROM wp_postmeta
    WHERE meta_key NOT BETWEEN '_' AND '_z'
        AND meta_key NOT LIKE '\_%'
    ORDER BY meta_key
    LIMIT 30;
30 rows in set (0.03 sec)

It's worth noting that most of this site's _postmeta are from WooCommerce, which mostly begin with underscores, so excluding those more efficiently really has a big impact.

Last edited 10 months ago by dbwpe (previous) (diff)

#15 follow-up: @obenland
10 months ago

@pento, can you make a decision for 4.3 on this before beta4?

#16 follow-up: @chriscct7
10 months ago

@pento, the "range" one on your test, how does that perform when the meta starts with or as suggested in comment:13?

#17 in reply to: ↑ 16 @dd32
10 months ago

Replying to chriscct7:

@pento, the "range" one on your test, how does that perform when the meta starts with or as suggested in comment:13?

I feel like this is a edgecase that we don't necessarily need to optimize for heavily. The further regexp/having clauses would filter out those, and they're certainly going to be less common.

The other option is that it's written as meta_key NOT BETWEEN '_' AND '`' which should filter out all which start with _ I think - but i'm not sure what performance implications that would have (` is the next character up from _)

#18 in reply to: ↑ 15 @pento
10 months ago

Replying to obenland:

@pento, can you make a decision for 4.3 on this before beta4?

Given the wide variety of behaviour seen here for the different query options, which I couldn't reproduce in testing, we'll probably need to punt this one.

I'm currently busy with 4.2.3, so if someone feels like having a go at this, please do. Beta4 is due out on the 22nd - if we can have something solid by then, I'm happy to include it.

#19 @dd32
10 months ago

Given the wide variety of behaviour seen here for the different query options, which I couldn't reproduce in testing, we'll probably need to punt this one.

In your opinion @pento, do you expect either of the proposed queries here to be slower than what we currently have?
If it's faster in 80% of cases, we should get this in, and iterate/improve it in 4.4 (I'll happily do both of these)

#20 @pento
10 months ago

Either will be faster than the original - as @rarylson noted in comment:11, the aim is to exclude the full scan caused by the GROUP BY in the original query. The worst case is that the original query and both of the proposed queries are about the same speed when the percentage of meta_keys beginning with _ is very low.

#21 @chriscct7
10 months ago

I concur with @dd32. Let's get one of the two in and then we can improve upon it next release.

This ticket was mentioned in Slack in #core by obenland. View the logs.


9 months ago

#23 @dd32
9 months ago

One main thing to keep in mind while testing this, is that (although you would think it wouldn't make that much of a difference) latin1 vs utf8 vs utf8mb4 seems to make a significant different here in performance, however the original being slow stands out under them all.

24498.patch seems to be safe to me, so lets go for it for 4.3.

If this still has performance issues for larger sites, I'd suggest that it should perhaps be cached in a transient (and cleared when an non-underscored field is set). This might be worth looking at in 4.4.

The other option is that it's written as meta_key NOT BETWEEN '_' AND '`' which should filter out all which start with _ I think - but i'm not sure what performance implications that would have (` is the next character up from _)

Just to loop back to this, the performance isn't great, MySQL doens't optimize it the same :)

#24 @dd32
9 months ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 33390:

Switch to a more performant query in meta_form().

Props lumaraf, swissspidy, rarylson, pento
Fixes #24498

#25 @jeichorn
9 months ago

I have a site with 4.5 million wp_postmeta rows and this query is still slow (20 seconds).

It looks like the where clause is used but in my case that still means creating a 4.1 million row temp table because of the distinct. I can write a patch with a transient if that's the solution that will be accepted, but at least in my case the 20 second delay every time its empty for a feature that isn't useful isn't great (even if they used the meta_key drop down, showing 30 of the 2k keys isn't helpful).

Should I start a new ticket or reopen this one?

#26 @dd32
9 months ago

@jeichorn Lets start a new ticket :) "Cache the list of keys in meta_form()" or something.

Curious though, if this takes 20s, how long did the previous one take? Unfortunately there isn't really much we can do about the temporary table here I don't think.

#27 @chriscct7
9 months ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

There's a couple other low hanging optimizations it looks like we can do here.

First, I think we can get rid of the orderby, since we already run the result through natcasesort.

Also, we can limit the meta it searches through by limiting to the post_id in the where clause.

Attaching something I just wrote on a phone (so if you want to test it make sure to sanity check it for autocorrect gremlins before trying to apply it)

@chriscct7
9 months ago

if a post id is passed in not the object, try for the optimization as well

#28 @chriscct7
9 months ago

  • Resolution set to fixed
  • Status changed from reopened to closed

Ah whoops, I'm thinking of the wrong form here. Both optimizations are invalid.
Stealing from DD32's feedback:
Limiting to $post’s post_id makes the function fast, but also useless, it’s designed for adding other meta to the current post that you’ve used elsewhere.. doing this just means that it’ll only show what you can already see. Ordering is done so that it gets the first 30(by default) results, ie. A, B, C, D instead of A, Z, G, R

#29 follow-up: @dd32
9 months ago

After chatting with @chriscct7 however, there is another angle which we could try here.

We could perform a query like this:

SELECT DISTINCT meta_key
FROM wp_postmeta pm
    LEFT JOIN wp_posts p ON pm.post_id = p.id
WHERE p.post_type IN ( 'post', 'page', 'current-post-type-of-the-post')
   AND meta_key NOT BETWEEN '_' AND '_z'
HAVING meta_key NOT LIKE '\_%'
ORDER BY meta_key DESC

I've not benchmarked it yet, but the idea is that plugins which add lots of meta_keys probably do it on "hidden" data post_types rather than the post_type that a user is actually editing. Since the user hasn't consciously added the meta keys to those posts, they shouldn't be shown here anyway.

If someone has a LOT of posts however (or only posts), this could potentially be slower under the conditions.

If anyone has a site that fits either of those extremes (thousands of meta_keys on non-post post_types - a large WooCommerce site for example I believe) or a site with many thousands of posts and can test that query out.. it'd be appreciated :)

Last edited 9 months ago by dd32 (previous) (diff)

#30 in reply to: ↑ 29 @jeichorn
9 months ago

Replying to dd32:

After chatting with @chriscct7 however, there is another angle which we could try here.

We could perform a query like this:

SELECT DISTINCT meta_key
FROM wp_postmeta pm
    LEFT JOIN wp_posts p ON pm.post_id = p.id
WHERE p.post_type IN ( 'post', 'page', 'current-post-type-of-the-post')
   AND meta_key NOT BETWEEN '_' AND '_z'
HAVING meta_key NOT LIKE '\_%'
ORDER BY meta_key DESC

I've not benchmarked it yet, but the idea is that plugins which add lots of meta_keys probably do it on "hidden" data post_types rather than the post_type that a user is actually editing. Since the user hasn't consciously added the meta keys to those posts, they shouldn't be shown here anyway.

If someone has a LOT of posts however (or only posts), this could potentially be slower under the conditions.

If anyone has a site that fits either of those extremes (thousands of meta_keys on non-post post_types - a large WooCommerce site for example I believe) or a site with many thousands of posts and can test that query out.. it'd be appreciated :)

That last query fixes the problem until you try to add a post of big type (This site has a location database with a couple 100k posts). The problem seems pretty intractable once you do any operation that requires a temp table with a couple 100k rows performance collapses. This effect seems to be made worse by utf8mb4.

At the end of the day this is pretty frustrating. Fighting a performance problems for a feature that spends most of its time hidden from view. Maybe just a add a hook for manually setting the meta_key list, and not running the query in that case would be a good solution. Its still a landmine waiting to blow up people's sites, but at least people can fix it.

#31 @chriscct7
9 months ago

#18979 was marked as a duplicate.

#32 @chriscct7
9 months ago

#18979 which was closed as duplicate, has it's own patch on the idea of post type limitation

This ticket was mentioned in Slack in #core by antpb. View the logs.


9 months ago

#34 @kitchin
9 months ago

Follow-up bug: #33296
As another follow-up, I re-opened #18979.

#35 follow-up: @tollmanz
8 months ago

I ran into this problem recently. When testing an update of a WP site with ~4 million post meta rows, I began seeing issues with this query. Before I knew this ticket existed, I worked on a reduced test case to illustrate the problem. Fortunately, I think it'll be really helpful for testing new queries as we move along.

I created a script and Travis CI integration (https://github.com/tollmanz/utf8mb4-query-time) that generates test data and a number of scenarios to test these queries. The Travis CI integration does the following:

  • Creates a SQL file with 1 million insert statements to populate a wp_postmeta table
  • Creates a 4.1.x style wp_postmeta table without the utf8mb4 character set/collation and populates it with the test data
    • Runs the 4.1.x style query
    • Runs the 4.3.x style query
  • Creates a 4.3.x style wp_postmeta table with the utf8mb4 character set/collation and populates it with the test data
    • Runs the 4.1.x style query
    • Runs the 4.3.x style query
  • Creates a 4.3.x style wp_postmeta table with the utf8mb4 character set/collation; however, switches the meta_key column back to utf8. This helps isolate the character set as the concern (and additionally allows me to test a temporary workaround that I might use). The test data is populated into this table.
    • Runs the 4.1.x style query
    • Runs the 4.3.x style query

To be clear, the "4.1.x style query" is:

SELECT meta_key
FROM wp_postmeta
GROUP BY meta_key
HAVING meta_key NOT LIKE '\_%'
ORDER BY meta_key
LIMIT 30;

And the "4.3.x style query" is:

SELECT DISTINCT meta_key
FROM wp_postmeta
WHERE meta_key NOT BETWEEN '_' AND '_z'
HAVING meta_key NOT LIKE '\_%'
ORDER BY meta_key
LIMIT 30;

If you view the Travis build, you'll be able to see query results, timing information, and EXPLAIN information.

The TL;DR results of a recent run (https://travis-ci.org/tollmanz/utf8mb4-query-time/builds/80031807) were:

            | utf8        | utf8mb4      | utf8 meta_key |
            |-------------|--------------|---------------|
4.1.x query | 0.00036725s | 21.15712650s | 0.00038975s   |
4.3.x query | 0.00051800s | 21.58118725s | 0.00043800s   |

If anyone would like me to add more test queries or change things up here, I'm more than happy to do so. This is really easy now that it's scripted. Additionally, you could just fork it, make changes, and PR, which will kick off a run.

Is there any reason that the meta_key column has to be utf8mb4? Would it make sense to consider reverting it to its previous state? That feels kinda dirty to me, but it is at least worth consideration.

Note...I tried to run @dd32's query in these tests, but it's not set up to have a wp_posts table yet, so I'd need to do some more work on that. I did, however, try the query against the DB that started me looking at this and the query was taking ~15s. The post's table is in the neighborhood of 500k rows.

h/t to @jorbin for working through parts of this with me.

#36 in reply to: ↑ 35 @jorbin
8 months ago

We should likely spin this off into it's own ticket. That said, I added an additional test case based off tollmanz's github and travis setup (Awesome work right there. Thank You) which sets the innodb_large_prefix option to on. This increases the limit beyond 767 byte limit.


            | utf8        | utf8mb4      | utf8 meta_key | with option utf8 | with option mb4 |
            |-------------|--------------|---------------|------------------|-----------------|
4.1.x query | 0.00037275s | 39.45441000s | 0.00044975s   | 0.00030775s      | 17.67479500s    |
4.3.x query | 0.00047425s | 22.14528975s | 0.00051775s   | 0.00052575s      | 20.47027650s    |

The options doesn't reduce the query down to the pre utf8mb4 levels.

This ticket was mentioned in Slack in #core by jorbin. View the logs.


8 months ago

#38 follow-up: @pento
7 months ago

Hey @tollmanz, sorry about taking so long to get back to you.

Could you please open up a new ticket? Also, would it be possible for you to run these tests against MariaDB 10.0? I'm reaching out to some MariaDB contacts who've offered to help, but it'd be good to show them numbers from MariaDB.

#39 in reply to: ↑ 38 @vhauri
7 months ago

Replying to pento:
@pento we've also got a site that we can test on MariaDB if you'd like. Just let me know what you're looking for (same EXPLAINs as @tollmanz ran?). Thanks!

Hey @tollmanz, sorry about taking so long to get back to you.

Could you please open up a new ticket? Also, would it be possible for you to run these tests against MariaDB 10.0? I'm reaching out to some MariaDB contacts who've offered to help, but it'd be good to show them numbers from MariaDB.

#40 @tollmanz
7 months ago

@pento I tried MariaDB, but got the same results: https://travis-ci.org/tollmanz/utf8mb4-query-time/builds/82649692. @jorbin opened a new ticket here: https://core.trac.wordpress.org/ticket/33885. Shall we discuss there?

#41 @tollmanz
7 months ago

I think I figured out the issue, and posted in the new ticket: https://core.trac.wordpress.org/ticket/33885#comment:2.

Note: See TracTickets for help on using tickets.