WordPress.org

Make WordPress Core

Opened 4 years ago

Last modified 11 months ago

#34848 assigned enhancement

Add support for updating post meta in bulk

Reported by: patrickgarman Owned by: chriscct7
Milestone: Future Release Priority: normal
Severity: normal Version:
Component: Options, Meta APIs Keywords: has-patch has-unit-tests needs-testing close
Focuses: performance Cc:
PR Number:

Description

A limiting factor in performance of inserting posts into the database is postmeta. I've been running a patch locally which adds functions for adding postmeta in one function call instead of calling add_post_meta multiple times over and over.

add_post_meta creates a single SQL insert query, when adding 20 post metas that is 20 SQL inserts all run separately. This can be greatly improved by combining these into a single SQL insert.

There is a problem where updating meta in bulk would likely be impossible or very painful. At the very least I have not been able to find a way to do this. Deleting I haven't developed a function but I imagine it would be fairly easy.

/**
 * Add metadatas to a post.
 *
 * @since x.x.x
 *
 * @param int    $post_id    Post ID.
 * @param string $meta_data  Metadata as an key/value pair array
 *
 * @return bool  Was the data inserted
 */
function add_post_metas( $post_id, $meta_data ) {
        // Make sure meta is added to the post, not a revision.
        if ( $the_post = wp_is_post_revision($post_id) )
                $post_id = $the_post;

        return add_metadatas('post', $post_id, $meta_data);
}

/**
 * Add multiple metadatas for the specified object. Similar to calling add_metadata for each metadata individually,
 * and is only applicable for unique meta data. If a meta key already exists for an object it will not be stored.
 *
 * @since x.x.x
 *
 * @global wpdb $wpdb WordPress database abstraction object.
 *
 * @param string $meta_type  Type of object metadata is for (e.g., comment, post, or user)
 * @param int    $object_id  ID of the object metadata is for
 * @param array  $meta_data  Metadata as an key/value pair array
 *
 * @return bool  If the metadata was stored successfully.
 */
function add_metadatas($meta_type, $object_id, $meta_data) {
        global $wpdb;

        if ( ! $meta_type || ! is_array( $meta_data ) || ! is_numeric( $object_id ) ) {
                return false;
        }

        $object_id = absint( $object_id );
        if ( ! $object_id ) {
                return false;
        }

        $table = _get_meta_table( $meta_type );
        if ( ! $table ) {
                return false;
        }

        $column = sanitize_key($meta_type . '_id');

        /**
         * Filter whether to add metadatas of a specific type.
         *
         * The dynamic portion of the hook, `$meta_type`, refers to the meta
         * object type (comment, post, or user). Returning a non-null value
         * will effectively short-circuit the function.
         *
         * @since x.x.x
         *
         * @param null|bool $check      Whether to allow adding metadata for the given type.
         * @param int       $object_id  Object ID.
         * @param string    $meta_key   Meta key.
         * @param mixed     $meta_value Meta value. Must be serializable if non-scalar.
         * @param bool      $unique     Whether the specified meta key should be unique
         *                              for the object. Optional. Default false.
         */
        $check = apply_filters( "add_{$meta_type}_metadatas", null, $object_id, $meta_data );
        if ( null !== $check )
                return $check;

        $_meta_data = array();
        foreach( $meta_data as $key => $value ) {
                if ( 0 == absint( $wpdb->get_var(
                                $wpdb->prepare( "SELECT COUNT(*) FROM $table WHERE meta_key = %s AND $column = %d", $key, $object_id )
                ) ) ) {
                        $key = wp_unslash( $key );
                        $value = wp_unslash( sanitize_meta( $key, $value, $meta_type ) );

                        $_meta_data[ $key ] = maybe_serialize( $value );

                        /**
                         * Fires immediately before meta of a specific type is added.
                         *
                         * The dynamic portion of the hook, `$meta_type`, refers to the meta
                         * object type (comment, post, or user).
                         *
                         * @since 3.1.0
                         *
                         * @param int    $object_id  Object ID.
                         * @param string $meta_key   Meta key.
                         * @param mixed  $meta_value Meta value.
                         */
                        do_action( "add_{$meta_type}_meta", $object_id, $key, $value );

                }
        }

        $rows = array();
        if( ! empty( $_meta_data ) ) {
                $sql = "INSERT INTO {$table} ({$column}, meta_key, meta_value) VALUES ";

                $comma = false;
                foreach( $_meta_data as $key => $value ) {
                        if( true == $comma ) {
                                $sql .= ',';
                        }

                        $sql .= "({$object_id}, '{$key}', '{$value}')";
                        $comma = true;
                }
        }

        $result = $wpdb->query( $sql );

        if ( ! $result )
                return false;

        wp_cache_delete($object_id, $meta_type . '_meta');

        return true;
}

Attachments (8)

meta-bulk.php (5.8 KB) - added by boonebgorges 4 years ago.
34848.diff (11.1 KB) - added by patrickgarman 2 years ago.
34848.2.diff (9.8 KB) - added by patrickgarman 2 years ago.
Updated some missing docblocks
34848.3.diff (9.8 KB) - added by patrickgarman 2 years ago.
More docblock updates, string !== array
34848.4.diff (12.4 KB) - added by patrickgarman 2 years ago.
Update SQL Queries && Add Unit Tests
34848.5.diff (12.4 KB) - added by bradparbs 2 years ago.
Minor code standards fixes
34848.6.diff (12.6 KB) - added by bradparbs 2 years ago.
Start of adding native metadata filters + further CS cleanup
34848.7.diff (12.6 KB) - added by bradparbs 2 years ago.
Fixed indentation

Download all attachments as: .zip

Change History (43)

#1 @chriscct7
4 years ago

  • Focuses performance added

This would be really useful for core in the meta endpoints of REST.

#2 @swissspidy
4 years ago

  • Keywords needs-patch added

#3 @sc0ttkclark
4 years ago

I like this idea too, but definitely think it needs different naming from metadatas / metas.

#4 @chriscct7
4 years ago

  • Keywords needs-unit-tests added
  • Milestone changed from Awaiting Review to Future Release
  • Owner set to chriscct7
  • Status changed from new to assigned

#5 @patrickgarman
4 years ago

I locally I've created a possible way to handle updating post meta in bulk. Simply deleting the post meta and then re-inserting it.

https://gist.github.com/pmgarman/0cdaa428366db4455ced#file-wp-meta-patch-php

This ticket was mentioned in Slack in #core-restapi by sc0ttkclark. View the logs.


4 years ago

#7 @chriscct7
4 years ago

  • Milestone changed from Future Release to 4.5

#8 @chriscct7
4 years ago

Going to work on finishing up 2 different versions of the patch at WordCamp Miami this weekend.

#9 @sc0ttkclark
4 years ago

I vote for function names like add_bulk_post_meta() and add_bulk_metadata()

Instead of metas / metadatas usage, is that not bugging anyone else? :)

This ticket was mentioned in Slack in #core by jorbin. View the logs.


4 years ago

#11 @jorbin
4 years ago

  • Milestone changed from 4.5 to Future Release

Lack of traction, so this is going to miss 4.5

This ticket was mentioned in Slack in #core by pmgarman. View the logs.


4 years ago

#13 @patrickgarman
4 years ago

Anyone currently have any form of a patch they have started for this? If not at the start of next week I'll have some time I'm going to put towards trying to push this along starting with writing a patch and anything else I need to do to help get this ready for merging in.

#14 @boonebgorges
4 years ago

A thought: What if add_metadata() and update_metadata() handled this natively? We could avoid a lot of duplicated code (and bad function names) by using the functions we already have. Syntax:

function add_metadata( $meta_type, $object_id, $meta_key, $meta_value, $unique = false ) {
    if ( is_array( $meta_key ) ) {
        $metadata = $meta_key;
    } else {
        $metadata = array( $meta_key => $meta_value );
    }

    // etc
}

Obviously this is way oversimplified - I don't love having an ambiguous function signature - but I think the developer experience would be much better.

#15 @patrickgarman
4 years ago

My original idea of how to implement it was down that route... my concern was beyond feeling a bit weird, if someone passes an array into the functions as they exist now for $meta_key, how badly does it break things? Inevitably that would end up happening, and I'm not as familiar with past changes like this (if there have been any) how those were handled and how the possibly breakage was handled. Considering WP is supposed to be super backwards compatible would this go against that?

#16 @boonebgorges
4 years ago

Currently, if you pass an array to $meta_key, lots of stuff will break. For example, $wpdb->prepare( "SELECT COUNT(*) FROM $table WHERE meta_key = %s AND $column = %d", $meta_key, $object_id ). So I don't think there are many backward compatibility concerns, since that syntax currently will not work in any meaningful way.

A couple other thoughts about the approach, since I'm back commenting on this ticket :)

  • Can we use the $wpdb insert/update/delete handlers instead of building SQL?
  • Why does update not get separate treatment? The add vs update behavior is pretty established for single metadata, so it probably makes sense to reproduce it here. This includes things like action/filter names and add fallback for non-existing meta in update.
  • Return values are tricky here. add_metadata() returns the row ID on success, while update returns a boolean. Is it worth trying to reproduce this? What would this look like if add fell back on update for some, but not all, of the keys passed to the function?

#17 @patrickgarman
4 years ago

It's been a while since I wrote it, but I'll try to answer your questions from memory.

  • I don't think WPDB handled the building of the SQL for all the queries correctly, but I'm not 100% sure without going back and testing which I can't do until next week.
  • Do you know how to update many metas via a single SQL statement without deleting and re-adding the metadata? I'm no DBA, but I couldn't find anything that would have worked the same way a single update would.
  • Going in bulk I think complicates things, if you insert 10 rows of data in a single SQL statement you get a single yes/no that it worked via straight SQL. So we could do a select to then pull all the IDs, not TOO innefficient. Normal inserts/updates just uses WPDB to pull the inserted/updated ID which I'm not sure how well that works if in some way all the queries could be moved to work with WPDB.

This ticket was mentioned in Slack in #core by mindsize. View the logs.


4 years ago

#19 @boonebgorges
4 years ago

Thanks for the additional thoughts, @patrickgarman.

I've spent some time thinking more about this. My first thought is that it probably does make more sense to go with different function names, rather than trying to wedge this functionality into the existing functions. The function signatures and return values would be too different to make any sense.

That being said, I've run a bunch of tests, and have found mixed performance results. In many cases, looping over add_metadata() is actually much *faster* than running a single INSERT INTO with lots of values. meta-bulk.php shows what I've been testing - meant to be run $ wp eval-file /path/to/meta-bulk.php. Comment stuff out to test add_metadatas() vs an add_metadata() loop. Obviously, these tests are not scientific, but a bit of testing shows that bulk INSERT statements are on the magnitude of 2x *slower*, though depending on the size of the meta_value and meta_key strings the memory footprint can be somewhat less.

I also found that behavior of add_metadatas() tended to get pretty weird at very large sizes. PHP had a hard time dealing with string concatenation after a certain length.

Am I doing something wrong? Maybe there's a kind of metadata payload that I haven't thought of, where a bulk INSERT performs much better.

You'll also see that I experimented with wrapping the add_metadata() loop in a transaction. This *did* have a noticeable performance benefit - depending on the nature of the metadata, anywhere from a 20-40% decrease in time elapsed. I would be somewhat wary of introducing a function for this into core, though - transactions are vulnerable to data loss if they're interrupted, and for certain kinds of payload, there can be performance problems during the commit. There might be benefits to using transactions like this in some places (I'm thinking of the importer - @rmccue have you thought about this kind of thing?), but it would have to be in situations where the routine is idempotent or has other failsafes built in. There might also be considerations related to database replication and the like that make this improvement the kind of thing that ought to be handled on an implementation-specific basis.

@pento Do you have any wisdom to impart about the potential benefits of (a) large INSERT INTO ... VALUES queries, and/or (b) running large numbers of small INSERT or UPDATE queries inside of a transaction for the performance benefit?

#20 @patrickgarman
4 years ago

I think I've mentioned before, and I'll preface again with "I'm not a DBA, this isn't my expertise, but based on the work I've been doing this is the experience I've had" line like usual.

I've included some basic info around the experience I've had with reducing raw volume of inserts by combining them in point 3 on my comment here: https://github.com/woothemes/woocommerce/issues/9735#issuecomment-168032658

tl;dr - went from 600 inserts per minute down to 200 inserts per minute (same volume of data, just combining inserts) and the query time itself went up on average of ~.1ms

This data is from a high volume WooCommerce site, and New Relic is the tool used to pull this specific data. I'm assuming the tool here is accurate, I've not found or heard anything to believe it isn't accurate. To further expand on this... The reason I started at that time focusing on reducing the query volume is the volume of orders being inserted into the database (this was cyber monday) was causing the postmeta table to lock up and would bring the site to a halt. Database server loads were over 120 (32 cpu, 120GB ram). Since then and after I implemented these changes (and nothing else specifically related to reducing postmeta related things) we've experienced even higher loads where we no longer were bottlenecked by the postmeta table in this way.

I'll add here about transactions (again not being an expert here either) that WooCommerce uses transactions already. When an order is being created (see links below), this has caused issues at times under load. When the database server locks up like it has on occasion due to volume what I'm finding is the postmeta of an order gets inserted yet the record in the posts table will not exist. Now, in theory a postmeta will NEVER get inserted if the post itself doesn't get inserted, there isn't a post_id to be had in this case. Yet something is happening in that transaction to not insert the posts record yet does complete the postmeta records.

https://github.com/woothemes/woocommerce/blob/master/includes/class-wc-checkout.php#L181
https://github.com/woothemes/woocommerce/blob/master/includes/wc-core-functions.php#L1097

I look forward to hearing more about the stats and maybe more about what someone who knows what they doing here thinks of the data, experiences, and issues I've run into.

This ticket was mentioned in Slack in #core by swissspidy. View the logs.


3 years ago

#22 @patrickgarman
2 years ago

Going to resurrect this and attempt to see if we can get this pushed through. (at WCUS Contributor Day right now)

@boonebgorges I have some data regarding performance, real world data at scale more than tests. Yes, the individual query is slightly slower, however, the total query time overall is actually less. As described in my 19 month old comment as well.

Do we need more data around performance and more testing there?

@patrickgarman
2 years ago

@patrickgarman
2 years ago

Updated some missing docblocks

@patrickgarman
2 years ago

More docblock updates, string !== array

@patrickgarman
2 years ago

Update SQL Queries && Add Unit Tests

#23 @patrickgarman
2 years ago

I have added a patch which includes passing unit tests.

This ticket was mentioned in Slack in #core by pmgarman. View the logs.


2 years ago

#25 @patrickgarman
2 years ago

  • Keywords has-patch has-unit-tests needs-testing added; needs-patch needs-unit-tests removed

@bradparbs
2 years ago

Minor code standards fixes

#26 @bradparbs
2 years ago

Echoing some of the earlier conversations, the thing throwing me off here is the naming of the functions. While metadatas makes sense, but I'm wondering if there is a better choice we could use that is more clear.

#27 @bradparbs
2 years ago

Discussing this with @peterwilsoncc right now. Right now, there's a few missing filters that exist in the native add_meta_data-related functions. add_{$meta_type}_metadata is a good example, we need to support being able to not upsert specific values in the bulk array.

Peter also made a good point re: naming.metadatas is a really easy typo off metadata that might cause some confusion to a user.

@bradparbs
2 years ago

Start of adding native metadata filters + further CS cleanup

@bradparbs
2 years ago

Fixed indentation

#28 @jorbin
2 years ago

Peter also made a good point re: naming.metadatas is a really easy typo off metadata that might cause some confusion to a user.

Completely agree with this. Thinking bulk_* would be a better convention.

#29 @patrickgarman
2 years ago

So @bradparbs did you add the missing filters or are some still not there just yet. Haven’t looked at your new diffs.

I think some form of “bulk” in the name can work and is still clear, without causing confusion or allowing easy typos from code completion/etc. I’ll check it out tonight.

#30 @bradparbs
2 years ago

  • Keywords close added

After a lot of discussion and brainstorming with @peterwilsoncc, I'm really thinking this might need a a bit of a different approach. Right now, the update doesn't support a lot of native utilities, such as passing an old value to update across, etc.

I'm also thinking the delete and then insert pattern for updating isn't the safest thing. I'd hate to have a race condition or DB failure that causes data loss to a user. Potentially we could check what keys exist, run a normal update_post_meta on those, and then a bulk insert on the the rest.

Overall, I think bulk update should be the focus, because that's most likely going to be the most used function of all of these, just like the usages of update_post_meta vs add_post_meta in plugins/projects.

Updates are really a can of worms with a lot of edge cases. I think we need a cleaner solution to them before this is ready.

#31 @patrickgarman
2 years ago

That’s the kind of feedback and discussion I think this is needing and to get this further along, more eyeballs. Would not considering updates in bulk at all be a better approach? Inserting and deleting are where it has provided the most value in work in the past for us.

#32 @bradparbs
2 years ago

I think having bulk update/add/delete is a useful utility function to have. Might make sense to transition this to a feature plugin and figure out a robust way to solve some of the issues.

I don't think we want to just drop bulk update support, I think it's just (if not more) useful than add/delete.

#33 @peterwilsoncc
2 years ago

I really like the idea but it's update_? that I have the biggest concerns about:

  • there is no support for updating based on the existing meta value, this is required for the bypass filter to work and for feature parity
  • delete and add in the current form will delete duplicates
  • updates with the existing value specified will require one of two approaches:
    • multiple database calls to so you can use different WHERE clauses
    • getting all of an objects meta data and filtering the array
  • new meta data (ie, a new meta_key) will need to call the add function, remember to consider the existing value
  • updates will need to use sql UPDATEs, meta_id and order in which the data is inserted will need to stay the same. These will almost certainly need to be unique DB calls to allow for unique WHERE clauses

I like @boonebgorges's suggestion above of making the existing functions support an array. The existing signature can be supported via a self-invoking function call.

The way ahead:

I would like to see a plugin that with a proof of concept. The meta_key attribute can be an array & get to the bypass filter.

On the bypass filter, the plugin intercepts the code and runs the bulk update. A meta key, object ID and object type is enough to get through the validation.

Rubber ducking this with @bradparbs yesterday, there were a lot of edge cases that need to be considered.

This ticket was mentioned in Slack in #core by boone. View the logs.


11 months ago

#35 @eddr
11 months ago

Hi! new here

I'd to make it done
Was thinking about the SQL part - perhaps allowing multiple key->value updates in wpdb->update be useful, instead of the simple but custom SQL?

Note: See TracTickets for help on using tickets.