WordPress.org

Make WordPress Core

Opened 3 years ago

Last modified 7 months ago

#38285 new defect (bug)

Object lookups by ID (or primary key) are not managed properly, lots of memory leaks

Reported by: wonderboymusic Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version:
Component: Cache API Keywords: has-patch needs-refresh
Focuses: Cc:
PR Number:

Description

tl;dr this affects every object that uses the WP_Post::get_instance() paradigm of "managing instances" (spoiler: they are not)

WP_Post and friends are supposed to add more sanity to caching and sanitization. They are not models, they are read-only data objects that strongly-type database rows. As such, it should be easy to maintain only one reference to any given row = one object per row, regardless of method of query, you should always get the exact same object. This tends to work if you pass around objects, but not when passing around IDs.

Passing around IDs is a common use-case:

$ids = get_favorite_posts();

foreach ( $ids as $id ) {
    $post = get_post( $id );

    // wreak havoc
}

In a theme:

get_the_title( get_the_ID() );

In Doctrine (or Hibernate, or any other ORM), each type is an Entity that is managed by an Entity Manager. There is never more than one instance of a single item. In WordPress, we get get_post() and a built-in non-persistent Array Cache. We also get WP_Post::get_instance( $id ), which makes us believe that WP_Post is managing instances. It is not. The Array Cache is maintaining all of these references.

A few problems:

  • calls to ::get_instance() will always touch the cache
  • The Array Cache clones objects before storing them, and clones objects before returning them
  • The Array Cache is storing stdClass instances, not WP_Post instance, so every item out of the cache also has to become a brand new instance of WP_Post.

Storing the items as WP_Post objects would seemingly remove the need to return new instances, but they would be new instances nonetheless, since the cache clones every object before saving / returning.

What needs to be done:

  • Object rows are stored strongly-typed
  • Cache does not clone objects

How to test that this is happening (using WP-CLI):

<?php
class Prof {
	public function objects() {
		$store = new \SplObjectStorage();

		$mem1 = memory_get_usage();

		$p = get_post( 27 );
		foreach ( range( 1, 20 ) as $i ) {
			$p = get_post( $p );
			if ( ! $store->contains( $p ) ) {
				$store->attach( $p );
			}
		}

		$mem2 = memory_get_usage();

		\WP_CLI::line( round( ( $mem2 - $mem1 ) / 1024, 2 ) );
		\WP_CLI::line( $store->count() );
	}

	public function ids() {
		$store = new \SplObjectStorage();

		$mem1 = memory_get_usage();

		foreach ( range( 1, 20 ) as $i ) {
			$p = get_post( 27 );
			if ( ! $store->contains( $p ) ) {
				$store->attach( $p );
			}
		}

		$mem2 = memory_get_usage();

		\WP_CLI::line( round( ( $mem2 - $mem1 ) / 1024, 2 ) );
		\WP_CLI::line( $store->count() );
	}
}

if ( class_exists( '\WP_CLI' ) ) {
	\WP_CLI::add_command( 'prof', Prof::class );
}

Both should print out 1. Without the patch, ids will print out 20 + a bunch of leaked memory.

Patch for WP_Post + WP_Object_Cache attached

Attachments (1)

38285.diff (1.3 KB) - added by wonderboymusic 3 years ago.

Download all attachments as: .zip

Change History (4)

@wonderboymusic
3 years ago

#1 @dd32
3 years ago

For what it's worth, the patch as-is is going to run into issues. Storing typed classes in the object cache has historically lead to hard-to-find bugs, and while it at first appears to work, it causes huge problems when you add a persistent cache (such as memcache) and then either migrate PHP (and don't flush the cache) or even worse when you change the class definition (and don't flush the cache) and the unserialized object is a hybrid of those two definitions (when it was stored, and when it was retrieved)

I can't recall what the need for clone is, I suspect that's mostly for non-typed objects caching one thing, and then modifying it after.

A hybrid approach might be to allow typed objects which specify a serializer/deserializer to be stored, only clone stdClass's, or something else entirely.

To me, it feels like this might be better off handled higher than the cache, WP_Post having a static reference to all WP_Post's it's created. Unfortunately I think that would mean memory usage would be doubled as there's no way from PHP-land to know that there's only one reference to that memory location left.

The other alternative is waiting for PHP Future.Release to add a Comparable interface or something to allow comparing typed objects properly.

However, there's other cases where you'd legitimately want two different objects of the same post, for example:

$post = get_post( 1 );
modify_post( 1 );
$modified_post = get_post( 1 );
do_action( 'changed_something', $modified_post, $post );

assertTrue( $post !== $modified_post );

#2 @jipmoors
3 years ago

Keep in mind that it will also likely run into the situation where an object is fetched from cache before the class has been loaded, resulting in a __PHP_Incomplete_Class rendering the object useless. Especially when there is no general autoloader to load it just in time.

#3 @desrosj
7 months ago

  • Keywords has-patch needs-refresh added

This needs a refresh to adjust @dd32's concerns.

Note: See TracTickets for help on using tickets.