Make WordPress Core

Opened 5 weeks ago

Last modified 4 days ago

#64080 assigned defect (bug)

WP_Upgrader doesn't release lock on create_lock

Reported by: skithund's profile skithund Owned by: skithund's profile skithund
Milestone: 7.0 Priority: normal
Severity: normal Version: 4.5
Component: Upgrade/Install Keywords: reporter-feedback needs-patch dev-feedback
Focuses: Cc:

Description

By default WP_Upgrader::create_lock( 'foo' ); should release existing lock after one hour, if the lock is still active.

This doesn't happen at least on current release of MariaDB and PHP 8.4.

wpdb documentation says that query method "returns an integer value indicating the number of rows affected/selected", so WP_Upgrader should check for 0 return value, which MariaDB is returning if no rows are affected, since the lock exists. And possibly check false for backwards compatibility/database error?

Attachments (1)

class-wp-upgrader.php.diff (602 bytes) - added by skithund 5 weeks ago.

Download all attachments as: .zip

Change History (21)

This ticket was mentioned in Slack in #core by skithund. View the logs.


3 weeks ago

#2 @wildworks
3 weeks ago

  • Keywords has-patch added

This ticket was mentioned in PR #10356 on WordPress/wordpress-develop by @skithund.


3 weeks ago
#3

  • Keywords has-unit-tests added

Check for correct return value in create_lock() in lock creating query.
wpdb::query() returns 0 when there are no rows affected, eg. the lock already exists and no INSERT was performed.

Add two tests for WP_upgrader::create_lock()

Trac ticket: https://core.trac.wordpress.org/ticket/64080

#4 @skithund
3 weeks ago

  • Keywords has-unit-tests removed

GitHub PR adds two tests for create_lock(), for the actual lock creation and another test for lock re-creation/expiration.

Running the following would fail on current 6.8.3 WordPress install, since create_lock() wouldn't re-recreate the lock, which should've been "expired".

After the patch, the lock would be recreated.

<?php

WP_Upgrader::create_lock( 'test' );
sleep( 2 );
$status = WP_Upgrader::create_lock( 'test', 1 );
if ( false === $status ) {
  wp_die( 'Lock (re)creation failed' );
}

We stumbled upon this bug due to erroneous code on our side, which fatal errored after creating the lock, but before running WP_Upgrader::release_lock( 'test' );, creating an infinite lock, even though the default lock "expiration" should be one hour.

#5 @skithund
3 weeks ago

  • Keywords has-unit-tests added

#6 @SergeyBiryukov
3 weeks ago

  • Milestone changed from Awaiting Review to 6.9

@skithund commented on PR #10356:


2 weeks ago
#7

@westonruter Thanks for review/suggestions! Changed as suggested

@westonruter commented on PR #10356:


2 weeks ago
#8

Oh, but:

This doesn't happen at least on current release of MariaDB and PHP 8.4.

And I'm running MySQL.

#9 @westonruter
2 weeks ago

  • Keywords reporter-feedback added

#10 @afragen
2 weeks ago

The actual @return of WPDB::query() is

	 * @return int|bool Boolean true for CREATE, ALTER, TRUNCATE and DROP queries. Number of rows
	 *                  affected/selected for all other queries. Boolean false on error.

What is the actual return value from the query that generated your error?

@afragen commented on PR #10356:


2 weeks ago
#11

I asked @costdev to take a look and here are his thoughts.

1. The code performs the same whether `wpdb::query()` returns `0` or `false`.
2. The PR doesn't solve a problem.
3. However, the PR introduces a BC break: When `wpdb::query()` returns `false`, Core will go through checks and possibly release the lock. The PR guarantees that `false` will never cause a released lock.
4. The first test's docblock has no `@ticket`.
5. The second test's code doesn't make clear what it's actually doing, and there are 3 `assertTrue()` calls but only one has a third `$message` argument to explain what went wrong if the assertion fails.
6. There's no bug reproduction on the ticket to prove the bug is one in Core, nor to narrow it down to a cause.

IMO: The bug should actually be reproduced and confirmed to be a Core bug, and the cause is narrowed down. I appreciate that Toni did work on this, including adding tests, but it's too early as demonstrated by the PRs ineffectiveness and introduction of a BC break in Core

#12 @skithund
2 weeks ago

And I'm having really hard time wrapping my head around the original problem and reproduction due to working on thousand and one things after this. I'm 99% sure that the PHPUnit tests I wrote were failing too, but yes, that's impossible 🫠

The original problem is/was that we use WP_Upgrader::create_lock('our_sync'); to create a lock for an hourly scheduled event, where we sync data from an ERP to WordPress. Due to not accounting for connection errors, when connecting to ERP, our code exited, not going through the full sync process and running WP_Upgrader::release_lock('our_sync'); at the end - leaving the lock in the database.

The scheduled event then ran every hour, but failed to create lock, even though it should "release" itself after the default one hour timeout in WP_Upgrader::create_lock();

I'll try to look into this again possibly tomorrow.

#13 @westonruter
2 weeks ago

  • Owner set to skithund
  • Status changed from new to assigned

#14 @skithund
2 weeks ago

I'm unable to leave this alone and I'm getting somewhere. This seems to be caching issue/regression(?) and I've been barking at the wrong tree.

WP_Upgrader::create_lock() creates a lock into the database with an INSERT query, bypassing what ever is used for handling/priming options cache. And since get_option( 'lock.lock' ); is returning false right after INSERT, it's triggering "If a lock couldn't be created, and there isn't a lock, bail." and not being able to re-create the lock.

This code presents the problem in caching

<?php
global $wpdb;

// Make sure lock doesn't exist in cache
delete_option( 'lock.lock' );

$wpdb->query(
        $wpdb->prepare(
                'INSERT IGNORE INTO %i (option_name, option_value, autoload) VALUES (%s, %s, %s)',
                $wpdb->options,
                'lock.lock',
                time(),
                'off'
        )
);
var_dump( get_option( 'lock.lock' ) );
wp_cache_flush();
var_dump( get_option( 'lock.lock' ) );

// Result:
// bool(false)
// string(10) "1761580496"
Last edited 2 weeks ago by skithund (previous) (diff)

@skithund commented on PR #10356:


2 weeks ago
#15

Closing, since the actual bug is in caching and different approach is needed.

#16 @skithund
2 weeks ago

WordPress 6.6.4 works fine

$ wp core update --version=6.6.4 --force
$ wp eval-file test.php
string(10) "1761586413"
string(10) "1761586413"

WordPress 6.7.0 breaks

$ wp core update --version=6.7.0 --force
$ wp eval-file test.php
bool(false)
string(10) "1761586426"

#17 @westonruter
2 weeks ago

@skithund Are you going to git-bisect between 6.5 and 6.7 to find the specific commit that causes the issue?

#18 @skithund
2 weeks ago

  • Keywords has-patch has-unit-tests removed

My reproduction code in comment 14 is failing due to 6.7 introducing notoptions cache in [58782]

  • delete_option( 'lock.lock' ); sets lock.lock into notoptions cache in delete_option().
  • INSERT inserts the option back, skipping caches, making notoptions cache stale.
  • get_option( 'lock.lock' ); fails.

But I'm having really hard time to figure out why our actual production code is failing, since it's not doing delete_option( 'our_sync.lock' ); in any point. It has to be notoptions and persistent object caching, but how/why. My first hunch with 0 === wpdb::query() was a red herring.

Below is our production code stripped down with run_our_sync scheduled to be run hourly, using external cron and DISABLE_WP_CRON defined as true. We're running PHP 8.4, MariaDB 11.8 and Redis with redis-cache plugin.

The following wp_options table row stayed in until I released the lock manually after ~48 hours.
(1418, 'our_sync.lock', '1759614618', 'off')

Timeline was that lock was created at 00:50:18, HTTP failed at 00:50:23, leaving a lock behind and every hourly try after that failed to create the lock (01:50:46, 02:50:42, 03:50:40 etc.).

Should the component be changed to Cache API and version to 6.7?

<?php
add_action( 'run_our_sync', 'do_sync' );

function do_sync() {
  require_once ABSPATH . 'wp-admin/includes/class-wp-upgrader.php';

  error_log( 'Starting sync and creating lock' );

  if ( ! WP_Upgrader::create_lock( 'our_sync' ) ) {
    error_log( 'Failed to create lock' );

    return;
  }

  $response = wp_remote_get( 'https://example.org/data.json' );
  if ( is_wp_error( $response ) ) {
    error_log( 'HTTP request failed' );

    // Here we originally forgot to release lock, leaving a lock behind after a failed HTTP request.
    // WP_Upgrader::release_lock( 'our_sync' );

    return;
  }

  // Do stuff.

  WP_Upgrader::release_lock( 'our_sync' );
}
Last edited 2 weeks ago by skithund (previous) (diff)

#19 @skithund
2 weeks ago

And then there's the question, why is there a INSERT IGNORE used in WP_Upgrader::create_lock() and four other places duplicating identical code instead of using Options API to interact with options, since the lock is "just an option"?

To avoid unnecessary database queries or just old/duplicated code from way back when the Options API wasn't available?

#20 @wildworks
4 days ago

  • Keywords needs-patch dev-feedback added
  • Milestone changed from 6.9 to 7.0

Since the 6.9 RC1 release is coming soon, I will punt this ticket to 7.0.

Most likely, we will need technical feedback to move this ticket forward.

Note: See TracTickets for help on using tickets.