Make WordPress Core

Opened 3 years ago

Last modified 18 months ago

#53705 new defect (bug)

Plugin upgrade deletes files from other in-progress upgrades

Reported by: bpayton's profile bpayton Owned by:
Milestone: Future Release Priority: normal
Severity: normal Version: 5.8
Component: Upgrade/Install Keywords: needs-testing-info
Focuses: Cc:

Description

If two plugin upgrades are requested at the same time, the later upgrade can interfere with the earlier upgrade by deleting its source files while the earlier upgrade is still in progress.

This can lead to either disappearing plugins or plugins with missing files.

This is because Plugin_Upgrader::upgrade() calls WP_Upgrader::run() which calls WP_Upgrader::unpack_package() here, and WP_Upgrader::unpack_package() unconditionally deletes all files from the wp-content/upgrade/ directory here.

If plugin upgrade A is in progress with new plugin files under wp-content/upgrade/, plugin upgrade B will delete them as part of unpacking its zip file.

Plugin upgrade conflicts can cause a plugin to disappear with the following sequence:

  1. Plugin upgrade A downloads a zip for the new version here
  2. Plugin upgrade A calls WP_Upgrader::unpack_package() to unpack the zip to a unique subdirectory of wp-content/upgrade/ here
  3. Plugin upgrade A deletes the old version's plugin files here. This happens because Plugin_Upgrader::upgrade() sets the clear_destination flag to true here.
  4. Plugin upgrade B downloads a zip for its new version here
  5. Plugin upgrade B calls WP_Upgrader::unpack_package() to unpack its zip here, deleting all files of plugin upgrade A from wp-content/upgrade/ here
  6. Plugin upgrade A attempts to copy the new version into the plugins dir here, but the new version's files are completely gone.
  7. Plugin upgrade A has already deleted its old version's files, and it cannot copy the new files because they no longer exist.
  8. Plugin upgrade A encounters an error, and its plugin has completely disappeared.

This upgrade behavior can also lead to upgraded plugins with missing files, following a progression similar to the one above. The only difference is that, in case of missing plugin files, plugin upgrade A succeeded in copying some files before plugin upgrade B deleted the rest of them. Here, plugin upgrade A encounters an error, and the new plugin version only has some of its files.

At Automattic, we encountered this when WordPress.com attempted to request multiple, individual plugin updates for a standalone WP site all at once via Jetpack. We encountered frequent, reproducible plugin update corruption and tracked this down via error logging added to WP core source.

NOTE: Based on this WP_Upgrader code, it might be that simultaneous, mixed plugin and theme updates can also encounter this issue, but I have not tested it.

Attachments (3)

php-errors-plugin-corruption-20210720.txt (6.3 KB) - added by bpayton 3 years ago.
PHP error logs demonstrating this issue
wp-upgrader-error-logging.diff (3.1 KB) - added by bpayton 3 years ago.
Patch WP_Upgrader to error log and show concurrent upgrade issues
avoid-corrupting-concurrent-plugin-upgrades.diff (839 bytes) - added by bpayton 3 years ago.
Patch to avoid conflicts by only deleting older, leftover upgrade files

Download all attachments as: .zip

Change History (29)

#1 @bpayton
3 years ago

Instead of deleting old upgrade files on the fly, I think wp-content/upgrade/ cleanup should probably happen as part of a focused WP-Cron event, perhaps only deleting plugin dirs that are at least X hours old. I am working on a patch to propose this.

#2 @bpayton
3 years ago

This might be the cause of #51823. The experience and log messages there are similar to those found when reproducing missing and incomplete plugins with concurrent upgrade requests.

#3 @SergeyBiryukov
3 years ago

  • Milestone changed from Awaiting Review to 5.9

This ticket was mentioned in Slack in #core-auto-updates by desrosj. View the logs.


3 years ago

@bpayton
3 years ago

PHP error logs demonstrating this issue

#5 @bpayton
3 years ago

I went back and instrumented WordPress upgrade logic with error logging that is more readable than what I did privately for myself. The logs now attached here show the following:

  1. An update failure leading to a disappearing plugin (coblocks)
  2. An update failure caused by a target directory being deleted while a zip was being unpacked to that directory (wordpress-seo). In this case, nothing was left in a broken state because the previous version was not yet deleted.

Each line of the log has a REQUEST ID at the beginning. Use that to track which request is doing what.

If you search for "coblocks" in the logs, you can see that there are two different requests that do things to coblocks. One is downloading and unpacking (REQUEST 0059), and the other is deleting the unpacked files (REQUEST 0062). The coblocks upgrade errors before it can copy any new files but after it calls clear_destination() to delete its old files. After this, the coblocks plugin was no longer present on my site.

If you search for "wordpress-seo" in the logs, you can again see two different requests working with a plugin. One is downloading and unpacking wordpress-seo (REQUEST 0061), and the other is deleting the unpacked files (REQUEST 0063). In this case, REQUEST 0061 is still unpacking wordpress-seo files when REQUEST 0063 comes along to delete the unpack destination, and REQUEST 0061 errors because it can no longer put unpacked files into the deleted destination directory.

What these logs do not show is a broken plugin with only some of its files, but I think that is just a matter of timing. If a plugin gets to start copying its new files and another deletes those new files in the middle of copying, that could result in a plugin upgrade failure with a plugin directory containing only some of its files.

Last edited 3 years ago by bpayton (previous) (diff)

#6 follow-up: @bpayton
3 years ago

Instead of deleting old upgrade files on the fly, I think wp-content/upgrade/ cleanup should probably happen as part of a focused WP-Cron event, perhaps only deleting plugin dirs that are at least X hours old. I am working on a patch to propose this.

Maybe it would be better to just add a condition so leftover wp-content/upgrade/ subdirectories are only removed if they are over X units of time old (maybe over a day old to be really conservative). It's kind of a hack, but it's also simpler than worrying about WP-Cron events for cleanup and seems very low risk.

Also, as mentioned in Slack, I will work to provide usable repro instructions for this for core as I am currently only using WordPress.com to generate concurrent plugin update requests.

#7 @bpayton
3 years ago

Here's how I am reproducing this issue with core WordPress. Thanks to @aristath for pointing out that updating inactive plugins doesn't enable maintenance mode, which might otherwise block concurrent update attempts.

First, create a script to force install out-of-date plugin versions:

#!/usr/bin/env bash

function wp_install() {
        wp --skip-plugins plugin install --force --version="$2" "$1"
}

wp_install 'amp' '2.1.2'
wp_install 'coblocks' '2.14.0'
wp_install 'layout-grid' '1.6'
wp_install 'page-optimize' '0.5.0'
wp_install 'wordpress-seo' '16.6'

# See that we have outdated plugins
wp plugin list

Then, to test:

  1. Start with a clean, default WordPress 5.8 installation and access to WP-CLI
  2. Apply the patch to add WP_Upgrader error logging
  3. Run the above script to install/downgrade outdated plugin versions
  4. Make sure those plugins are deactivated
  5. Open /wp-admin/plugins.php in your a modern web browser (I used latest Firefox)
  6. Open the browser dev tools Network tab so the browser starts collecting network requests
  7. Run this in the dev tools console to make sure plugin update links are found:
    $$('.inactive + .plugin-update-tr .update-link')
    
  8. If there are links found, run this in the dev tools console to request updates for the inactive plugins all at once:
    $$('.inactive + .plugin-update-tr .update-link')
      .map( a => a.href )
      .forEach( updateUrl => {
        fetch( updateUrl )
          .then( r => r.text() )
          .then( ( ...args ) => console.log( `${updateUrl} resolved`) )
          .catch( e => console.error( e ) )
    } )
    
  9. Examine the PHP error logs for failures
  10. When each request completes, a message should printed to the console. Once all the requests are done, you can examine each request's output for failures in the dev tools Network tab.
  11. Run wp plugin list to see if any of the plugins are still on old versions
  12. Check if any of the plugins disappeared after the upgrade attempt. This was not uncommon in my testing.

Repeat to see different outcomes. In my experience, something usually fails, but sometimes everything works OK.

If your testing goes like mine, you will see a mixture of failed upgrades, disappearing plugins, and maybe new plugin versions that are missing files.

I'm sorry these are so long. Testing this is kind of a pain. But I think these instructions should work with a default WordPress 5.8 installation.

If you're able to try this, I would love to hear how it went for you. Thanks!

Last edited 3 years ago by bpayton (previous) (diff)

@bpayton
3 years ago

Patch WP_Upgrader to error log and show concurrent upgrade issues

@bpayton
3 years ago

Patch to avoid conflicts by only deleting older, leftover upgrade files

#8 @bpayton
3 years ago

To make reproducing this easier, I added a patch to add error logging to WP_Upgrader and updated the repro instructions to use that. Each log message is prefixed with the request URI, and what is happening should be much clearer with the error logging.

I also uploaded a small patch to avoid this issue by deleting only older files from the upgrade directory.

#9 @bedas
3 years ago

Perhaps this is related https://core.trac.wordpress.org/ticket/50349

While the symptoms seem slightly different it seems rot have a similar background.

#10 @afragen
3 years ago

In unpatched WP 5.8 I ran in private browser window.

First pass got an error with AMP plugin and plugin missing at conclusion.
Second pass Yoast SEO didn't update, got a strange WP CLI error but plugin available fro update afterwards.
Third pass in new private browser window, no errors.

Sorry, not being able to get consistent errors doesn't make it easier to test. 🤨

This ticket was mentioned in Slack in #core by audrasjb. View the logs.


2 years ago

This ticket was mentioned in Slack in #core-auto-updates by audrasjb. View the logs.


2 years ago

#13 @audrasjb
2 years ago

  • Milestone changed from 5.9 to 6.0

Moving to milestone 6.0 as per today's bug scrub.

#14 @peterwilsoncc
2 years ago

@bpayton Have you tested the patch you supplied in production on WP.com or did you end up taking another approach/wait for the core fix?

#15 @bpayton
2 years ago

@peterwilsoncc yes, the patch was tested in production on a kind of standalone WP.com site (separate from the large multisite instance run by WP.com). The standalone sites, basically core WP plus a number of platform mu-plugins, are where we were seeing this issue. With the patch applied, we no longer experienced this issue, but because we prefer to run unaltered WP core releases, we did not apply the patch to all sites in production.

In the meantime, we were able to reduce this issue enough to live with it. It is a core issue but was greatly exacerbated by Jetpack making concurrent plugin update attempts. Those attempts could conflict with other Jetpack-driven plugin update attempts or core auto update attempts. We addressed this by updating Jetpack to share WP's auto_updater lock and adding some other Jetpack-specific locking to avoid conflicts.

This evening, I reran the repro instructions after you messaged and still encountered the errors mentioned in this ticket. After applying the patch, repeatedly rerunning the repro instructions yielded none of the errors.

#16 in reply to: ↑ 6 @peterwilsoncc
2 years ago

Replying to bpayton:

Instead of deleting old upgrade files on the fly, I think wp-content/upgrade/ cleanup should probably happen as part of a focused WP-Cron event, perhaps only deleting plugin dirs that are at least X hours old. I am working on a patch to propose this.

Maybe it would be better to just add a condition so leftover wp-content/upgrade/ subdirectories are only removed if they are over X units of time old (maybe over a day old to be really conservative). It's kind of a hack, but it's also simpler than worrying about WP-Cron events for cleanup and seems very low risk.

I've been thinking about this and don't think it's a wise idea.

In terms of meeting the website owners expectations: this approach would leave unexpected files on the server in an unexpected location.

If the file names are predictable then a more targeted delete while preparing for the upgrade could be a better solution.

#17 @bpayton
2 years ago

Replying to peterwilsoncc:

Replying to bpayton:
I've been thinking about this and don't think it's a wise idea.

In terms of meeting the website owners expectations: this approach would leave unexpected files on the server in an unexpected location.

@peterwilsoncc, thank you for taking time to think about this.

I agree that leaving unexpected files on a server is undesirable, but I don't believe this code bears primary responsibility for that cleanup. Here's my thinking:

In general, WordPress attempts to cleanup working files after an upgrade is done or an error occurs.

Core updates attempt to clean up working files. (refs: 1, 2, 3, 4, 5, 6, 7)

The WP_Upgrader class knows how to clean up after itself after installing a package. The WP_Upgrader::install_package() method takes a clear_working argument, and if that argument is truthy, the working files are cleaned up after installation. And it looks like clear_working is generally set to true:

  • WP_Upgrader::run() is the only method in core that calls WP_Upgrader::install_package(), and WP_Upgrader also takes a clear_working argument that defaults to true.
  • Plugin_Upgrader calls WP_Upgrader::run() three times and each call passes "clear_working" => true. (refs: 1, 2, 3)
  • Theme_Upgrader calls WP_Upgrader::run() four times and each call passes "clear_working" => true. (refs: 1, 2, 3, 4)
  • Language_Pack_Upgrader calls WP_Upgrader::run() once and passes "clear_working" => true. (ref: 1)

I believe this shows that the cleanup code in question here is a second line of defense, where WordPress takes the opportunity to clean up working files that should have been cleaned up earlier.

If that is true, I think it is probably reasonable for this second line of defense to only clean up working files that are past a certain age. It honestly seems like the safer thing to do since this code naturally does not know why the working files are there to begin with, and that lack of knowledge is the source of this bug.

Does this make sense, and if so, does it influence your thinking at all on the only-delete-older-working-files patch?

In case it helps, I think the patch could be adjusted to clean up files that are older than an hour or even a number of minutes. In retrospect, requiring files to be over a day old seems excessive.

If the file names are predictable then a more targeted delete while preparing for the upgrade could be a better solution.

Since this cleanup code assumes all pre-existing working files are left over and need to be cleaned up, I'm not sure how better naming would prevent it from removing things that are in the process of being installed. Would you be up for explaining further?

This ticket was mentioned in Slack in #core-auto-updates by afragen. View the logs.


2 years ago

This ticket was mentioned in Slack in #core-auto-updates by afragen. View the logs.


23 months ago

#20 @hellofromTonya
23 months ago

  • Milestone changed from 6.0 to 6.1

6.0 RC1 is tomorrow. As this issue was not introduced in 6.0 and discussion is ongoing, moving the ticket to 6.1.

This ticket was mentioned in Slack in #core-auto-updates by costdev. View the logs.


19 months ago

#22 @costdev
19 months ago

  • Keywords needs-testing-info added
  • Milestone changed from 6.1 to Future Release

Per the discussion in the bug scrub. As this issue still needs reliable reproduction steps and we're approaching 6.1 Beta 1, I'm moving this to the Future Release milestone and adding the needs-testing-info keyword.

Also raised in the bug scrub: This issue may be mitigated by Rollback's use of move_dir() (#51857). Once reliable reproduction steps are established, this should be tested against Rollback to see if the issue still exists.

Additional props: @afragen

#23 @bpayton
19 months ago

this issue still needs reliable reproduction steps

The existing steps reveal corruption fairly quickly, but that corruption is usually different each time.

I fear that expecting pure, repeatable reproduction will lead to this issue not being resolved.

This bug involves a race condition, so it is naturally not reproducible in the way that a pure function yields a consistent result given the same arguments. To make the results of this bug exactly repeatable, we would need to control all factors that determine which execution threads win and lose the races, and we do not typically control those factors at the OS and network level.

We have established that plugin corruption is occurring using the existing repro steps. (per my experience and @afragen's here ticket:53705#comment:10)

Of course it would be great to be able to consistently reproduce an exact broken result, but perhaps that is not 100% necessary to address this problem.

We can reason about what is occurring based on the code, patch with some fix based on our understanding, and re-run the repro instructions to see whether corruption disappears. It's not perfect, but perhaps it is a reasonable compromise.

Is there a reason exact reproducibility is needed here? Is there a reason we should believe that exact reproducibility is possible in this case?

Last edited 19 months ago by bpayton (previous) (diff)

#24 @afragen
19 months ago

@bpayton have you been able to replicate with trunk and Rollback Update Failure testing?

What I'm trying to ask, is copy_dir in some way responsible for the race condition and using rename mitigates it?

#25 @bpayton
18 months ago

@afragen, thank you for your quick reply! My apologies for not replying in kind. I am away on vacation and mostly away from the computer and haven’t been able to test this yet.

#26 @afragen
18 months ago

@bpayton when you get around to testing you will need to add this patch to core, r53578, in addition to using the current version of the Rollback Update Failure plugin.

Last edited 18 months ago by afragen (previous) (diff)
Note: See TracTickets for help on using tickets.