WordPress.org

Make WordPress Core

Opened 3 months ago

Last modified 2 weeks ago

#51857 reopened enhancement

Add rollback for failed plugin/theme updates

Reported by: pbiron Owned by: pbiron
Milestone: 5.8 Priority: normal
Severity: normal Version:
Component: Upgrade/Install Keywords: early needs-patch
Focuses: Cc:

Description (last modified by pbiron)

When a core auto-update fails, core has long had the ability to automatically attempt a "rollback" to the latest stable release (in the branch that the site is running). Note: for core auto-updates, "rollback" is not attempted for certain failures (e.g. "disk full", etc).

This capability should be extended to be available for plugin and theme updates.

[note: originally this description mentioned `auto-updates`...that was a mistake on my part. This is about rollback for both `manual` and `auto-updates`]

The current functionality for core rollbacks relies on the availability of a "rollback" package being returned by the Updates API. It is unclear (to me, at this time) whether extending this functionality to plugins/themes will require changes to the API or not.

Important: This ticket is not about backup/restore in case a site admin later determines that a successful plugin/theme update actually causes problem on a site!!!! Rather it is about covering cases such as #51823, where a plugin/theme update failure happens between the time the existing plugin/theme is deleted but before the new version successfully installed.

Attachments (6)

51857.diff (513 bytes) - added by afragen 3 months ago.
In copy_dir() check $wp_filesystem->dirlist() and return WP_Error if false
51857.2.diff (1.7 KB) - added by afragen 3 months ago.
only test on first pass
51857.3.diff (1.7 KB) - added by afragen 3 months ago.
pass slug in WP_Error instead of FS_METHOD
51857.4.diff (5.9 KB) - added by afragen 3 months ago.
add rollback
51857.5.diff (7.7 KB) - added by afragen 3 months ago.
also update _copy_dir() per @dd32 rec
51857.6.diff (7.6 KB) - added by afragen 8 weeks ago.
bubble up error for extract_rollback

Download all attachments as: .zip

Change History (79)

#1 @pbiron
3 months ago

  • Owner set to pbiron

#2 @afragen
3 months ago

Just a quick thought. The plugin/theme update transient could have an additional element containing the most recent download package. Then some error during the update process might be able to attempt an update using the alternate download package.

This ticket was mentioned in Slack in #core-auto-updates by audrasjb. View the logs.


3 months ago

This ticket was mentioned in Slack in #core-auto-updates by pbiron. View the logs.


3 months ago

#5 @pbiron
3 months ago

  • Description modified (diff)
  • Summary changed from Add rollback for failed plugin/theme auto-updates to Add rollback for failed plugin/themeupdates

#6 @pbiron
3 months ago

  • Summary changed from Add rollback for failed plugin/themeupdates to Add rollback for failed plugin/theme updates

#7 @afragen
3 months ago

It seems that some additional error reporting is needed. In referencing the core failure data provided by @dd32 in https://wordpress.slack.com/archives/CULBN711P/p1606869594134200 it seems that copy_dir() is a focal point of failure.

In digging a little if $wp_filesystem->dirlist( $from ) is false what happens is the contents of the plugin folder is deleted and the folder remains.

I added a patch to test for this and return a WP_Error. At a minimum it can give us more information. Perhaps it can mitigate this type of failure.

Last edited 3 months ago by afragen (previous) (diff)

@afragen
3 months ago

In copy_dir() check $wp_filesystem->dirlist() and return WP_Error if false

This ticket was mentioned in Slack in #core-auto-updates by afragen. View the logs.


3 months ago

@afragen
3 months ago

only test on first pass

#9 @afragen
3 months ago

Only running of the first pass of copy_dir() allows plugins to empty subfolders.

This ticket was mentioned in Slack in #core-auto-updates by afragen. View the logs.


3 months ago

#11 @audrasjb
3 months ago

@afragen indeed this would at least prevent the updater to delete plugin without being able to upload the new version.

I tested it by changing writing permissions on my test install and it looks good to me (plugin not deleted). Is it the best way to test this patch?

#12 @afragen
3 months ago

@audrasjb part of the issue is that I don't even know if this is issue. It does seem to solve this particular problem ( $dirlist returning false ).

I test by settings a breakpoint after $dirlist = $wp_filesystem->dirlist( $from ); in copy_dir() and then setting $dirlist = false; in VSCode.

I'm working on sending this failure data to dot org but I have no way to tell if they get it. I'll make another ticket for that.

Last edited 3 months ago by afragen (previous) (diff)

#13 @afragen
3 months ago

See #51928 as possible method of data collection by dot org.

@afragen
3 months ago

pass slug in WP_Error instead of FS_METHOD

#14 @afragen
3 months ago

The Automatic Updater Skin doesn't really pass any specific data about plugin/theme so I'm passing the slug in WP_Error as something to help.

Version 0, edited 3 months ago by afragen (next)

This ticket was mentioned in Slack in #core by afragen. View the logs.


3 months ago

#16 @afragen
3 months ago

  • Keywords has-patch dev-feedback needs-testing added

51857.4.diff added rollback of plugin/theme to patch.

To test copy_dir() must return a WP_Error. This would likely happen if $dirlist = false.

@afragen
3 months ago

add rollback

@afragen
3 months ago

also update _copy_dir() per @dd32 rec

This ticket was mentioned in Slack in #core by afragen. View the logs.


3 months ago

#18 @lukecarbis
3 months ago

  • Keywords early added

#19 @poena
3 months ago

During today's scrub,
@afragen suggested that:

Easiest way to test is install patch, set a breakpoint in copy_dir() after $dirlist assignment and set $dirlist = false; This will return WP_Error and trigger the rollback during a plugin/theme update.

Last edited 3 months ago by poena (previous) (diff)

This ticket was mentioned in Slack in #core-auto-updates by audrasjb. View the logs.


3 months ago

This ticket was mentioned in Slack in #core by audrasjb. View the logs.


2 months ago

This ticket was mentioned in Slack in #core-auto-updates by hellofromtonya. View the logs.


2 months ago

#23 follow-up: @TimothyBlynJacobs
2 months ago

I'm not sure if zipping a copy is the right approach here. In my experience, most sites that have issues with a partial update failure ( ie some files are not copied resulting in a crash ) are due to resources timing out during the upgrade process. By taking a zip, this will significantly increase the amount of processing that needs to be done during the upgrade process and thus may increase the likelihood of failed updates. Further, since this is a fatal error issue, copy_dir won't return a result and the rollback code won't execute in that case.

#24 @afragen
2 months ago

ATM this is not quite meant to be all encompassing. Some of the issue is getting a WP_Error out of copy_dir().

It is meant to function if the $dirlist returns false as this is clearly an issue likely relating to the download, extraction , copy process. There's another ticket #51928 to help provide more data for where and when failures happen.

The zipping is done before the any unpacking of the downloaded package and I'm not sure that it should increase the server resources significantly. If this can be demonstrated that would be great. But since the failure seems to be in copy_dir(), zipping and unzipping the current plugin seems like a better solution than copying the data somewhere and then copying it back.

Last edited 2 months ago by afragen (previous) (diff)

#25 @afragen
2 months ago

I did some quick time tests locally and it takes almost no time to create a zip even of something as large as Jetpack.

This ticket was mentioned in Slack in #core by metalandcoffee. View the logs.


2 months ago

#27 @afragen
2 months ago

It would be helpful to test this on slow servers to see if the zipping process introduces any potential timeout issues.

Please post results here.

Last edited 2 months ago by afragen (previous) (diff)

#28 @audrasjb
2 months ago

I successfully tested the above patch on the worst server I can access to: 1 vCore, RAM 2Gb, 20 Gb SATA SSD. That's one of the cheapest services provided by big hosting companies here in France.
I'll try to find a non-SSD server next week.

But FWIW: rollback worked :-)

This ticket was mentioned in Slack in #core by afragen. View the logs.


2 months ago

This ticket was mentioned in Slack in #core-auto-updates by audrasjb. View the logs.


2 months ago

This ticket was mentioned in Slack in #hosting-community by pbiron. View the logs.


2 months ago

This ticket was mentioned in Slack in #hosting-community by crixu. View the logs.


2 months ago

#33 in reply to: ↑ 23 ; follow-up: @mikeschroder
2 months ago

Replying to TimothyBlynJacobs:

I'm not sure if zipping a copy is the right approach here. In my experience, most sites that have issues with a partial update failure ( ie some files are not copied resulting in a crash ) are due to resources timing out during the upgrade process. By taking a zip, this will significantly increase the amount of processing that needs to be done during the upgrade process and thus may increase the likelihood of failed updates. Further, since this is a fatal error issue, copy_dir won't return a result and the rollback code won't execute in that case.

This was brought up in the hosting community channel for discussion and testing, but wanted to leave a quick note that this is also my experience.

#34 follow-up: @afragen
8 weeks ago

Mike, are you saying you have seen/experienced this problem or that you agree it's a theoretical concern?

This ticket was mentioned in Slack in #hosting-community by jadonn. View the logs.


8 weeks ago

#36 in reply to: ↑ 33 @a2hosting
8 weeks ago

I'd just like to add in that we've seen what should be very small zip tasks called from PHP fail if there are resource issues and that should be avoided if at all possible in cases such as these.

#37 @afragen
8 weeks ago

Can we please try to obtain actual data? Let's try to test the actual patch.

I understand theoretical issues, but I also understand that one type of zip task is not the same as another.

This ticket was mentioned in Slack in #core by audrasjb. View the logs.


8 weeks ago

This ticket was mentioned in Slack in #core-auto-updates by afragen. View the logs.


8 weeks ago

#40 @afragen
8 weeks ago

From the description.

The current functionality for core rollbacks relies on the availability of a "rollback" package being returned by the Updates API. It is unclear (to me, at this time) whether extending this functionality to plugins/themes will require changes to the API or not.

Obtaining the currently installed version package from dot org is possible but attempting to reinstall it in the same manner as the failure doesn't seem like the practical way forward.

This approach also has the issue of not working with any plugin/theme not hosted on dot org. I think this is a hard block.

#41 @afragen
8 weeks ago

Lots of discussion between @TimothyBlynJacobs and myself, https://wordpress.slack.com/archives/CULBN711P/p1609968242405800

My impression of the conclusion is that if there is a server resource issue that occurs a result of the zip creation process then a simple page reload brings the site back to it's previous state showing a plugin/theme update being available.

This shouldn't be an issue as long as the frequency of this happening is rare. If this is happens with small plugins it's an issue.

If it happens with large plugins like Mailpoet, Jetpack, Yoast SEO, etc. then perhaps it's not ideal to run a resource intensive site on a resource poor shared environment.

#42 @TimothyBlynJacobs
8 weeks ago

My impression of the conclusion is that if there is a server resource issue that occurs a result of the zip creation process then a simple page reload brings the site back to it's previous state showing a plugin/theme update being available.

I think this is accurate. But the failure happening _during_ the zip process specifically is not my big worry, but that it consumes resources so that when the file transfer happens, the site no longer has enough resources to complete the upgrade. Which is how you can end up with a fatal erroring site.

@afragen
8 weeks ago

bubble up error for extract_rollback

This ticket was mentioned in PR #860 on WordPress/wordpress-develop by afragen.


8 weeks ago

Moving previous patch(es) to GitHub.

  • Updates copy_dir() and _copy_dir() to include WP_Error return early if the $dirlist is empty.
  • Adds function zip_to_rollback_dir() which creates a zip of the current plugin/theme that is being updated.
  • Adds function extract_rollback() which on a WP_Error returned from copy_dir() will extract the above zip back into the appropriate location.

Trac ticket: https://core.trac.wordpress.org/ticket/51857

This ticket was mentioned in Slack in #core-auto-updates by audrasjb. View the logs.


7 weeks ago

This ticket was mentioned in Slack in #core by audrasjb. View the logs.


7 weeks ago

This ticket was mentioned in Slack in #core-auto-updates by audrasjb. View the logs.


6 weeks ago

This ticket was mentioned in Slack in #core by afragen. View the logs.


6 weeks ago

#48 in reply to: ↑ 34 @mikeschroder
6 weeks ago

Replying to afragen:

Mike, are you saying you have seen/experienced this problem or that you agree it's a theoretical concern?

Sorry, I missed this question before. I'm saying I've seen and experienced this problem.
It is not a theoretical concern.

I agree that data would be helpful here, whether from the WordPress project, or hosts.

I'll ask around a bit and see if I can find recent data. It is one of those things that I helped with a lot in tech support, but that was some time ago.

#49 follow-ups: @SergeyBiryukov
6 weeks ago

Thanks for the patch!

It looks like there are some concerns about performance implications of building (and then potentially unpacking) a zip file with the current plugin version, especially for large plugins. Instead of building a zip file, could we rename the plugin directory to something reasonably unique like plugin-slug.__rollback__, and then delete it on successful update, or rename back on failure? I think that would not require any additional resources. Creating a zip file might save some disk space, but if the space is already low enough for that to make a difference, then it seems like updates on that site would start failing anyway sooner rather than later.

Personally, I would like the code for plugin/theme upgraders to be as close to core as possible. At a glance, it looks like Core_Upgrader::upgrade() has a lot more than Plugin_Upgrader::upgrade() or Theme_Upgrader::upgrade(), so perhaps unifying these methods and documenting the differences would be a good first step. I would prefer to adapt the core upgrader for plugins and themes as appropriate, and not create a completely different implementation.

#50 @afragen
6 weeks ago

@SergeyBiryukov there’s a fundamental problem with trying to use the same method for a rollback that is the cause of the failure in the first place. In this case copy_dir().

#51 in reply to: ↑ 49 @afragen
6 weeks ago

Replying to SergeyBiryukov:

Personally, I would like the code for plugin/theme upgraders to be as close to core as possible. At a glance, it looks like Core_Upgrader::upgrade() has a lot more than Plugin_Upgrader::upgrade() or Theme_Upgrader::upgrade(), so perhaps unifying these methods and documenting the differences would be a good first step. I would prefer to adapt the core upgrader for plugins and themes as appropriate, and not create a completely different implementation.

Sounds like this should be another ticket.

#52 in reply to: ↑ 49 ; follow-up: @afragen
6 weeks ago

Replying to SergeyBiryukov:

Thanks for the patch!

It looks like there are some concerns about performance implications of building (and then potentially unpacking) a zip file with the current plugin version, especially for large plugins. Instead of building a zip file, could we rename the plugin directory to something reasonably unique like plugin-slug.__rollback__, and then delete it on successful update, or rename back on failure? I think that would not require any additional resources. Creating a zip file might save some disk space, but if the space is already low enough for that to make a difference, then it seems like updates on that site would start failing anyway sooner rather than later.

Thanks for the feedback.

Unfortunately rename() can be flaky/unreliable and fallback from rename() involves file copy which is the likely cause of the failure in the first place. Without testing the patch are we looking to solve a problem that doesn't exist?

A server timeout issue is mostly fixed in a page reload and the site isn't likley any worse off than it currently would be.

Isn't this what alpha is for? Unless there's some other concern?

#53 @TimothyBlynJacobs
6 weeks ago

A server timeout issue is mostly fixed in a page reload and the site isn't likley any worse off than it currently would be.

Calling this out again.

But the failure happening _during_ the zip process specifically is not my big worry, but that it consumes resources so that when the file transfer happens, the site no longer has enough resources to complete the upgrade. Which is how you can end up with a fatal erroring site.

This ticket was mentioned in Slack in #core-auto-updates by hellofromtonya. View the logs.


6 weeks ago

#55 in reply to: ↑ 52 @dd32
6 weeks ago

Replying to SergeyBiryukov:

It looks like there are some concerns about performance implications [...] could we rename the plugin directory to something reasonably unique like plugin-slug.__rollback__, and then delete it on successful update, or rename back on failure?

This looks like the best option from a filesystem-io perspective to me.

Replying to afragen:

Unfortunately rename() can be flaky/unreliable and fallback from rename() involves file copy which is the likely cause of the failure in the first place.

I'm not sure what the flaky/unreliable experience is here, but I feel like creating a ZIP backup is solving a problem that doesn't actually exist while adding significant CPU/IO into the process at the same time that's not needed.

Looking at the filesystem handlers move(), it looks like the Direct IO one has some behaviour to use a copy, which _probably_ isn't needed and is probably not useful, [13001] added the usage of rename() into it. A lot has changed in the last decade host IO wise.

Also of note, is that the FTP variants don't check the overwrite functionality.. but that's mostly irrelevant here.

I would probably suggest just cleaning up those move() methods to be consistent and not fallback on a copy() if it's not actually performing a move()/removing the source directory.

IMHO if move() fails, either the delete of the existing directory will also fail, or rollback should just be skipped for those edge-cases.

#56 follow-ups: @dd32
6 weeks ago

An alternative option would be to flip this around, and not add a rollback functionality into it, but rather install into a temporary directory first, and then remove existing/rename new into place.

(edit: I guess that's kind of how it works now, except the temporary install directory is wp-content/upgrade/somethinghere. It's the copy from there to the plugins directory that sometimes fails, so avoiding that copy by extracting into the plugins directory in the first place and then performing a move() would mean far less IO is needed and as a result could be faster too)

That would result in less downtime of the plugin not existing on the site, and also not delete the plugin from the site in the event of a failure to create the new files.

eg:

  • Plugin update into .hello-dolly.upgrade.1234..
  • Once successful:
    • hello-dolly directory is deleted.
    • move( '.hello-dolly.upgrade.1234', 'hello-dolly' );
    • return true;
  • Upon failure:
    • Delete .hello-dolly.upgrade.1234
    • return false
Last edited 6 weeks ago by dd32 (previous) (diff)

#57 in reply to: ↑ 56 @dd32
6 weeks ago

Replying to dd32:

I guess that's kind of how it works now, except the temporary install directory is wp-content/upgrade/somethinghere. It's the copy from there to the plugins directory that sometimes fails

I think this might have originally been a copy, rather than a move, as moving items cross-partition isn't possible. Sometimes each site would have it's own wp-content directory, but the wp-content/plugins or themes folder was actually a symlink to another disk or some such.

I could also be mis-remembering it, and it was purely bad planning and development skills that I wrote it like this :) If I was doing it again, I'd do it how I've outlined in comment 56 above

Last edited 6 weeks ago by dd32 (previous) (diff)

This ticket was mentioned in Slack in #core-auto-updates by hellofromtonya. View the logs.


6 weeks ago

This ticket was mentioned in Slack in #core by hellofromtonya. View the logs.


6 weeks ago

#60 in reply to: ↑ 56 @hellofromTonya
6 weeks ago

From core scrub today, @afragen noted that he'll open a new ticket to address the copy_dir changes Dion mentioned above.

For this ticket though, just want to make sure we all understand. Replying to dd32:

An alternative option would be to flip this around, and not add a rollback functionality into it, but rather install into a temporary directory first, and then remove existing/rename new into place.

Hey @dd32 > Is the proposed implementation how it currently works?

If yes, then nothing else needs to be done in this ticket.

Last edited 6 weeks ago by hellofromTonya (previous) (diff)

#61 @hellofromTonya
6 weeks ago

  • Keywords close added

Marking as a close candidate. Why? Our understanding is that Dion's proposed solution does not require any additional work in this ticket. And it resolves the server resources concerns.

#62 @SergeyBiryukov
6 weeks ago

It's my understanding that comment:56 is not exactly how it currently works and still needs to be implemented, so I don't think the ticket should be closed yet.

#63 @hellofromTonya
6 weeks ago

  • Keywords close removed

This ticket was mentioned in Slack in #core by lukecarbis. View the logs.


5 weeks ago

This ticket was mentioned in Slack in #core by hellofromtonya. View the logs.


5 weeks ago

#66 @hellofromTonya
5 weeks ago

  • Milestone 5.7 deleted
  • Resolution set to maybelater
  • Status changed from assigned to closed

Closing this ticket for now as a maybelater. Why?

  • #52342 is the new direction proposed by Dion.
  • #52381 proposes a new filter to enable moving this ticket into a feature plugin.

#67 @dd32
5 weeks ago

Just noting, that while this has been closed as maybelater, #52342 is not the suggestion as I mentioned above, and #52381 is a suggestion for a filter to make this ticket possible as a feature plugin, which could've just been kept here.

#68 @SergeyBiryukov
4 weeks ago

  • Milestone set to 5.8
  • Resolution maybelater deleted
  • Status changed from closed to reopened

Reopening per comment:62 and comment:67.

#69 @SergeyBiryukov
4 weeks ago

  • Keywords needs-patch added; has-patch dev-feedback needs-testing removed

I think comment:56 is the preferred direction here.

This ticket was mentioned in Slack in #core by sergey. View the logs.


4 weeks ago

This ticket was mentioned in Slack in #hosting-community by javier. View the logs.


3 weeks ago

This ticket was mentioned in Slack in #hosting-community by jadonn. View the logs.


3 weeks ago

This ticket was mentioned in Slack in #core-auto-updates by afragen. View the logs.


2 weeks ago

Note: See TracTickets for help on using tickets.