Make WordPress Core

Opened 10 years ago

Closed 9 years ago

Last modified 9 years ago

#31767 closed defect (bug) (fixed)

insert_with_markers() is not atomic, leading to corrupted .htaccess updates under race conditions

Reported by: tigertech Owned by: dd32
Milestone: 4.4 Priority: normal
Severity: critical Version: 4.1.1
Component: Filesystem API Keywords: needs-patch
Focuses: Cc:

Description

My company has occasionally seen corrupted .htaccess files on WordPress sites we host. When it happens, part of the .htaccess file appears to have been deleted from the middle.

I believe it happens when two different processes simultaneously call the WordPress insert_with_markers() function in "wp-admin/includes/misc.php". The function does not rewrite the file atomically, so the second process can read a half-rewritten copy left by the first. This can be triggered by plugins that use flush_rewrite_rules() to rewrite the file, if two such requests arrive simultaneously. I've been able to reproduce the behavior with a simple test case:

Create a file called "corruption.php" in the top level of a WordPress site:

<?php
require( dirname( __FILE__ ) . '/wp-blog-header.php' );
require_once( dirname( __FILE__ ) . '/wp-admin/includes/misc.php' );

// Build 100 lines tagged with this process's PID, then rewrite the
// marker block 5000 times to maximize the chance of a collision.
$array = array();
for ( $i = 1; $i <= 100; $i++ ) {
	$array[ $i ] = "Test line $i pid " . getmypid();
}
for ( $i = 1; $i <= 5000; $i++ ) {
	insert_with_markers( dirname( __FILE__ ) . '/corruption.test', 'CorruptionTest', $array );
}
print( "done" );

Load <http://www.example.com/corruption.php> from a single browser connection. The resulting "corruption.test" file should look fine.

Then load <http://www.example.com/corruption.php> from multiple browser connections simultaneously. This simulates two WordPress HTTP requests writing the file simultaneously. In my testing, "corruption.test" is corrupted much of the time. (Try it repeatedly if it doesn't appear corrupted at first.)

This happens because insert_with_markers() does the following:

1) Reads the existing file with "file( $filename )"

2) Truncates the existing file with "fopen( $filename, 'w' )"

3) Appends to the file one line at a time in a loop with "fwrite()"

4) Calls "fclose()"

However, if a second copy of insert_with_markers() executes step 1 while the first copy is in the middle of the step 3 loop, it will load an incomplete copy of the existing file, and the written copy will be corrupted.

To avoid this problem, insert_with_markers() could write to a temporary file (e.g. one created with tempnam()), then rename it over the original after closing it.
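A minimal sketch of that approach, assuming $new_file_data already holds the complete rewritten contents ($new_file_data and the 'iwm' prefix are illustrative names, not from any actual patch):

// Create the temp file in the same directory so rename() stays on one
// filesystem, which is what makes the swap atomic.
$temp = tempnam( dirname( $filename ), 'iwm' );
if ( false === $temp ) {
	return false;
}
file_put_contents( $temp, $new_file_data );
// tempnam() creates the file with 0600 permissions, so copy the
// original file's mode before swapping it into place.
if ( file_exists( $filename ) ) {
	chmod( $temp, fileperms( $filename ) & 0777 );
}
rename( $temp, $filename );

On POSIX filesystems, rename() within the same directory either fully succeeds or leaves the old file untouched, so readers never see a partial file.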

Attachments (1)

misc.patch (2.5 KB) - added by tigertech 10 years ago.
Patch for insert_with_markers


Change History (28)

#1 @DrewAPicture
10 years ago

  • Component changed from Rewrite Rules to Filesystem API

@tigertech
10 years ago

Patch for insert_with_markers

#2 @tigertech
10 years ago

  • Keywords has-patch added

The "misc.patch" file I uploaded solves this in most cases. It has two improvements:

1) If there's an existing file and the target directory is writable, it writes the new contents to a temporary file and then atomically renames it into place. This always avoids the corruption.

2) If either of those conditions isn't true, it collects all the output into one variable and writes everything in a single fwrite(), as soon as possible after the fopen(). This minimizes the time during which another process can see an incomplete file, and usually avoids the corruption.
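A condensed sketch of that fallback path (number 2 above; illustrative, not the patch itself, with $new_lines standing in for the assembled output):

// Build the complete new contents in memory first, so the window between
// truncating the file and finishing the write is as short as possible.
$file_data = implode( "\n", $new_lines ) . "\n";
$fp = @fopen( $filename, 'w' );
if ( ! $fp ) {
	return false;
}
fwrite( $fp, $file_data );
fclose( $fp );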

#3 @kevinatelement
10 years ago

I wonder, would it make sense to use flock() (http://php.net/flock), to further improve on number 2?

#4 @tigertech
10 years ago

Using flock() would help on installations where the directory is not writable (so the temporary-file approach can't be used) but NFS is not in use. I would expect the number of those to be pretty small, though.

(I came back to this bug because my company had another .htaccess file corrupted because of this today. Roughly speaking, for every thousand sites we host, we're seeing about one of these corruptions per year. It usually affects sites that use WooCommerce, which tries to rewrite the .htaccess file fairly often. Since it's a data corruption bug, perhaps the severity should be increased?)

#5 follow-up: @dd32
9 years ago

I think it'd be safer to simply use flock() and/or file_put_contents() with LOCK_EX specified.

Some filesystems won't support locking, but this would cover a bunch of them.
Really, no system should be writing to the .htaccess file often. If there's a plugin which modifies the rules regularly (especially often enough to cause corruption at present), I think it should be considered a bug report against that plugin.
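For reference, a minimal sketch of the LOCK_EX idea (assuming $file_data holds the complete new contents; this is not the eventual patch):

// Simplest form: file_put_contents() takes an exclusive advisory lock
// for the duration of the write when LOCK_EX is passed.
file_put_contents( $filename, $file_data, LOCK_EX );

// Equivalent with an explicit handle ('c' creates but doesn't truncate,
// so nothing is destroyed before the lock is held).
$fp = fopen( $filename, 'c' );
if ( $fp && flock( $fp, LOCK_EX ) ) {
	ftruncate( $fp, 0 );
	fwrite( $fp, $file_data );
	fflush( $fp );
	flock( $fp, LOCK_UN );
}
if ( $fp ) {
	fclose( $fp );
}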

#6 @rmccue
9 years ago

Also experienced this on a live site (my own). Seems like straight-up file locking would probably be the best solution here.

#7 in reply to: ↑ 5 @tigertech
9 years ago

Replying to dd32:

I think it'd be safer to simply use flock() and/or file_put_contents() with LOCK_EX specified.
Some filesystems won't support locking, but this would cover a bunch of them.

Right: flock() would indeed solve the .htaccess corruption problem for my customers, since we don't use NFS. (I found some reports that file_put_contents() with LOCK_EX can cause complete write failures on NFS, so I'd stick with flock().)

And it would be a simpler patch, obviously. The reason I didn't suggest that originally is that some hosting companies do use NFS, and flock() wouldn't fix the problem there.

Even if we use flock() to ensure that the .htaccess file isn't corrupted during simultaneous calls to insert_with_markers(), there's still another problem here, even without NFS. Apache may encounter an empty or partially-written .htaccess file after insert_with_markers() truncates it, but before it finishes all the fwrite()s. This is not as bad as a permanently corrupted file, but Apache may temporarily do the wrong thing with an HTTP request as a result.

A tempfile and atomic rename is the only solution I know of that avoids both the possibility of corrupted files on NFS, and of Apache occasionally seeing an empty or partially-written .htaccess file.

If we do use flock() instead of an atomic rename, I would also still suggest delaying the file open/truncate and fwrite as long as possible, by accumulating the data to be written into a variable, then writing it to the file in a single write at the end, as my patch attempted. This will minimize the occurrence of the NFS corruption problem and the Apache half-written file problem, by making the "partially written" duration of the file as short as possible.

Really, no system should be writing to the .htaccess file often. If there's a plugin which modifies the rules regularly (especially often enough to cause corruption at present), I think it should be considered a bug report against that plugin.

I completely agree that nothing should be doing this. It's horribly inefficient and pointless. But this still seems like it should be fixed in WordPress core; two plugins or themes using a documented interface in an inefficient and pointless manner shouldn't cause data corruption.

I'd be glad to submit a patch that uses flock(), together with accumulating data into a variable to minimize the partially-written file duration, if that would be more likely to be accepted than the previous patch.

#8 @tigertech
9 years ago

Is there anything I can do to encourage a fix for this? I was hoping it would be in 4.3, but it doesn't seem to be on that track. Since it causes corruption of the .htaccess file, I'm tempted to increase the severity to "critical", but I'm not sure if that's appropriate.

As I said, I'm willing to work up a new patch using flock() if that would be accepted, or I can rework the original patch if you have specific concerns about it, etc.

Thanks for your time!

#9 @tigertech
9 years ago

  • Severity changed from normal to critical

This bug a) completely stops WordPress from working by corrupting the .htaccess file, and b) causes data loss in some cases (erasing non-WordPress data from the .htaccess file), so I'm marking it "critical". I hope this will bring more attention to it. I apologize if this step is inappropriate, but I don't know what else to do.

#10 @kevinatelement
9 years ago

I can confirm that this is a serious issue that manifests in production environments.

We ran into it a couple of times over an 18-month period before digging in deep to determine the cause.

We have a workaround in place that prevents WordPress from updating the .htaccess file at all, but it's a pretty ugly workaround: the only hook available is the got_rewrite filter (applied inside got_mod_rewrite()), and always returning false there would affect permalink structures. So we need to peek into the call stack to determine the context in which got_mod_rewrite() was called, so we can block just the insert_with_markers() path.

In case anyone else is interested in this workaround, here it is:

add_filter( 'got_rewrite', 'turn_off_rewrite' );
function turn_off_rewrite() {
	// Pretend mod_rewrite is unavailable only when called from
	// save_mod_rewrite_rules(), so permalinks still work elsewhere
	// but the .htaccess file is never rewritten.
	$backtrace = debug_backtrace();
	foreach ( $backtrace as $trace ) {
		if ( isset( $trace['function'] ) && 'save_mod_rewrite_rules' === $trace['function'] ) {
			return false;
		}
	}
	return true;
}

#11 @SergeyBiryukov
9 years ago

  • Keywords 4.4-early added
  • Milestone changed from Awaiting Review to Future Release

#12 @dd32
9 years ago

  • Milestone changed from Future Release to 4.4

Let's make this happen.

This ticket was mentioned in Slack in #core by sergey.

9 years ago

#14 @SergeyBiryukov
9 years ago

  • Owner set to dd32
  • Status changed from new to assigned

#15 @dd32
9 years ago

In 34740:

Rewrite insert_with_markers() to use flock() when available, with significant cleanup of the function too.

The call to flock() is an exclusive advisory lock, which in my testing only PHP respects (apache continues to read it).
Not all filesystems support locking (remote NFS mounts for example) so this offers minimal benefit to those platforms, but offers much better protection against file corruption on systems which do support it.
The call is blocking, so a second process will wait for the first to complete before writing if supported.

See #31767

#16 @dd32
9 years ago

[34740] is my attempt at handling this using flock().

So far in testing, I'm able to consistently duplicate file corruption before the patch, and also when locking isn't available.
Unfortunately we can't use a second temporary file in a lot of cases, so I decided against using an atomic file move.

Leaving this open to see if there are any other approaches we could pursue, or if any bugs crop up (please test).
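For readers following along, a heavily condensed sketch of the locking flow described above (illustrative, not the committed code; the marker-splicing step is elided):

// Open read/write without truncating; block until we hold the exclusive lock.
$fp = fopen( $filename, 'r+' );
if ( ! $fp ) {
	return false;
}
flock( $fp, LOCK_EX );

// Read the current contents only after acquiring the lock, so we never
// splice our markers into another process's half-written file.
$lines = array();
while ( ( $line = fgets( $fp ) ) !== false ) {
	$lines[] = rtrim( $line, "\r\n" );
}
// ... replace the marker block within $lines here ...

// Rewrite from the start, truncate any leftover tail, flush, then unlock.
fseek( $fp, 0 );
fwrite( $fp, implode( "\n", $lines ) . "\n" );
ftruncate( $fp, ftell( $fp ) );
fflush( $fp );
flock( $fp, LOCK_UN );
fclose( $fp );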

#17 @willmot
9 years ago

This change causes a broken unit test in one of our plugins.

Previously insert_with_markers would create the file if it didn't already exist, now it simply returns false if the file doesn't exist.

This was perhaps an unintended side effect / feature of the old code's use of fopen( $filename, 'w' ). Switching to fopen( $filename, 'r+' ) means the file is no longer created if it doesn't exist. Also, the function now bails right at the top if the file isn't writable, whereas previously it would only bail if file_exists() && ! is_writable().

Here's where we're calling insert_with_markers: https://github.com/humanmade/backupwordpress/blob/master/classes/class-path.php#L258

Here's what the function used to look like: https://github.com/WordPress/WordPress/blob/4.3.1/wp-admin/includes/misc.php#L90-L149

We can easily fix up our plugin so it doesn't rely on this behaviour, but I wanted to raise it as a behaviour change anyhow.

#18 @tigertech
9 years ago

I've tested this, and the good news is that the patch appears to solve the corruption problem in the vast majority of cases.

On systems that support flock(), it should completely solve it; on systems that don't, the cleanup of writing everything all at once makes corruption much less likely (the new code is also noticeably faster when calling insert_with_markers() thousands of times in a test, which shortens the window in which a race condition can exist).

However, as @willmot noticed, it has a new bug: it no longer creates a missing file, and it needs to. Perhaps the fix could be as simple as adding this at the top of the function:

// Create the file if it doesn't already exist, restoring the old behaviour.
if ( ! file_exists( $filename ) ) {
	if ( ! touch( $filename ) ) {
		return false;
	}
}
Last edited 9 years ago by tigertech

#19 @tigertech
9 years ago

Since we're changing this, I have one more suggestion that would reduce the chance of corruption even more.

In all the cases where I saw corruption "in the wild", a plugin or theme was constantly (and unwisely) rewriting the same data over and over on every page view. The parameters being passed were the same each time, so the actual contents of the file were identical after insert_with_markers() was called.

It would be useful to not write the file at all if it won't change. Perhaps it could figure out what the original contents look like, then compare the old and new, and skip the write if they're identical. Something like:

// Compare the new and old data to see if a write is necessary.
$old_file_data = implode( "\n", array_merge( $lines ) );
if ( $old_file_data === $new_file_data ) {
	flock( $fp, LOCK_UN );
	fclose( $fp );
	return true;
} else {
	// Write to the start of the file, and truncate it to that length
(Well, not literally like that, because duplicating the flock and fclose would be ugly and stupid code, but you get the idea.)

I tested this and it seemed to work fine, avoiding rewrites of identical data but correctly rewriting changed data. Avoiding most writes should avoid most potential race conditions on systems where flock isn't supported.

#20 @kevinatelement
9 years ago

I noticed that in this changeset the fopen() call lost the leading @. fopen() throws an E_WARNING when it fails, so it might be desirable to keep it as @fopen().

I also wonder if it would be best practice to call fflush( $fp ) before flock( $fp, LOCK_UN ).

And finally, I wonder if it wouldn't be best to wait until right before the fseek( $fp, 0 ) to make the blocking flock( $fp, LOCK_EX ) call. Isn't it a best practice to hold a blocking lock for the minimum amount of time you can?
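To illustrate the fflush() ordering suggested above (a sketch; $file_data stands in for the real buffer):

fwrite( $fp, $file_data );
fflush( $fp );           // push PHP's buffered bytes to the OS first...
flock( $fp, LOCK_UN );   // ...so the next locker sees the complete file
fclose( $fp );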

#21 @SergeyBiryukov
9 years ago

  • Keywords needs-patch added; has-patch removed

Needs a new patch, see comment:17.

#22 follow-up: @dd32
9 years ago

Thanks @kevinatelement @tigertech @willmot

Previously insert_with_markers would create the file if it didn't already exist, now it simply returns false if the file doesn't exist.

#oops :) Can't believe I missed that.

Ideally I guess I'd have used c+ (PHP 5.2.6+) for opening the file (Read/Write/Create/Seek to start) instead of r+ (Read/Write/Seek to start), but touch() works too.

It would be useful to not write the file at all if it won't change. Perhaps it could figure out what the original contents look like, then compare the old and new, and skip the write if they're identical.

With the locking strategy currently in use (see below) this only gives a minor speed improvement, but it seems like a good option.

I noticed that in this changeset, the fopen call lost the leading @. fopen will throw an E_WARNING when the fopen fails, so I expect it might be desired to keep it as @fopen.

I've left it, as the pre-checks should prevent that being the case.

I also wonder if it would be best practice to call fflush($fp) before flock( $fp, LOCK_UN )

Certainly can't do any harm, and is probably best since PHP buffers at 8K. There's a chance another thread could lock the file between the lock being released and the file handle being closed (which is what flushes the buffer).

And finally, I wonder if it wouldn't be best to wait until right before the fseek( $fp, 0 ) to make the blocking flock( $fp, LOCK_EX ) call. Isn't it a best practice to hold a blocking lock for the minimum amount of time you can?

This is a tough one.
Currently it locks the file while it's working to ensure that all writes DO make it to the file, i.e. ProcessA's and ProcessB's writes will both end up in the file (if they're different). But if ProcessC through ProcessZ are also trying to write to the file, then it's going to cause a traffic jam.
If we were to only lock during the write phase, then if ProcessA and ProcessB were altering different $markers, ProcessA's changes could be lost.
Another option would be to use LOCK_NB (non-blocking) and detect an existing lock; if one is already set, bail from the function entirely, causing the second write to fail (sketched below).

Personally I'm thinking of leaving it as-is and blocking and waiting, because if you're writing on every pageload you're doing bad things.
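A minimal sketch of that non-blocking variant (illustrative only; not what was committed):

// LOCK_NB makes flock() return false immediately instead of blocking
// when another process already holds the exclusive lock.
if ( ! flock( $fp, LOCK_EX | LOCK_NB ) ) {
	fclose( $fp );
	return false; // another process is mid-write; bail rather than queue
}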

#23 @dd32
9 years ago

In 35267:

In insert_with_markers() restore the 4.3 behaviour of creating the file if it doesn't exist.
This change also makes it bail early (without writing) if the markers content is the same as the existing content, and uses ftell() rather than $bytes for the location to truncate the file to, based on the file pointer being at the end of the written stream.

Props willmot tigertech kevinatelement
See #31767

#24 in reply to: ↑ 22 @tigertech
9 years ago

I've tested this latest change and it appears to work fine. Thanks!

Personally I'm thinking of leaving it as-is and blocking and waiting, because if you're writing on every pageload you're doing bad things.

I agree; it's more correct. The "two different things are writing two different markers" scenario is actually a likely occurrence: the data loss I've seen usually happens when two different mistaken "write on every page view" plugins/themes run simultaneously.

#25 @dd32
9 years ago

  • Keywords 4.4-early removed
  • Resolution set to fixed
  • Status changed from assigned to closed

I'm marking this as fixed; please re-open if an issue is found during the 4.4 beta cycle.

#26 follow-up: @chrishang001
9 years ago

Hi guys,

Could I clarify whether this fix is in place only in the 4.4 beta, or is it present in the current version 4.3.1?

C

#27 in reply to: ↑ 26 @tigertech
9 years ago

Replying to chrishang001:

Could I clarify whether this fix is in place only in the 4.4 beta, or is it present in the current version 4.3.1?

The fix is not in 4.3.1. It will be in 4.4.
