Make WordPress Core

Opened 3 years ago

Closed 4 weeks ago

#59239 closed defect (bug) (fixed)

wp_generate_uuid4 collisions

Reported by: joppuyo's profile joppuyo Owned by: jorbin's profile jorbin
Milestone: 7.0 Priority: normal
Severity: normal Version:
Component: General Keywords: has-patch needs-testing commit
Focuses: Cc:

Description

It seems like wp_generate_uuid4() is prone to creating UUID collisions, since it internally uses mt_rand, which uses a 32-bit seed. When this seed repeats, it will generate the same UUID twice. Maybe it should be updated to use wp_rand instead which uses random_int so it's backed by a real CSPRNG?

Change History (19)

#1 @kaushiksomaiya
3 months ago

We've noted this while investigating a support case - specific server/PHP config was (suspectedly) causing UUID to be duplicated at least once between a few random requests. And highlighting another report from @thanasisxan here https://developer.wordpress.org/reference/functions/wp_generate_uuid4/

Last edited 3 months ago by kaushiksomaiya (previous) (diff)

This ticket was mentioned in PR #10694 on WordPress/wordpress-develop by @johnbillion.


3 months ago
#2

  • Keywords has-patch added

Trac ticket: #59239

#3 @johnbillion
3 months ago

  • Milestone changed from Awaiting Review to 7.0

#4 follow-up: @johnbillion
3 months ago

One concern I have is that with this change in place, wp_generate_uuid4() now relies on a pluggable function. This means it would break any code that calls wp_generate_uuid4() before the plugins_loaded hook. What's the likelihood of this? Low, but perhaps a plugin somewhere is generating a UUID for each request for logging purposes.

#5 @joppuyo
3 months ago

I created this ticket too because I observed some collisions when generating UUIDs with wp_generate_uuid4(). According to PHP documentation, after 80,000 iterations the chance of a collision is 50%. I tested this myself by generating UUIDs using this function in a loop and after few minutes I managed to generate a collision around 80,000 iterations.

It seems like wp_rand() is a wrapper around random_int() and it only uses a fallback if the random_int() throws an exception (no random source available) or the function is not available. As it was added in PHP 7.0 that's probably not relevant in current WordPress versions as they don't support PHP 5 any longer.

I think it could be possible to just use random_int() directly if wp_rand() is not available. If that's too much duplicate code it might be possible to drop wp_rand() completely and just use random_int() so the function works without the dependency on pluggable functions being loaded.

Last edited 3 months ago by joppuyo (previous) (diff)

This ticket was mentioned in Slack in #core by juanmaguitar. View the logs.


5 weeks ago

#7 @juanmaguitar
5 weeks ago

  • Keywords needs-testing added

From today's bug scrub

There's a PR #10694 opened that has been approved. I'll add the needs-testing keyword

#8 @alexodiy
4 weeks ago

Tested PR #10694 on PHP 8.5.1 (NTS, Windows x64) against trunk at r62029.

Environment

  • PHP: 8.5.1 (cli) NTS Visual C++ 2022 x64
  • WordPress: 7.0-beta5 (trunk r62029)
  • OS: Windows 11

Reproduction

Demonstrated the collision problem by generating UUIDs with mt_rand() using repeated seeds. Since mt_rand() uses a 32-bit seed, identical seeds produce identical UUID sequences.

Before patch (mt_rand):

mt_srand(12345);
$uuid1 = wp_generate_uuid4(); // 51e22de5-fd1d-4881-8da4-ada90ffe517e

mt_srand(12345);
$uuid2 = wp_generate_uuid4(); // 51e22de5-fd1d-4881-8da4-ada90ffe517e

// $uuid1 === $uuid2 → collision

Generated 131,072 UUIDs with 65,536 sequential seeds across 2 rounds. Result: 65,536 unique UUIDs, 65,536 collisions (every UUID from round 2 collided with round 1).

After patch (wp_rand / random_int):

Generated 100,000 UUIDs using random_int() (which wp_rand() delegates to). Result: 100,000 unique UUIDs, 0 collisions.

UUID4 format validation

Verified 1,000 UUIDs generated with the patched function:

  • All 1,000 match the UUID4 regex ^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$
  • Version nibble (position 13) is always 4
  • Variant nibble (position 17) is always 8, 9, a, or b

Bitwise operations | 0x4000 and | 0x8000 are correctly preserved after switching from mt_rand() to wp_rand().

Results

Scenario Before Patch (mt_rand) After Patch (wp_rand)
Same seed produces same UUID Yes (collision) N/A (CSPRNG, no seed control)
131K UUIDs with repeated seeds 65,536 collisions 0 collisions in 100K
UUID4 format valid Yes Yes
Version/variant bits correct Yes Yes

Patch is minimal and correct. No regressions. +1 for commit.

#9 in reply to: ↑ 4 ; follow-up: @peterwilsoncc
4 weeks ago

Replying to johnbillion:

One concern I have is that with this change in place, wp_generate_uuid4() now relies on a pluggable function. This means it would break any code that calls wp_generate_uuid4() before the plugins_loaded hook. What's the likelihood of this? Low, but perhaps a plugin somewhere is generating a UUID for each request for logging purposes.

This could be protected against with:

<?php
$randomizer = function_exists( 'wp_rand' ) ? 'wp_rand' : 'mt_rand';
//...
$randomizer( 0, 0xffff );

I agree it's a low possibility, it can only really affect mu-plugins, plugins using the plugin_loaded (singular) hook and plugins running bootstrapping code upon file inclusion rather than a hook.

This ticket was mentioned in Slack in #core by audrasjb. View the logs.


4 weeks ago

#11 @audrasjb
4 weeks ago

  • Owner set to jorbin
  • Status changed from new to assigned

Reassigning to @jorbin as per today's bug scrub.

#12 @jorbin
4 weeks ago

  • Keywords commit added

Updated based on @peterwilsoncc's suggestion. Planning to commit soon.

#13 in reply to: ↑ 9 @siliconforks
4 weeks ago

Isn't mt_rand the problem here, though? Wouldn't it be better to use random_int instead?

#14 follow-up: @jorbin
4 weeks ago

@siliconforks mt_rand is a fallback in the scenario where someone is using wp_generate_uuid4 extremely early. If random_int would to be used, the logic in wp_rand would need to be duplicated including, a fallback for when random_int cannot find an appropriate source of randomness.

#15 in reply to: ↑ 14 @siliconforks
4 weeks ago

Replying to jorbin:

@siliconforks mt_rand is a fallback in the scenario where someone is using wp_generate_uuid4 extremely early. If random_int would to be used, the logic in wp_rand would need to be duplicated including, a fallback for when random_int cannot find an appropriate source of randomness.

Well, I think the safest thing to do in that case would be to just throw the exception rather than keep handing out bogus UUIDs. But if you really want to guarantee that wp_generate_uuid4() never throws an exception and always returns something, you could simply

  1. Use wp_rand() if that function exists
  2. If wp_rand() is not available, try calling random_int()
  3. If random_int() throws an exception, catch it and fall back to mt_rand()

#16 follow-up: @peterwilsoncc
4 weeks ago

@siliconforks I don't think it's worth the additional logic.

The risk of collision with mt_rand() is present but very low. The updated function will only ever fire if a third party developers calls it somewhere between the mu-plugins includes and before the plugins_loaded hook. Which itself is rare.

Running the following using the existing (collision prone) code about tenish didn't result in any collisions.

<?php
function gguid_collision_test() {
        $gguids = array();
        $count = 0;

        $tries = 1000000;

        for ( $i = 0; $i < $tries; $i++ ) {
                $gguid = wp_generate_uuid4();
                if ( isset( $gguids[ $gguid ] ) ) {
                        echo "Collision detected after $count tries.\n";
                        return;
                }
                $gguids[ $gguid ] = true;
                $count++;
        }

        echo "No collisions detected after $count tries.\n";
}

echo '<pre>';
gguid_collision_test();
exit;

As this isn't using mt_rand for cryptographic purposes, I think it's fine to use the KISS approach for a low^2 probability.

#17 in reply to: ↑ 16 @siliconforks
4 weeks ago

Replying to peterwilsoncc:

Running the following using the existing (collision prone) code about tenish didn't result in any collisions.

I don't think you would ever get a collision that way, because the Mersenne Twister has a very long period. The problem is that the seed is only 32 bits.

As this isn't using mt_rand for cryptographic purposes,

Well, it's a library function, so you don't really know what callers are using it for...

I think it's fine to use the KISS approach for a low^2 probability.

I think KISS is fine too - I would just call random_int() and then if it throws an exception, let the caller decide what to do with it.

#18 @jorbin
4 weeks ago

The problem with throwing an exception is that it breaks websites. It wasn't a lot of extra logic to try random_int and only if that is also not available fallback to mt_rand so I implemented that.

#19 @jorbin
4 weeks ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 62054:

General: Use functions that are more random to reduce likelihood of UUID collisions.

mt_rand produces not fully random numbers which makes it so wp_generate_uuid4 was more likely to produce a uuid which collides with another uuid it produced. This attempts to make those collisions much less likely.

Since wp_rand is a pluggable function, it's not loaded until after plugins have been loaded. In order to make it so this function can still be used early, it falls back first to random_int, which will throw an exception if it can't find an appropriate source of randomness, and then to the existing, but flawed, mt_rand.

Props johnbillion, peterwilsoncc, westonruter, mukesh27, siliconforks, alexodiy, juanmaguitar, audrasjb, joppuyo, jorbin.
Fixes #59239.

Note: See TracTickets for help on using tickets.