Make WordPress Core

Opened 6 weeks ago

Last modified 2 weeks ago

#56926 assigned defect (bug)

Infinite loop in wp_nav_menu

Reported by: davidbinda's profile david.binda Owned by: hellofromtonya's profile hellofromTonya
Milestone: 6.1.2 Priority: normal
Severity: normal Version: 6.1
Component: Menus Keywords: has-patch has-unit-tests
Focuses: Cc:

Description

Testing the r54478 on WordPress.com, I have noticed an infinite loop on some sites.

On certain sites, there are nav_menu_items which have the _menu_item_menu_item_parent meta set to their own ID. Eg.: nav_menu_item with ID 281 has the _menu_item_menu_item_parent meta value set to 281, which breaks the while loop in https://core.trac.wordpress.org/browser/trunk/src/wp-includes/nav-menu-template.php?rev=54478#L211 as the $menu_item_parent variable is getting set to the same, 281 value over and over.

The ultimate root cause for the infinite loop is the data corruption (of some kind), which happened at some point in the past on the sites where the infinite loop happens.

While I haven't been able to figure out how the value of the _menu_item_menu_item_parent had been set to the same value as is it's parent nav_menu_item ID, perhaps it would still be beneficial adding a safe guard to the newly introduced logic agains the infinite loop, since it's been observed on multiple sites?

I was thinking something like:

<?php
while ( $menu_item_parent && $menu_item_parent != $menu_item_key ) {

Attachments (1)

broken-menu.mp4 (4.3 MB) - added by david.binda 3 weeks ago.

Change History (28)

This ticket was mentioned in Slack in #core by desrosj. View the logs.


5 weeks ago

#2 @desrosj
5 weeks ago

  • Keywords reporter-feedback added

@davidbinda were you able to determine any more specifics here?

This ticket was scrubbed today during the dry run for 6.1 and this was determined non-blocking since the root cause of the issue is unclear.

This ticket was mentioned in Slack in #core by priethor. View the logs.


5 weeks ago

#4 @SergeyBiryukov
5 weeks ago

  • Milestone changed from Awaiting Review to 6.1.1

Moving to 6.1.1 for investigation. Related: #56946.

This ticket was mentioned in PR #3566 on WordPress/wordpress-develop by @mhkuu.


5 weeks ago
#5

  • Keywords has-patch has-unit-tests added

Co-authored-by: Hugo de Vos <hdvos93@…>

Prevents an infinite loop when the parent menu item of a menu item is itself. We do not know the root cause of this issue, but with this PR, we add a safeguard to make sure the menu is still being output. This PR also adds a unit test to confirm the output is as expected.

Trac ticket: https://core.trac.wordpress.org/ticket/56926

This ticket was mentioned in Slack in #core by jeffpaul. View the logs.


5 weeks ago

#7 @mhkuu
5 weeks ago

As part of Yoast Contributor Day, we've created a PR based on the suggestion of @davidbinda and created unit tests. However, apparently our PR does not work for PHP 5.6, so we will have to investigate that further...

#8 @david.binda
5 weeks ago

Sorry for the delay here. There seem to be more instances of the nav menu items which have their own ID set as their parent, thus running into the infinite loop (and ultimately either to out of memory error, or timing out a request) in the code mentioned in this ticket.

I still haven't figured out how exactly such situation happens, but have been able to pin-point that it happens in customizer on the frontend side of things.

Adding a logging for cases when the _menu_item_menu_item_parent meta is set to the same object_id, all I got were requests from customizer when updating a nav menu:

do_action('wp_ajax_customize_save'), WP_Hook->do_action, WP_Hook->apply_filters, WP_Customize_Manager->save, WP_Customize_Manager->save_changeset_post, wp_insert_post, wp_transition_post_status, do_action('transition_post_status'), WP_Hook->do_action, WP_Hook->apply_filters, _wp_customize_publish_changeset, WP_Customize_Manager->_publish_changeset_values, WP_Customize_Setting->save, WP_Customize_Nav_Menu_Item_Setting->update, wp_update_nav_menu_item, update_post_meta, update_metadata

Further, the payload, data passed in via $_POST['customized'], already contains the bad data, so the corruption, IMHO, happens on the JavaScript side of things while working in customizer.

It's still unclear to me how exactly it happens (I had no luck attempting to reproduce it), but it feels like it might be beneficial to have some more safe-guards, and perhaps even a server-side validation, for the menu_item_parent value (eg.: for it to not point to the nav menu item itself). But even if such safe-guars were added, there are still going to be such instances in the wild, so, from my point of view, we should still take such situation into consideration in any code working with the _menu_item_menu_item_parent meta.

This ticket was mentioned in Slack in #core by jeffpaul. View the logs.


4 weeks ago

#10 @JeffPaul
4 weeks ago

  • Keywords needs-refresh added

Thanks @joyously, noting here that the existing patch/PR needs a refresh as noted above.

#11 @JeffPaul
4 weeks ago

  • Milestone changed from 6.1.1 to Future Release

As discussed in today's 6.1.1 bug scrub, this seems unlikely to be resolved and committed by tomorrow's RC so I'm punting this to Future Release as the root cause is not yet clear.

#12 @jmdodd
3 weeks ago

I've seen this behavior on multiple sites as well where the value of postmeta _menu_item_menu_item_parent is the same as the post ID.

#13 @priethor
3 weeks ago

This issue is causing noticeable performance issues on many sites running 6.1. I strongly feel this regression should be addressed in the upcoming minor release, even if the timeline needs to be adjusted. Cc @JeffPaul @desrosj

Otherwise, a 6.1.2 minor release would be needed as soon as possible.

#14 @JeffPaul
3 weeks ago

I'm unable to find related reports of this issue on the support forums here on dotorg, so would be curious to learn more of the triage on WPCOM to help get to a confirmed approach on resolution for the issue here.

The 6.1.1 timeline was set to try and help get in some performance tweaks and other items that narrowly missed 6.1 and the timeline was set as @desrosj and myself have time off scheduled starting next week. Absolutely fine for a 6.1.2 to be picked up next and land in early December if that's what's needed to give time to properly investigate the issue here and land a solution that fixes the problem versus the current approach that appears to hide the problem (at least from what I can tell from comments here). Note that there likely will be another minor release not too long after 6.1.1 to help handle the admin notices that alert to the dropping of security updates for WP 3.7-4.0, so this issue here could be part of the timing on that release.

#15 @desrosj
3 weeks ago

It's not yet clear what causes this behavior, or what the actual scale of the issue is. How many sites are experiencing problems related to this? So far, only .com sites have reported issues. It's not clear if this is a platform specific issue, or one that has to be fixed in Core itself.

We could add some defensive handling, but it's possible there are other issues resulting from the root problem and we'd just be plastering over it.

Totally not against addressing or reverting for 6.1.1 if this is a very widely experienced issue, but based on the information provided here so far, the impact of this issue is not clear enough to recommend that.

#16 follow-up: @jmdodd
3 weeks ago

We've seen several sites get into infinite loops based on the values in the database when wp_nav_menu() is called -- to be clear, this isn't thousands of sites (that we know of). Because of the nature of the regression, these sites generally show up after exhausting server resources. Those that are failing for this specific reason have results for the following query:

SELECT * FROM wp_postmeta WHERE meta_key = '_menu_item_menu_item_parent' AND meta_value = post_id;

One trend is that the nav_menu_items found with this query seem to be older, with post_date_gmts of 2017-08-05 14:29:05, 2019-12-12 04:11:41, and 2016-11-02 19:47:47.

In general, I think core code should protect against infinite loops due to bad/malformed/unexpected data whenever possible because it is a minor inconvenience compared to the performance consequences of a loop.

#17 in reply to: ↑ 16 @desrosj
3 weeks ago

  • Milestone changed from Future Release to 6.1.2

Replying to jmdodd:
Thanks! This is helpful.

In general, I think core code should protect against infinite loops due to bad/malformed/unexpected data whenever possible because it is a minor inconvenience compared to the performance consequences of a loop.

Definitely agree. But just looking to understand this a bit more first.

Also related: #56946.

Moving this to 6.1.2 to keep it on the front burner.

#18 @hellofromTonya
3 weeks ago

  • Owner set to hellofromTonya
  • Status changed from new to assigned

I'm self-assigning ownership. My plan:

  • Do a deep dive into the code changes to better understand what changed and why.
  • Then look at the deeper nested menu relationship concerns.
  • Then assess what might go wrong with while() loop and potentially how to guard it. (add more test datasets)
  • If multiple things could go wrong, consider a different approach.

Some initial thoughts:

  • Should a notice be thrown instead of silently exiting the loop? Why? To avoid silently bypassing the logic without knowing why it's being bypassed. Why? The behavior is not expected. Alerting the developers or support folks can help with fixing the menu-to-[grand]parent ID issue.
  • Besides the menu item ID === its parent's ID, could it also impact when its a grandparent or higher up the tree?
  • What other side effects or unexpected behaviors when a menu item's ID is the same as its parent|grandparent|etc?

#19 @azaozz
3 weeks ago

Unfortunately I couldn't reproduce this either (tried pretty hard).

It's still unclear to me how exactly it happens (I had no luck attempting to reproduce it), but it feels like it might be beneficial to have some more safe-guards
...
I think core code should protect against infinite loops due to bad/malformed/unexpected data

Yes, thinking the same. Since the bad/incorrect data is in the wild, there should be safeguards in wp_nav_menu() (as the current patch). Would also be nice to validate/verify the patent ID when adding menu_item_parent meta, although not sure what should be the response if the ID is invalid. If this is reproduced, seems it would be easier to decide what to do (i.e. is this only happening for top level menu items, etc.).

#20 follow-up: @peterwilsoncc
3 weeks ago

As walkers already manage parenting loops, I suspect the best approach may be to revert source code changes from the previous commits: [54478,54801]. The tests can remain.

Core can then add a default filter to nav_menu_css_class to remove the child class from the final items. This has the advantage that themes wishing to remove the filter will be able to do so and retain the legacy behaviour.

<?php
function fix_56926( $classes, $menu_item, $args, $depth ) {
        // Max-depth is 1-based.
        $max_depth = (int) $args->depth;
        // Depth is 0-based so needs to be increased by one.
        $depth = $depth + 1;

        if ( 0 === $max_depth ) {
                return $classes;
        }

        if ( -1 === $max_depth || $depth >= $max_depth ) {
                // Remove the `menu-item-has-children` class.
                $classes = array_diff( $classes, array( 'menu-item-has-children' ) );
        }

        return $classes;
}
add_filter( 'nav_menu_css_class', 'fix_56926', 10, 4 );

#21 follow-up: @david.binda
3 weeks ago

I've been finally successful in my attempts to reproduce the issue in customizer on a standalone WordPress install on my local machine (using VVV environment).

The reproduction steps are not straightforward, so I have recorded a video of the session.

I haven't dug into the related code, but from my observations, I believe that there are 2 issues in the code somewhere, which are resulting in data corruption:

  1. removing a menu item with children does not actually remove the parent element, but sets it to "display: none" so it's no longer visible, which allows a children to be placed at depth of 1 (instead of 0) without visible parent
  2. if such an element exists, the computation of the parent in the "reorder" functionality, unpon clicking the left arrow, returns the item's own ID

In order to make sure one can follow the reproduction steps in the video attached to this comment, here is a summary of the steps in the video:

  1. Create a new menu and assign it to the primary menu location( might not be necesary )
  2. Add 3 menu items, where The first one has a single children, and one other item is at depth of 0 below the first two items
  3. Save the data
  4. Remove the first item with nested child item, move the child item to position of a nested item without parent (this IMHO should not be possible and is the bug number one mentioned above)
  5. The items gets displayed below the originally last item of the menu, but with wrong parent.
  6. Save the data and refresh the window (the refresh is needed as otherwise the left arrow is not enabled)
  7. After refresh, the reorder menu allows one to click the left arrow leading to the parent being set to the item's ID.

#22 in reply to: ↑ 20 @azaozz
3 weeks ago

Replying to peterwilsoncc:

the best approach may be to revert source code changes from the previous commits: [54478,54801]. The tests can remain.

Core can then add a default filter to nav_menu_css_class to remove the child class from the final items.

This is a much better solution imho, great job! Big +1.

Last edited 3 weeks ago by azaozz (previous) (diff)

#23 in reply to: ↑ 21 @azaozz
3 weeks ago

  • Keywords reporter-feedback removed

Replying to david.binda:

I've been finally successful in my attempts to reproduce the issue in customizer

Yes, can reproduce it now too, and yes, very complex steps, thank you for finding them. The video was very helpful too. Good job!

The error seems to be coming from the accessibility improvements there, and seems to only happen when attempting to move a sub-menu item to a top menu position by clicking the left arrow.

Tried to reproduce it when moving a deeper (sub-sub-menu) item one position up, but couldn't. There were also couple of notices thrown: Undefined offset: 1128 in /wp-includes/nav-menu-template.php on line 211 and line 214: https://core.trac.wordpress.org/browser/tags/6.1.1/src/wp-includes/nav-menu-template.php#L211 (1128 is the ID of the removed menu item, see step 4 above).

Thinking that this ticket should be closed with the fix for the infinite loop as described by @peterwilsoncc above. Then two new tickets can be opened: one to verify menu_item_parent on saving, and remove it if it matches the menu ID. Would be good if this is added to 6.1.2 if possible. Then another ticket to fix the bug in the Customizer. After a quick look this doesn't seem straightforward and may take some time.

Last edited 3 weeks ago by azaozz (previous) (diff)

This ticket was mentioned in PR #3649 on WordPress/wordpress-develop by @peterwilsoncc.


3 weeks ago
#24

  • Keywords needs-refresh removed

#25 @peterwilsoncc
3 weeks ago

In the linked pull request:

  • Partial revert of [54478,54801], the tests for each are retained.
  • Introduced a default filter to remove the item from bottom level items.
  • No additional tests as yet as I haven't looked too deeply at the reproduction steps.

@azaozz #57122 exists for the undefined offset error. The tests I've written need to be improved as they are currently passing.

@azaozz commented on PR #3649:


2 weeks ago
#26

Testing this with the steps from https://core.trac.wordpress.org/ticket/56926#comment:21, it prevents the infinite loop in all cases I could think of.

However noticed that when the menu_item's ID is the same as menu_item's parent, the menu-item-has-children class is still added and not removed even when there are no children. Trying to fix that thinking the code should reset $menu_item->menu_item_parent when it is the same as $menu_item->ID. This will prevent the particular case of "bad data" coming from the Customizer error. Code would probably be something like:

// Set up the $menu_item variables.
_wp_menu_item_classes_by_context( $menu_items );

$sorted_menu_items        = array();
$menu_items_with_children = array();

foreach ( (array) $menu_items as $menu_item ) {
	// Try to fix wrong `menu_item_parent`, see: https://core.trac.wordpress.org/ticket/56926.
	if ( (int) $menu_item->ID === (int) $menu_item->menu_item_parent ) {
		$menu_item->menu_item_parent = 0;
	}

	$sorted_menu_items[ $menu_item->menu_order ] = $menu_item;

	if ( $menu_item->menu_item_parent ) {
		$menu_items_with_children[ $menu_item->menu_item_parent ] = true;
	}
}

#27 @azaozz
2 weeks ago

Related: #57169, #57170.

Last edited 2 weeks ago by azaozz (previous) (diff)
Note: See TracTickets for help on using tickets.