Make WordPress Core

Opened 2 years ago

Closed 23 months ago

#56926 closed defect (bug) (fixed)

Infinite loop in wp_nav_menu

Reported by: davidbinda's profile david.binda Owned by: hellofromtonya's profile hellofromTonya
Milestone: 6.2 Priority: normal
Severity: normal Version: 6.1
Component: Menus Keywords: has-patch has-unit-tests fixed-major
Focuses: Cc:

Description

Testing the r54478 on WordPress.com, I have noticed an infinite loop on some sites.

On certain sites, there are nav_menu_items which have the _menu_item_menu_item_parent meta set to their own ID. Eg.: nav_menu_item with ID 281 has the _menu_item_menu_item_parent meta value set to 281, which breaks the while loop in https://core.trac.wordpress.org/browser/trunk/src/wp-includes/nav-menu-template.php?rev=54478#L211 as the $menu_item_parent variable is getting set to the same, 281 value over and over.

The ultimate root cause for the infinite loop is the data corruption (of some kind), which happened at some point in the past on the sites where the infinite loop happens.

While I haven't been able to figure out how the value of the _menu_item_menu_item_parent had been set to the same value as is it's parent nav_menu_item ID, perhaps it would still be beneficial adding a safe guard to the newly introduced logic agains the infinite loop, since it's been observed on multiple sites?

I was thinking something like:

<?php
while ( $menu_item_parent && $menu_item_parent != $menu_item_key ) {

Attachments (1)

broken-menu.mp4 (4.3 MB) - added by david.binda 2 years ago.

Change History (44)

This ticket was mentioned in Slack in #core by desrosj. View the logs.


2 years ago

#2 @desrosj
2 years ago

  • Keywords reporter-feedback added

@davidbinda were you able to determine any more specifics here?

This ticket was scrubbed today during the dry run for 6.1 and this was determined non-blocking since the root cause of the issue is unclear.

This ticket was mentioned in Slack in #core by priethor. View the logs.


2 years ago

#4 @SergeyBiryukov
2 years ago

  • Milestone changed from Awaiting Review to 6.1.1

Moving to 6.1.1 for investigation. Related: #56946.

This ticket was mentioned in PR #3566 on WordPress/wordpress-develop by @mhkuu.


2 years ago
#5

  • Keywords has-patch has-unit-tests added

Co-authored-by: Hugo de Vos <hdvos93@…>

Prevents an infinite loop when the parent menu item of a menu item is itself. We do not know the root cause of this issue, but with this PR, we add a safeguard to make sure the menu is still being output. This PR also adds a unit test to confirm the output is as expected.

Trac ticket: https://core.trac.wordpress.org/ticket/56926

This ticket was mentioned in Slack in #core by jeffpaul. View the logs.


2 years ago

#7 @mhkuu
2 years ago

As part of Yoast Contributor Day, we've created a PR based on the suggestion of @davidbinda and created unit tests. However, apparently our PR does not work for PHP 5.6, so we will have to investigate that further...

#8 @david.binda
2 years ago

Sorry for the delay here. There seem to be more instances of the nav menu items which have their own ID set as their parent, thus running into the infinite loop (and ultimately either to out of memory error, or timing out a request) in the code mentioned in this ticket.

I still haven't figured out how exactly such situation happens, but have been able to pin-point that it happens in customizer on the frontend side of things.

Adding a logging for cases when the _menu_item_menu_item_parent meta is set to the same object_id, all I got were requests from customizer when updating a nav menu:

do_action('wp_ajax_customize_save'), WP_Hook->do_action, WP_Hook->apply_filters, WP_Customize_Manager->save, WP_Customize_Manager->save_changeset_post, wp_insert_post, wp_transition_post_status, do_action('transition_post_status'), WP_Hook->do_action, WP_Hook->apply_filters, _wp_customize_publish_changeset, WP_Customize_Manager->_publish_changeset_values, WP_Customize_Setting->save, WP_Customize_Nav_Menu_Item_Setting->update, wp_update_nav_menu_item, update_post_meta, update_metadata

Further, the payload, data passed in via $_POST['customized'], already contains the bad data, so the corruption, IMHO, happens on the JavaScript side of things while working in customizer.

It's still unclear to me how exactly it happens (I had no luck attempting to reproduce it), but it feels like it might be beneficial to have some more safe-guards, and perhaps even a server-side validation, for the menu_item_parent value (eg.: for it to not point to the nav menu item itself). But even if such safe-guars were added, there are still going to be such instances in the wild, so, from my point of view, we should still take such situation into consideration in any code working with the _menu_item_menu_item_parent meta.

This ticket was mentioned in Slack in #core by jeffpaul. View the logs.


2 years ago

#10 @JeffPaul
2 years ago

  • Keywords needs-refresh added

Thanks @joyously, noting here that the existing patch/PR needs a refresh as noted above.

#11 @JeffPaul
2 years ago

  • Milestone changed from 6.1.1 to Future Release

As discussed in today's 6.1.1 bug scrub, this seems unlikely to be resolved and committed by tomorrow's RC so I'm punting this to Future Release as the root cause is not yet clear.

#12 @jmdodd
2 years ago

I've seen this behavior on multiple sites as well where the value of postmeta _menu_item_menu_item_parent is the same as the post ID.

#13 @priethor
2 years ago

This issue is causing noticeable performance issues on many sites running 6.1. I strongly feel this regression should be addressed in the upcoming minor release, even if the timeline needs to be adjusted. Cc @JeffPaul @desrosj

Otherwise, a 6.1.2 minor release would be needed as soon as possible.

#14 @JeffPaul
2 years ago

I'm unable to find related reports of this issue on the support forums here on dotorg, so would be curious to learn more of the triage on WPCOM to help get to a confirmed approach on resolution for the issue here.

The 6.1.1 timeline was set to try and help get in some performance tweaks and other items that narrowly missed 6.1 and the timeline was set as @desrosj and myself have time off scheduled starting next week. Absolutely fine for a 6.1.2 to be picked up next and land in early December if that's what's needed to give time to properly investigate the issue here and land a solution that fixes the problem versus the current approach that appears to hide the problem (at least from what I can tell from comments here). Note that there likely will be another minor release not too long after 6.1.1 to help handle the admin notices that alert to the dropping of security updates for WP 3.7-4.0, so this issue here could be part of the timing on that release.

#15 @desrosj
2 years ago

It's not yet clear what causes this behavior, or what the actual scale of the issue is. How many sites are experiencing problems related to this? So far, only .com sites have reported issues. It's not clear if this is a platform specific issue, or one that has to be fixed in Core itself.

We could add some defensive handling, but it's possible there are other issues resulting from the root problem and we'd just be plastering over it.

Totally not against addressing or reverting for 6.1.1 if this is a very widely experienced issue, but based on the information provided here so far, the impact of this issue is not clear enough to recommend that.

#16 follow-up: @jmdodd
2 years ago

We've seen several sites get into infinite loops based on the values in the database when wp_nav_menu() is called -- to be clear, this isn't thousands of sites (that we know of). Because of the nature of the regression, these sites generally show up after exhausting server resources. Those that are failing for this specific reason have results for the following query:

SELECT * FROM wp_postmeta WHERE meta_key = '_menu_item_menu_item_parent' AND meta_value = post_id;

One trend is that the nav_menu_items found with this query seem to be older, with post_date_gmts of 2017-08-05 14:29:05, 2019-12-12 04:11:41, and 2016-11-02 19:47:47.

In general, I think core code should protect against infinite loops due to bad/malformed/unexpected data whenever possible because it is a minor inconvenience compared to the performance consequences of a loop.

#17 in reply to: ↑ 16 @desrosj
2 years ago

  • Milestone changed from Future Release to 6.1.2

Replying to jmdodd:
Thanks! This is helpful.

In general, I think core code should protect against infinite loops due to bad/malformed/unexpected data whenever possible because it is a minor inconvenience compared to the performance consequences of a loop.

Definitely agree. But just looking to understand this a bit more first.

Also related: #56946.

Moving this to 6.1.2 to keep it on the front burner.

#18 @hellofromTonya
2 years ago

  • Owner set to hellofromTonya
  • Status changed from new to assigned

I'm self-assigning ownership. My plan:

  • Do a deep dive into the code changes to better understand what changed and why.
  • Then look at the deeper nested menu relationship concerns.
  • Then assess what might go wrong with while() loop and potentially how to guard it. (add more test datasets)
  • If multiple things could go wrong, consider a different approach.

Some initial thoughts:

  • Should a notice be thrown instead of silently exiting the loop? Why? To avoid silently bypassing the logic without knowing why it's being bypassed. Why? The behavior is not expected. Alerting the developers or support folks can help with fixing the menu-to-[grand]parent ID issue.
  • Besides the menu item ID === its parent's ID, could it also impact when its a grandparent or higher up the tree?
  • What other side effects or unexpected behaviors when a menu item's ID is the same as its parent|grandparent|etc?

#19 @azaozz
2 years ago

Unfortunately I couldn't reproduce this either (tried pretty hard).

It's still unclear to me how exactly it happens (I had no luck attempting to reproduce it), but it feels like it might be beneficial to have some more safe-guards
...
I think core code should protect against infinite loops due to bad/malformed/unexpected data

Yes, thinking the same. Since the bad/incorrect data is in the wild, there should be safeguards in wp_nav_menu() (as the current patch). Would also be nice to validate/verify the patent ID when adding menu_item_parent meta, although not sure what should be the response if the ID is invalid. If this is reproduced, seems it would be easier to decide what to do (i.e. is this only happening for top level menu items, etc.).

#20 follow-up: @peterwilsoncc
2 years ago

As walkers already manage parenting loops, I suspect the best approach may be to revert source code changes from the previous commits: [54478,54801]. The tests can remain.

Core can then add a default filter to nav_menu_css_class to remove the child class from the final items. This has the advantage that themes wishing to remove the filter will be able to do so and retain the legacy behaviour.

<?php
function fix_56926( $classes, $menu_item, $args, $depth ) {
        // Max-depth is 1-based.
        $max_depth = (int) $args->depth;
        // Depth is 0-based so needs to be increased by one.
        $depth = $depth + 1;

        if ( 0 === $max_depth ) {
                return $classes;
        }

        if ( -1 === $max_depth || $depth >= $max_depth ) {
                // Remove the `menu-item-has-children` class.
                $classes = array_diff( $classes, array( 'menu-item-has-children' ) );
        }

        return $classes;
}
add_filter( 'nav_menu_css_class', 'fix_56926', 10, 4 );

#21 follow-up: @david.binda
2 years ago

I've been finally successful in my attempts to reproduce the issue in customizer on a standalone WordPress install on my local machine (using VVV environment).

The reproduction steps are not straightforward, so I have recorded a video of the session.

I haven't dug into the related code, but from my observations, I believe that there are 2 issues in the code somewhere, which are resulting in data corruption:

  1. removing a menu item with children does not actually remove the parent element, but sets it to "display: none" so it's no longer visible, which allows a children to be placed at depth of 1 (instead of 0) without visible parent
  2. if such an element exists, the computation of the parent in the "reorder" functionality, unpon clicking the left arrow, returns the item's own ID

In order to make sure one can follow the reproduction steps in the video attached to this comment, here is a summary of the steps in the video:

  1. Create a new menu and assign it to the primary menu location( might not be necesary )
  2. Add 3 menu items, where The first one has a single children, and one other item is at depth of 0 below the first two items
  3. Save the data
  4. Remove the first item with nested child item, move the child item to position of a nested item without parent (this IMHO should not be possible and is the bug number one mentioned above)
  5. The items gets displayed below the originally last item of the menu, but with wrong parent.
  6. Save the data and refresh the window (the refresh is needed as otherwise the left arrow is not enabled)
  7. After refresh, the reorder menu allows one to click the left arrow leading to the parent being set to the item's ID.

#22 in reply to: ↑ 20 @azaozz
2 years ago

Replying to peterwilsoncc:

the best approach may be to revert source code changes from the previous commits: [54478,54801]. The tests can remain.

Core can then add a default filter to nav_menu_css_class to remove the child class from the final items.

This is a much better solution imho, great job! Big +1.

Last edited 2 years ago by azaozz (previous) (diff)

#23 in reply to: ↑ 21 @azaozz
2 years ago

  • Keywords reporter-feedback removed

Replying to david.binda:

I've been finally successful in my attempts to reproduce the issue in customizer

Yes, can reproduce it now too, and yes, very complex steps, thank you for finding them. The video was very helpful too. Good job!

The error seems to be coming from the accessibility improvements there, and seems to only happen when attempting to move a sub-menu item to a top menu position by clicking the left arrow.

Tried to reproduce it when moving a deeper (sub-sub-menu) item one position up, but couldn't. There were also couple of notices thrown: Undefined offset: 1128 in /wp-includes/nav-menu-template.php on line 211 and line 214: https://core.trac.wordpress.org/browser/tags/6.1.1/src/wp-includes/nav-menu-template.php#L211 (1128 is the ID of the removed menu item, see step 4 above).

Thinking that this ticket should be closed with the fix for the infinite loop as described by @peterwilsoncc above. Then two new tickets can be opened: one to verify menu_item_parent on saving, and remove it if it matches the menu ID. Would be good if this is added to 6.1.2 if possible. Then another ticket to fix the bug in the Customizer. After a quick look this doesn't seem straightforward and may take some time.

Last edited 2 years ago by azaozz (previous) (diff)

This ticket was mentioned in PR #3649 on WordPress/wordpress-develop by @peterwilsoncc.


2 years ago
#24

  • Keywords needs-refresh removed

#25 @peterwilsoncc
2 years ago

In the linked pull request:

  • Partial revert of [54478,54801], the tests for each are retained.
  • Introduced a default filter to remove the item from bottom level items.
  • No additional tests as yet as I haven't looked too deeply at the reproduction steps.

@azaozz #57122 exists for the undefined offset error. The tests I've written need to be improved as they are currently passing.

@azaozz commented on PR #3649:


2 years ago
#26

Testing this with the steps from https://core.trac.wordpress.org/ticket/56926#comment:21, it prevents the infinite loop in all cases I could think of.

However noticed that when the menu_item's ID is the same as menu_item's parent, the menu-item-has-children class is still added and not removed even when there are no children. Trying to fix that thinking the code should reset $menu_item->menu_item_parent when it is the same as $menu_item->ID. This will prevent the particular case of "bad data" coming from the Customizer error. Code would probably be something like:

// Set up the $menu_item variables.
_wp_menu_item_classes_by_context( $menu_items );

$sorted_menu_items        = array();
$menu_items_with_children = array();

foreach ( (array) $menu_items as $menu_item ) {
	// Try to fix wrong `menu_item_parent`, see: https://core.trac.wordpress.org/ticket/56926.
	if ( (int) $menu_item->ID === (int) $menu_item->menu_item_parent ) {
		$menu_item->menu_item_parent = 0;
	}

	$sorted_menu_items[ $menu_item->menu_order ] = $menu_item;

	if ( $menu_item->menu_item_parent ) {
		$menu_items_with_children[ $menu_item->menu_item_parent ] = true;
	}
}

#27 @azaozz
2 years ago

Related: #57169, #57170.

Last edited 2 years ago by azaozz (previous) (diff)

@azaozz commented on PR #3649:


2 years ago
#28

LGTM, +1 to commit.

#29 @peterwilsoncc
2 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 54999:

Menus: Prevent infinite loop in menus.

This modifies how the menu-item-has-children class is removed from bottom level menu items. Instead of removing the class within wp_nav_menu() a filter is applied to the nav_menu_css_class hook to remove the class as required.

Introduces wp_nav_menu_remove_menu_item_has_children_class() for removing the class.

Reverts source code changes in [54478,54801], the tests are retained.

Props davidbinda, SergeyBiryukov, mhkuu, JeffPaul, jmdodd, priethor, desrosj, hellofromTonya, azaozz, peterwilsoncc.
Fixes #56926.
See #28620.

#30 @peterwilsoncc
2 years ago

  • Keywords fixed-major added
  • Resolution fixed deleted
  • Status changed from closed to reopened

Reopening for merging in to the 6.1 branch.

@peterwilsoncc commented on PR #3566:


2 years ago
#32

Discussing this we @azaozz it was decided to go with an alternative approach to reduce the chance of further issues looping through the menu item tree.

I've just committed this in https://core.trac.wordpress.org/changeset/54999 / https://github.com/WordPress/wordpress-develop/commit/1a8535075613b29e750a3b966db4e2004dfad48d so will close this pull request off.

Thanks for your work on this patch, although it wasn't included it is appreciated and you received props in the commit for the ticket.

#33 follow-up: @dd32
2 years ago

Some themes appear to be running into a fatal error here, for example these lines are causing fatals:

E_ERROR: Uncaught ArgumentCountError: Too few arguments to function wp_nav_menu_remove_menu_item_has_children_class(), 3 passed in wp-includes/class-wp-hook.php on line 308 and exactly 4 expected in wp-includes/nav-menu-template.php:649
E_ERROR: Uncaught ArgumentCountError: Too few arguments to function wp_nav_menu_remove_menu_item_has_children_class(), 2 passed in wp-includes/class-wp-hook.php on line 308 and exactly 4 expected in wp-includes/nav-menu-template.php:649

https://themes.trac.wordpress.org/browser/hestia/3.0.24/inc/class-hestia-bootstrap-navwalker.php?marks=69#L59
https://themes.trac.wordpress.org/browser/lightning/15.2.1/_g3/inc/class-vk-description-walker.php?marks=14#L4
https://themes.trac.wordpress.org/browser/melos/1.0.1/admin/main/options/00.theme-setup.php?marks=257#L247 (Somewhere in there, I'm not sure where, but it triggers the 2 passed error above)

edit: those themes are obviously using the core filter for their own menu purposes, for compat, but I guess this code could at least expect the 4th param to be not-set

Last edited 2 years ago by dd32 (previous) (diff)

#34 @peterwilsoncc
2 years ago

Thanks @dd32, I'll take a quick look.

Would you prefer me to revert or are you happy to pin .org to the prior commit for a while?

#35 @dd32
2 years ago

@peterwilsoncc Take your time. wp-themes.com is going to get pinned for now.

#38 @peterwilsoncc
2 years ago

In the newly linked pull request:

  • Account for the legacy format filter by applying default values for the $args and $depth parameters
  • If these values are not supplied (the filter is called in legacy mode) the new function does nothing as it has no idea what the maximum and current depths are.
  • See the testing mu-plugin I've put together -- it's rough but will help demonstrate the error on 2013

I'll add some test with class to apply_filters in legacy mode.

@mhkuu commented on PR #3566:


2 years ago
#39

Thanks @peterwilsoncc, looks like a more fail-safe approach indeed! 👍

#40 @peterwilsoncc
2 years ago

In 55005:

Menus: Account for legacy calls to nav_menu_css_class filter.

Modify wp_nav_menu_remove_menu_item_has_children_class() to account for changes to the nav_menu_css_class filter since it's introduction.

The $args and $depth parameters were added after the filter's introduction so this protects against fatal errors in custom walkers applying the filter in a legacy format.

Without the $args or $depth parameters, wp_nav_menu_remove_menu_item_has_children_class() no longer attempts to remove the menu-item-has-children from the lowest level menu items as these are required to determine the current branch the walker is walking.

Follow up to [54999].

Props dd32, azaozz, peterwilsoncc.
See #56926, #28620.

#42 @SergeyBiryukov
23 months ago

  • Milestone changed from 6.1.2 to 6.2.1

Moving to 6.2.1, as there are no plans for 6.1.2 at this time.

#43 @SergeyBiryukov
23 months ago

  • Milestone changed from 6.2.1 to 6.2
  • Resolution set to fixed
  • Status changed from reopened to closed

This is already included in the 6.2 branch as of [55504], no backport needed.

Note: See TracTickets for help on using tickets.