Make WordPress Core

Opened 3 years ago

Last modified 23 months ago

#53910 reviewing defect (bug)

`sanitize_title_with_dashes` returns partial encoded values in permalink

Reported by: costdev's profile costdev Owned by: sergeybiryukov's profile SergeyBiryukov
Milestone: Future Release Priority: normal
Severity: major Version: 5.8
Component: Permalinks Keywords: has-patch has-unit-tests dev-feedback
Focuses: ui, rtl, administration Cc:

Description (last modified by SergeyBiryukov)

Picked up quite an old bug (circa 2006!) while working on #47912.

sanitize_title_with_dashes() does a check to see if the title seems_utf8() and subsequently url encodes it. The call to utf8_uri_encode() has a $length argument of 200.

If an encoded value crosses the 200 boundary, the encoded value is cut and the remainder isn't picked up by any of the subsequent actions taken by sanitize_title_with_dashes().

this-very-long-title-is-to-help-demonstrate-that-partial-encoded-values-remain-when-you-try-to-use-sanitize-title-with-dashes-on-encoded-strings-trimmed-to-200-chars-instead-of-using-max-and-strlen%e2%80%af

becomes:

this-very-long-title-is-to-help-demonstrate-that-partial-encoded-values-remain-when-you-try-to-use-sanitize-title-with-dashes-on-encoded-strings-trimmed-to-200-chars-instead-of-using-max-and-strlen%e2

I've resolved this issue by:

  • Storing the seems_utf8() value
  • Changing the call to utf8_uri_encode(), so that the length argument is the max() of strlen( $title ) and 200
  • Trimming the $title to 200 at the end of the sanitize_title_with_dashes instead.

Attachments (1)

53910_example.jpg (48.7 KB) - added by costdev 3 years ago.
Example of issue

Download all attachments as: .zip

Change History (14)

@costdev
3 years ago

Example of issue

#1 @SergeyBiryukov
3 years ago

  • Description modified (diff)

#2 @audrasjb
3 years ago

  • Keywords needs-patch added

Thanks for opening this ticket, @costdev !

  • Trimming the $title to 200 at the end of the sanitize_title_with_dashes instead.

Looks like the easier option here. With such a length, I don't see how it could hurt to cut off titles with more than 200 characters when performing the url sanitization.

This ticket was mentioned in PR #1569 on WordPress/wordpress-develop by costdev.


3 years ago
#3

  • Keywords has-patch has-unit-tests added; needs-patch removed

sanitize_title_with_dashes() does a check to see if the title seems_utf8() and subsequently url encodes it.

The call to utf8_uri_encode() has a $length argument of 200.

If an encoded value crosses the 200 boundary, the encoded value is cut and the remainder isn't picked up by any of the subsequent actions taken by sanitize_title_with_dashes().

Resolved by:

  • storing the seems_utf8() value.
  • changing the call to utf8_uri_encode(), so that the length argument is the max() of strlen( $title ) and 200.
  • trimming the $title to 200 at the end of sanitize_title_with_dashes() instead.

Trac ticket: https://core.trac.wordpress.org/ticket/53910

#4 @costdev
3 years ago

Thanks @audrasjb!

I've just updated the PR/patch to:

  • remove the $title_seems_utf8 variable
  • remove the $length argument in the original call utf8_uri_encode()
  • trim any title that seems_utf8() to 200 at the end of sanitize_title_with_dashes()

PHPUnit tests are successful.

As the issue only occurred with seems_utf8() titles that were trimmed too early, I've only made the change in this circumstance to prevent the fix interfering with unknown potential uses of 200+ character title slugs.

Last edited 3 years ago by costdev (previous) (diff)

#5 @SergeyBiryukov
3 years ago

  • Milestone changed from Awaiting Review to 5.9
  • Owner set to SergeyBiryukov
  • Status changed from new to reviewing

#6 follow-up: @audrasjb
3 years ago

Thanks! The patch (with unit tests, thanks!) looks great to me.
I'll add a dev-feedback workflow keyword, just to get another thoughts on the implementation (trim to 200 chars), but it looks like the safer option to me.

Aside, I feel a bit hesitant to move it to the next minor milestone. I don't think it could break anything, but I'm not sure it could not be considered as a breaking change since it changes the behavior of the URL sanitization…

#7 @audrasjb
3 years ago

  • Keywords dev-feedback added

#8 in reply to: ↑ 6 @costdev
3 years ago

Replying to audrasjb:

Aside, I feel a bit hesitant to move it to the next minor milestone. I don't think it could break anything, but I'm not sure it could not be considered as a breaking change since it changes the behavior of the URL sanitization…

Absolutely, it is a breaking change, in certain circumstances. I've also now realised that this bug doesn't just occur with encoded substrings like %e2%80%93, but with all sanitized substrings including © that exist on either side of the 200 character boundary.


Before applying the patch, create a post with this title:

This very long title is to help demonstrate that partial encoded values remain when you try to use sanitize title with dashes on encoded strings trimmed to 200 chars instead of using max and strlen©

It will be sanitized to:

this-very-long-title-is-to-help-demonstrate-that-partial-encoded-values-remain-when-you-try-to-use-sanitize-title-with-dashes-on-encoded-strings-trimmed-to-200-chars-instead-of-using-max-and-strlenco

In your theme:

<?php
    if ( have_posts() ) : while( have_posts() ) : the_post();
        $slug = get_post_field( 'post_name', get_the_ID() );
        $sanitized_title = sanitize_title_with_dashes( get_the_title() );
        echo ( $slug === $sanitized_title ) ? 'Yes' : 'No';
    endwhile; endif;
?>

Prior to the patch, it should return 'Yes'. After the patch, the previously stored slug will still have, for example, co at the end, resulting in a 'No' result.


While this is a very specific edge case whereby there is an old post with a title that is over 200 characters long and just happens to have any sanitized substring that crosses the 200 character boundary, it's still a breaking change after fixing sanitize_title_with_dashes().

Throw into the mix that sanitize_title_with_dashes() can be, and is, used to sanitize other strings that may be stored in the database and used later in comparative operations, and this is a breaking change with some potential reach.

Last edited 3 years ago by costdev (previous) (diff)

#10 @costdev
2 years ago

@SergeyBiryukov is there anything I can do to progress this ticket further?

#11 @hellofromTonya
2 years ago

  • Milestone changed from 5.9 to 6.0

Absolutely, it is a breaking change, in certain circumstances.

Hmm, this change needs careful consideration for the breaking change and its impact. As tomorrow is 5.9 Beta 1, moving this to 6.0.

This ticket was mentioned in Slack in #core by costdev. View the logs.


2 years ago

#13 @costdev
23 months ago

  • Milestone changed from 6.0 to Future Release

As this ticket hasn't had any progress during this cycle, I'm moving this to Future Release.

Note: See TracTickets for help on using tickets.