Opened 3 years ago
Last modified 3 years ago
#53910 reviewing defect (bug)
`sanitize_title_with_dashes` returns partial encoded values in permalink
Reported by: | costdev | Owned by: | SergeyBiryukov |
---|---|---|---|
Milestone: | Future Release | Priority: | normal |
Severity: | major | Version: | 5.8 |
Component: | Permalinks | Keywords: | has-patch has-unit-tests dev-feedback |
Focuses: | ui, rtl, administration | Cc: |
Description (last modified by )
Picked up quite an old bug (circa 2006!) while working on #47912.
sanitize_title_with_dashes()
does a check to see if the title seems_utf8()
and subsequently url encodes it. The call to utf8_uri_encode()
has a $length
argument of 200
.
If an encoded value crosses the 200 boundary, the encoded value is cut and the remainder isn't picked up by any of the subsequent actions taken by sanitize_title_with_dashes()
.
this-very-long-title-is-to-help-demonstrate-that-partial-encoded-values-remain-when-you-try-to-use-sanitize-title-with-dashes-on-encoded-strings-trimmed-to-200-chars-instead-of-using-max-and-strlen%e2%80%af
becomes:
this-very-long-title-is-to-help-demonstrate-that-partial-encoded-values-remain-when-you-try-to-use-sanitize-title-with-dashes-on-encoded-strings-trimmed-to-200-chars-instead-of-using-max-and-strlen%e2
I've resolved this issue by:
- Storing the
seems_utf8()
value - Changing the call to
utf8_uri_encode()
, so that thelength
argument is themax()
ofstrlen( $title )
and200
- Trimming the
$title
to200
at the end of thesanitize_title_with_dashes
instead.
Attachments (1)
Change History (14)
#2
@
3 years ago
- Keywords needs-patch added
Thanks for opening this ticket, @costdev !
- Trimming the $title to 200 at the end of the sanitize_title_with_dashes instead.
Looks like the easier option here. With such a length, I don't see how it could hurt to cut off titles with more than 200 characters when performing the url sanitization.
This ticket was mentioned in PR #1569 on WordPress/wordpress-develop by costdev.
3 years ago
#3
- Keywords has-patch has-unit-tests added; needs-patch removed
sanitize_title_with_dashes()
does a check to see if the title seems_utf8()
and subsequently url encodes it.
The call to utf8_uri_encode()
has a $length
argument of 200
.
If an encoded value crosses the 200 boundary, the encoded value is cut and the remainder isn't picked up by any of the subsequent actions taken by sanitize_title_with_dashes()
.
Resolved by:
- storing the
seems_utf8()
value. - changing the call to
utf8_uri_encode()
, so that thelength
argument is themax()
ofstrlen( $title )
and200
. - trimming the
$title
to200
at the end ofsanitize_title_with_dashes()
instead.
Trac ticket: https://core.trac.wordpress.org/ticket/53910
#4
@
3 years ago
Thanks @audrasjb!
I've just updated the PR/patch to:
- remove the
$title_seems_utf8
variable - remove the
$length
argument in the original callutf8_uri_encode()
- trim any title that
seems_utf8()
to200
at the end ofsanitize_title_with_dashes()
PHPUnit tests are successful.
As the issue only occurred with seems_utf8()
titles that were trimmed too early, I've only made the change in this circumstance to prevent the fix interfering with unknown potential uses of 200+ character title slugs.
#5
@
3 years ago
- Milestone changed from Awaiting Review to 5.9
- Owner set to SergeyBiryukov
- Status changed from new to reviewing
#6
follow-up:
↓ 8
@
3 years ago
Thanks! The patch (with unit tests, thanks!) looks great to me.
I'll add a dev-feedback
workflow keyword, just to get another thoughts on the implementation (trim to 200 chars), but it looks like the safer option to me.
Aside, I feel a bit hesitant to move it to the next minor milestone. I don't think it could break anything, but I'm not sure it could not be considered as a breaking change since it changes the behavior of the URL sanitization…
#8
in reply to:
↑ 6
@
3 years ago
Replying to audrasjb:
Aside, I feel a bit hesitant to move it to the next minor milestone. I don't think it could break anything, but I'm not sure it could not be considered as a breaking change since it changes the behavior of the URL sanitization…
Absolutely, it is a breaking change, in certain circumstances. I've also now realised that this bug doesn't just occur with encoded substrings like %e2%80%93
, but with all sanitized substrings including ©
that exist on either side of the 200 character boundary.
Before applying the patch, create a post with this title:
This very long title is to help demonstrate that partial encoded values remain when you try to use sanitize title with dashes on encoded strings trimmed to 200 chars instead of using max and strlen©
It will be sanitized to:
this-very-long-title-is-to-help-demonstrate-that-partial-encoded-values-remain-when-you-try-to-use-sanitize-title-with-dashes-on-encoded-strings-trimmed-to-200-chars-instead-of-using-max-and-strlenco
In your theme:
<?php if ( have_posts() ) : while( have_posts() ) : the_post(); if ( get_post_field( 'post_name', get_the_ID() ) === sanitize_title_with_dashes( get_the_title() ) ) { echo 'Yes'; } else { echo 'No'; } endwhile; endif; ?>
Prior to the patch, it should return 'Yes'. After the patch, the previously stored slug will still have, for example, co
at the end, resulting in a 'No' result.
While this is a very specific edge case whereby there is an old post with a title that is over 200 characters long and just happens to have any sanitized substring that crosses the 200 character boundary, it's still a breaking change after fixing sanitize_title_with_dashes()
.
Throw into the mix that sanitize_title_with_dashes()
can be, and is, used to sanitize other strings that may be stored in the database and used later in comparative operations, and this is a breaking change with some potential reach.
#11
@
3 years ago
- Milestone changed from 5.9 to 6.0
Absolutely, it is a breaking change, in certain circumstances.
Hmm, this change needs careful consideration for the breaking change and its impact. As tomorrow is 5.9 Beta 1, moving this to 6.0.
Example of issue