WordPress.org

Make WordPress Core

Opened 2 years ago

Closed 5 weeks ago

#47912 closed defect (bug) (fixed)

Non-visible chars in title creates some encoded values in permalink

Reported by: dhanendran Owned by: SergeyBiryukov
Milestone: 5.9 Priority: normal
Severity: major Version: 5.2.2
Component: Permalinks Keywords: has-patch has-unit-tests commit has-testing-info
Focuses: ui, rtl, administration Cc:

Description

When someone copy past some values from word documents, some other editor or some site on the web, we end-up copying some non visible chars and pasting it in the WordPress post title. This creates some encoded values in the permalink.
https://dhanendranrajagopal.me/wp-content/uploads/2019/08/non-visible-chars-in-title.png

[In this link|https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128] there are so many non visible characters which can be easily added in to the post title. There are other examples as well for this issue one of them is [Byte Order Mark|https://en.wikipedia.org/wiki/Byte_order_mark].

sanitize_title() function only removes the accents from the title not these kind of characters.

I have searched all the open tickets in trac and couldn't find anything related to this issue, so opening one.

Attachments (2)

47912.diff (1.4 KB) - added by dhanendran 2 years ago.
added patch and unit test
47912-test-pr1329.gif (7.2 MB) - added by hellofromTonya 4 months ago.
Test Report: reproduced issue; applied PR 1329 => fixes issue

Change History (45)

#1 @SergeyBiryukov
2 years ago

  • Milestone changed from Awaiting Review to 5.3
  • Owner set to SergeyBiryukov
  • Status changed from new to accepted

Hi @dhanendran, welcome to WordPress Trac! Thanks for the report.

sanitize_title() indeed does not remove this kind of characters, but sanitize_title_with_dashes() does.

Looks like we need to add some more characters there.

#2 @SergeyBiryukov
2 years ago

  • Keywords needs-patch needs-unit-tests added

#3 @SergeyBiryukov
2 years ago

The characters are generally added on a case-by-case basis, not all at once.

Let's start with these two here:

@dhanendran
2 years ago

added patch and unit test

#4 @dhanendran
2 years ago

  • Keywords has-patch has-unit-tests added; needs-patch needs-unit-tests removed

@sergeybiryukov I have added the patch with unit test.

Last edited 2 years ago by dhanendran (previous) (diff)

This ticket was mentioned in Slack in #core by marybaum. View the logs.


2 years ago

#7 @garrett-eclipse
2 years ago

  • Keywords needs-testing added

@SergeyBiryukov will you have time to review for 5.3?

#8 @SergeyBiryukov
2 years ago

  • Milestone changed from 5.3 to 5.4

#9 @audrasjb
22 months ago

  • Milestone changed from 5.4 to Future Release

Hi,

With 5.4 Beta 3 approaching and the Beta period reserved for bugs introduced during the cycle, this is being moved to Future Release. If any maintainer or committer feels this should be included or wishes to assume ownership during a specific cycle, feel free to update the milestone accordingly.

#11 @SergeyBiryukov
21 months ago

  • Milestone changed from Future Release to 5.5

#12 @earnjam
17 months ago

@SergeyBiryukov You milestoned this for 5.5. Is it on your list before the beta, or do you want to punt?

#13 @SergeyBiryukov
17 months ago

  • Milestone changed from 5.5 to 5.6

#15 @helen
13 months ago

  • Milestone changed from 5.6 to 5.7

Moving to 5.7 to go with #50924 and #47594.

#16 @hellofromTonya
11 months ago

  • Keywords needs-testing-info added

Scheduling this ticket for Testing Scrub, but need more information for the testers to QA it:

  • What are the steps to reproduce the problem?
  • Are there any testing dependencies such as a plugin or script?
  • What is the expected behavior after applying the patch?

#17 @dhanendran
11 months ago

@hellofromTonya

What are the steps to reproduce the problem?

"Non visible char" you can copy-paste this text in the post title and can notice the permalink for some encoded chars or copy-paste some non-visible chars form https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128 in the title.

Are there any testing dependencies such as a plugin or script?

There are no dependencies for this.

What is the expected behavior after applying the patch?

When we paste these non-visible chars in the title, it should be removed and save only the valid chars.

#18 @hellofromTonya
11 months ago

  • Keywords needs-testing-info removed

Thank you @dhanendran. This ticket is ready for Test Scrub and manual testing.

This ticket was mentioned in Slack in #core by hellofromtonya. View the logs.


11 months ago

#20 @paaljoachim
11 months ago

I wanted to see what this would look like before testing it in localhost:8889. So I opened one of my local test sites. Running WP 5.6. Gutenberg 9.8. Using Twenty Twenty One. Pasted in "Non visible char" (copy and pasted what was mentioned above) as a post title. Noticed that the post title added 1 space extra but that the URL slug looks like this: non-visible-char

This ticket was mentioned in Slack in #core by monikarao. View the logs.


10 months ago

This ticket was mentioned in Slack in #core by monikarao. View the logs.


10 months ago

#23 @poena
10 months ago

Result before patch:
/non%e2%80%80visible-char-you-can-copy-paste-this-text/

Result with the patch applied:
/nonvisible-char-you-can-copy-paste-this-text/

Looks like there is a character there that needs to be replaced with a hyphen, not removed?

Please remove the duplicate '%e2%80%9f' from the list.

#24 @peterwilsoncc
10 months ago

  • Keywords needs-refresh added

Adding needs-refresh per the comment above.

For characters that display without a width, that is no apparent content in to the viewer, then the current block they've been added to (// Strip these characters entirely.) is correct.

For characters that display with a width, that is apparently a space to the viewer, then the better approach would to be to replace them with a hyhpen as 2poena says.

This ticket was mentioned in Slack in #core by hellofromtonya. View the logs.


10 months ago

#26 @hellofromTonya
10 months ago

  • Keywords needs-patch added; has-patch has-unit-tests needs-refresh removed
  • Milestone changed from 5.7 to 5.8

Changing to needs-patch to reflect the changes that need to be made to the original patch.

With 5.7 RC1 landing tomorrow and no activity on this ticket for a new patch, punting this ticket to 5.8.

If any maintainer or committer feels this can be resolved in time, or wishes to assume ownership during a specific cycle, feel free to update the milestone accordingly.

#27 @Boniu91
8 months ago

  • Keywords needs-testing removed

This ticket was mentioned in PR #1329 on WordPress/wordpress-develop by costdev.


6 months ago

  • Keywords has-patch added; needs-patch removed
  • Remove non-visible characters that display without a width.
  • Replace non-visible characters that display with a width with hyphen.

Fixes #47912.

Trac ticket: https://core.trac.wordpress.org/ticket/47912

#29 @costdev
6 months ago

I've submitted a patch based on the above discussion which removes non-visible characters that display without a width and replaces non-visible characters that display with a width with a hyphen.

#30 @JeffPaul
6 months ago

  • Keywords needs-testing added

#31 @audrasjb
6 months ago

  • Milestone changed from 5.8 to 5.9

The provided patch still needs some testing.
Given we're about to release 5.8 beta 1, let's move it for 5.9 consideration.

#32 @costdev
4 months ago

  • Keywords has-unit-tests added

I've just updated the PR/patch with PHPUnit tests.

#33 @audrasjb
4 months ago

Thanks @costdev, the refresh and unit tests look good to me.
The PR still works fine, I can't see anything else needed but I can miss something.

#34 @hellofromTonya
4 months ago

Thanks @costdev for refreshing the patch and adding tests.

I reviewed the tests and made improvement suggestions in the PR including:

  • Moving the testing scenarios to data providers. Why? To ensure all of them run even if any of them fail
  • Adding more happy and unhappy path scenarios for additional test coverage

Happy to collaborate in the PR.

#35 @prbot
4 months ago

costdev commented on PR #1329:

@hellofromtonya thanks! Am I correct that the suggestions have been tested and are ready for commiting to the PR?

#36 @prbot
4 months ago

hellofromtonya commented on PR #1329:

Great question. @costdev Yes, they should be as I ran the tests and WPCS sniffer locally. Assuming copy/paste didn't mess up the formatting or I didn't fat fingered something, the suggestions should be ready for committing.

#37 @prbot
4 months ago

costdev commented on PR #1329:

@hellofromtonya Great!

#38 @prbot
4 months ago

costdev commented on PR #1329:

Good work @costdev!

Thanks @hellofromtonya, you too !

@hellofromTonya
4 months ago

Test Report: reproduced issue; applied PR 1329 => fixes issue

#39 @hellofromTonya
4 months ago

  • Keywords needs-testing removed

Testing Report

Env

  • WordPress: 5.8 and trunk
  • PHP: 7.4.1 and 8.0
  • Localhost: wp-env and Local
  • OS: macOS Big Sur 11.5
  • Browser: Chrome 92.0.4515.131 and Firefox 90.0.2
  • Theme: Twenty Twenty-One
  • Plugins: none activated
  • Custom scripts: none

Steps to Test

  • Add New Post
  • Copy and paste Non visible char as the title
  • Publish post
  • Notice the pretty permalink generated, e.g. http://wpdevtest.local/non%e2%80%80visible-char/
  • Apply the patch
  • Repeat the above steps with title Non visible char with patch
  • Notice the pretty permalink generated which should strip out the nonvisible characters, e.g. http://wpdevtest.local/non-visible-char-with-patch/

Test Results

captured in the attached gif

  • Reproduced the problem ✅
    • The generated permalink includes the non-visible characters
    • Result: http://wpdevtest.local/non%e2%80%80visible-char/
  • After applying patch, problem fixed ✅
    • Non-visible characters removed from the permalink
    • Result: http://wpdevtest.local/non-visible-char-with-patch/

#40 @hellofromTonya
4 months ago

  • Keywords commit added

Marking for commit consideration.

#41 @hellofromTonya
4 months ago

  • Keywords has-testing-info added

Marking as having testing info: the steps to reproduce and test the patch are found in Testing Report.

This ticket was mentioned in Slack in #core by jjj. View the logs.


5 weeks ago

#43 @johnjamesjacoby
5 weeks ago

  • Resolution set to fixed
  • Status changed from accepted to closed

In 51984:

Permalinks: Sanitize non-visible characters inside sanitize_title_with_dashes().

This change prevents non-visible characters in titles from creating encoded values in permalinks, opting instead for the following replacement strategy:

  • Non-visible non-zero-width characters are replaced with hyphens
  • Non-visible zero-width characters are removed entirely

Included with this change are 64 additional PHPUnit assertions to confirm that only the targeted non-visible characters are sanitized as intended.

Before this change, URLs would unintentionally contain encoded values where these non-visible characters were. After this change, URLs intentionally strip out or hyphenate these non-visible characters.

Props costdev, dhanendran, hellofromtonya, paaljoachim, peterwilsoncc, poena, sergeybiryukov.

Fixes #47912.

Note: See TracTickets for help on using tickets.