#47912 closed defect (bug) (fixed)
Non-visible chars in title creates some encoded values in permalink
Reported by: | dhanendran | Owned by: | SergeyBiryukov |
---|---|---|---|
Milestone: | 5.9 | Priority: | normal |
Severity: | major | Version: | 5.2.2 |
Component: | Permalinks | Keywords: | has-patch has-unit-tests commit has-testing-info |
Focuses: | ui, rtl, administration | Cc: |
Description
When someone copy past some values from word documents, some other editor or some site on the web, we end-up copying some non visible chars and pasting it in the WordPress post title. This creates some encoded values in the permalink.
[In this link|https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128] there are so many non visible characters which can be easily added in to the post title. There are other examples as well for this issue one of them is [Byte Order Mark|https://en.wikipedia.org/wiki/Byte_order_mark].
sanitize_title() function only removes the accents from the title not these kind of characters.
I have searched all the open tickets in trac and couldn't find anything related to this issue, so opening one.
Attachments (2)
Change History (49)
#1
@
5 years ago
- Milestone changed from Awaiting Review to 5.3
- Owner set to SergeyBiryukov
- Status changed from new to accepted
#3
@
5 years ago
The characters are generally added on a case-by-case basis, not all at once.
Let's start with these two here:
- en quad,
%e2%80%80
(from the screenshot). - byte order mark,
%ef%bb%bf
.
#4
@
5 years ago
- Keywords has-patch has-unit-tests added; needs-patch needs-unit-tests removed
@sergeybiryukov I have added the patch with unit test.
This ticket was mentioned in Slack in #core by marybaum. View the logs.
5 years ago
#9
@
5 years ago
- Milestone changed from 5.4 to Future Release
Hi,
With 5.4 Beta 3 approaching and the Beta period reserved for bugs introduced during the cycle, this is being moved to Future Release. If any maintainer or committer feels this should be included or wishes to assume ownership during a specific cycle, feel free to update the milestone accordingly.
#12
@
4 years ago
@SergeyBiryukov You milestoned this for 5.5. Is it on your list before the beta, or do you want to punt?
#16
@
4 years ago
- Keywords needs-testing-info added
Scheduling this ticket for Testing Scrub, but need more information for the testers to QA it:
- What are the steps to reproduce the problem?
- Are there any testing dependencies such as a plugin or script?
- What is the expected behavior after applying the patch?
#17
@
4 years ago
@hellofromTonya
What are the steps to reproduce the problem?
"Non
visible char" you can copy-paste this text in the post title and can notice the permalink for some encoded chars or copy-paste some non-visible chars form https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128 in the title.
Are there any testing dependencies such as a plugin or script?
There are no dependencies for this.
What is the expected behavior after applying the patch?
When we paste these non-visible chars in the title, it should be removed and save only the valid chars.
#18
@
4 years ago
- Keywords needs-testing-info removed
Thank you @dhanendran. This ticket is ready for Test Scrub and manual testing.
This ticket was mentioned in Slack in #core by hellofromtonya. View the logs.
4 years ago
#20
@
4 years ago
I wanted to see what this would look like before testing it in localhost:8889. So I opened one of my local test sites. Running WP 5.6. Gutenberg 9.8. Using Twenty Twenty One. Pasted in "Non visible char" (copy and pasted what was mentioned above) as a post title. Noticed that the post title added 1 space extra but that the URL slug looks like this: non-visible-char
This ticket was mentioned in Slack in #core by monikarao. View the logs.
4 years ago
This ticket was mentioned in Slack in #core by monikarao. View the logs.
4 years ago
#23
@
4 years ago
Result before patch:
/non%e2%80%80visible-char-you-can-copy-paste-this-text/
Result with the patch applied:
/nonvisible-char-you-can-copy-paste-this-text/
Looks like there is a character there that needs to be replaced with a hyphen, not removed?
Please remove the duplicate '%e2%80%9f'
from the list.
#24
@
4 years ago
- Keywords needs-refresh added
Adding needs-refresh per the comment above.
For characters that display without a width, that is no apparent content in to the viewer, then the current block they've been added to (// Strip these characters entirely.
) is correct.
For characters that display with a width, that is apparently a space to the viewer, then the better approach would to be to replace them with a hyhpen as 2poena says.
This ticket was mentioned in Slack in #core by hellofromtonya. View the logs.
4 years ago
#26
@
4 years ago
- Keywords needs-patch added; has-patch has-unit-tests needs-refresh removed
- Milestone changed from 5.7 to 5.8
Changing to needs-patch
to reflect the changes that need to be made to the original patch.
With 5.7 RC1 landing tomorrow and no activity on this ticket for a new patch, punting this ticket to 5.8.
If any maintainer or committer feels this can be resolved in time, or wishes to assume ownership during a specific cycle, feel free to update the milestone accordingly.
This ticket was mentioned in PR #1329 on WordPress/wordpress-develop by costdev.
3 years ago
#28
- Keywords has-patch added; needs-patch removed
- Remove
non-visible characters that display without a width
. - Replace
non-visible characters that display with a width
with hyphen.
Fixes #47912.
Trac ticket: https://core.trac.wordpress.org/ticket/47912
#29
@
3 years ago
I've submitted a patch based on the above discussion which removes non-visible characters that display without a width
and replaces non-visible characters that display with a width
with a hyphen.
#31
@
3 years ago
- Milestone changed from 5.8 to 5.9
The provided patch still needs some testing.
Given we're about to release 5.8 beta 1, let's move it for 5.9 consideration.
#33
@
3 years ago
Thanks @costdev, the refresh and unit tests look good to me.
The PR still works fine, I can't see anything else needed but I can miss something.
#34
@
3 years ago
Thanks @costdev for refreshing the patch and adding tests.
I reviewed the tests and made improvement suggestions in the PR including:
- Moving the testing scenarios to data providers. Why? To ensure all of them run even if any of them fail
- Adding more happy and unhappy path scenarios for additional test coverage
Happy to collaborate in the PR.
3 years ago
#35
@hellofromtonya thanks! Am I correct that the suggestions have been tested and are ready for commiting to the PR?
hellofromtonya commented on PR #1329:
3 years ago
#36
Great question. @costdev Yes, they should be as I ran the tests and WPCS sniffer locally. Assuming copy/paste didn't mess up the formatting or I didn't fat fingered something, the suggestions should be ready for committing.
#39
@
3 years ago
- Keywords needs-testing removed
Testing Report
Env
- WordPress: 5.8 and trunk
- PHP: 7.4.1 and 8.0
- Localhost:
wp-env
and Local - OS: macOS Big Sur 11.5
- Browser: Chrome 92.0.4515.131 and Firefox 90.0.2
- Theme: Twenty Twenty-One
- Plugins: none activated
- Custom scripts: none
Steps to Test
- Add New Post
- Copy and paste
Non visible char
as the title - Publish post
- Notice the pretty permalink generated, e.g.
http://wpdevtest.local/non%e2%80%80visible-char/
- Apply the patch
- Repeat the above steps with title
Non visible char with patch
- Notice the pretty permalink generated which should strip out the nonvisible characters, e.g.
http://wpdevtest.local/non-visible-char-with-patch/
Test Results
- Reproduced the problem ✅
- The generated permalink includes the non-visible characters
- Result:
http://wpdevtest.local/non%e2%80%80visible-char/
- After applying patch, problem fixed ✅
- Non-visible characters removed from the permalink
- Result:
http://wpdevtest.local/non-visible-char-with-patch/
#40
@
3 years ago
- Keywords commit added
- Reviewed, expanded unit tests, and approved PR 1329 ✅
- Manually tested to confirm PR 1329 resolves the issue ✅
Marking for commit
consideration.
#41
@
3 years ago
- Keywords has-testing-info added
Marking as having testing info: the steps to reproduce and test the patch are found in Testing Report.
This ticket was mentioned in Slack in #core by jjj. View the logs.
3 years ago
hellofromtonya commented on PR #1329:
3 years ago
#44
Committed via https://core.trac.wordpress.org/changeset/51984.
Hi @dhanendran, welcome to WordPress Trac! Thanks for the report.
sanitize_title()
indeed does not remove this kind of characters, but sanitize_title_with_dashes() does.Looks like we need to add some more characters there.