Opened 5 years ago
Last modified 5 years ago
#47941 new defect (bug)
URL with umlaut is sanitized in user-edit.php form
Reported by: | smaffulli | Owned by: | |
---|---|---|---|
Milestone: | Awaiting Review | Priority: | normal |
Severity: | normal | Version: | |
Component: | Users | Keywords: | reporter-feedback |
Focuses: | | Cc: | |
Description
I'm trying to add the LinkedIn URL to a user whose URL is https://www.linkedin.com/in/frank-rösner-83736/
When I copy that URL from the browser, the ö is percent-encoded and the URL becomes https://www.linkedin.com/in/frank-r%C3%B6sner-83736/
Pasting that into the LinkedIn URL field makes the %C3%B6 disappear.
Change History (4)
#2 (follow-up: ↓ 3) @ 5 years ago
I may not have made myself clear: I know that UTF-8 is part of the standard. What I noticed is that when I copy from the browser into the form user-edit.php, the characters show up (correctly) as percent-encoded. When I save the form, the form shows https://www.linkedin.com/in/frank-rsner-83736/
which is obviously a 404. It looks to me like the form 'sanitizes' the encoded character. Please try it; I think you have all the information needed to reproduce this.
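For illustration only, the symptom described above matches what a filter that deletes every percent-encoded octet would produce. The sketch below is written in Python and uses a hypothetical helper, `strip_percent_octets`. It is not WordPress code, and nothing in this ticket confirms which filter (if any) is responsible.

```python
import re


def strip_percent_octets(value: str) -> str:
    """Hypothetical sanitizer (an assumption, not WordPress code):
    delete every %XX sequence from the input string."""
    return re.sub(r"%[0-9a-fA-F]{2}", "", value)


# The percent-encoded URL as pasted into the form.
pasted = "https://www.linkedin.com/in/frank-r%C3%B6sner-83736/"

# Both octets of the UTF-8 encoded "ö" (%C3 and %B6) are removed.
saved = strip_percent_octets(pasted)
print(saved)
```

If a field shows this behavior, checking which sanitization callback is attached to it would be a reasonable next step.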
#3 (in reply to: ↑ 2) @ 5 years ago
- Keywords reporter-feedback added; close removed
Replying to smaffulli:
What I noticed is that when I copy from the browser into the form user-edit.php, the characters show up (correctly) as percent-encoded. When I save the form, the form shows
https://www.linkedin.com/in/frank-rsner-83736/
which is obviously a 404. It looks to me like the form 'sanitizes' the encoded character.
Thanks for clarifying, it makes more sense now :)
However, I could not reproduce the issue with the Website field on the user profile. That LinkedIn URL is saved as expected for me, in both the encoded and the human-readable form.
Do you experience this issue with the default Website field, or some other custom field?
Hi @smaffulli, welcome to WordPress Trac! Thanks for the report.
As previously noted in #18945, encoding non-ASCII characters in URLs is a part of web standards:
http://tools.ietf.org/html/rfc3986#page-21
http://en.wikipedia.org/wiki/Percent-encoding#Current_standard
It's the same for Cyrillic characters, for example. I don't see a bug here.
That said, most browsers decode the URLs to display them in a human-readable form.
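As a minimal sketch of the round trip described here, the following Python snippet percent-encodes the human-readable URL and decodes it again using the standard library's `urllib.parse`. The variable names are illustrative, and the choice of `safe=":/"` is an assumption made so that the scheme and path separators are left untouched.

```python
from urllib.parse import quote, unquote

# Human-readable form, as a browser typically displays it.
readable = "https://www.linkedin.com/in/frank-rösner-83736/"

# Percent-encode non-ASCII characters as UTF-8 octets; keep ":" and "/" as-is.
encoded = quote(readable, safe=":/")

# Decoding restores the human-readable form.
decoded = unquote(encoded)

print(encoded)
print(decoded)
```

Both strings identify the same resource, so a form that stores either one unchanged should produce a working link.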