Make WordPress Core

Opened 5 years ago

Last modified 5 years ago

#47941 new defect (bug)

URL with umlaut is sanitized in user-edit.php form

Reported by: smaffulli
Owned by:
Milestone: Awaiting Review
Priority: normal
Severity: normal
Version:
Component: Users
Keywords: reporter-feedback
Focuses:
Cc:

Description

I'm trying to add the LinkedIn URL to a user whose URL is https://www.linkedin.com/in/frank-rösner-83736/

When I copy that URL from the browser, the ö is percent-encoded and the URL becomes https://www.linkedin.com/in/frank-r%C3%B6sner-83736/

Pasting that into the LinkedIn URL field makes the %C3%B6 disappear.

Change History (4)

#1 @SergeyBiryukov
5 years ago

  • Component changed from General to Users
  • Keywords close added

Hi @smaffulli, welcome to WordPress Trac! Thanks for the report.

As previously noted in #18945, encoding non-ASCII characters in URLs is a part of web standards:

Non-ASCII characters must first be encoded according to UTF-8 [STD63], and then each octet of the corresponding UTF-8 sequence must be percent-encoded to be represented as URI characters.

http://tools.ietf.org/html/rfc3986#page-21
http://en.wikipedia.org/wiki/Percent-encoding#Current_standard

It's the same for Cyrillic characters, for example. I don't see a bug here.

That said, most browsers decode the URLs to display them in a human-readable form.
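
To make the quoted rule concrete, here is a minimal Python sketch of the round trip. It only uses the standard library's `urllib.parse`; the variable names are illustrative and nothing here is WordPress code.

```python
from urllib.parse import quote, unquote

slug = "frank-rösner-83736"

# Encode as UTF-8, then percent-encode each non-ASCII octet.
encoded = quote(slug)
print(encoded)          # frank-r%C3%B6sner-83736

# Browsers reverse this step when displaying the address.
decoded = unquote(encoded)
print(decoded == slug)  # True
```

The ö becomes the two UTF-8 octets C3 and B6, which is why a single character turns into %C3%B6.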

#2 follow-up: @smaffulli
5 years ago

I may not have made myself clear: I know that UTF-8 percent-encoding is part of the standard. What I noticed is that when I copy the URL from the browser into the user-edit.php form, the characters show up (correctly) as percent-encoded. When I save the form, it shows https://www.linkedin.com/in/frank-rsner-83736/ which is obviously a 404. It looks to me like the form 'sanitizes' the encoded character. Give it a try; I think you have all the information needed to reproduce it.

Last edited 5 years ago by smaffulli

#3 in reply to: ↑ 2 @SergeyBiryukov
5 years ago

  • Keywords reporter-feedback added; close removed

Replying to smaffulli:

What I noticed is that when I copy from the browser into the form user-edit.php, the characters show up (correctly) as percent-encoded. When I save the form, the form shows https://www.linkedin.com/in/frank-rsner-83736/ which is obviously a 404. It looks to me like the form 'sanitizes' the encoded character.

Thanks for clarifying, it makes more sense now :)

However, I could not reproduce the issue with the Website field in the user profile. That LinkedIn URL is saved as expected for me, in both the encoded and the human-readable form.

Do you experience this issue with the default Website field, or some other custom field?

#4 @smaffulli
5 years ago

I tried to save it in the LinkedIn URL field on a multisite WP install (if that makes a difference). Following your comment, I tried the Website field and the error doesn't appear there. The sanitization happens only on the LinkedIn field.
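
For illustration only, the difference between the two fields would be consistent with one field being cleaned as plain text and the other as a URL. The Python sketch below reproduces the symptom under that assumption; both function names are hypothetical, not WordPress functions, and the ticket does not confirm that this is the actual cause.

```python
import re

# Hypothetical sanitizers for illustration; not taken from WordPress.
PERCENT_OCTET = re.compile(r"%[0-9a-fA-F]{2}")

def sanitize_as_plain_text(value: str) -> str:
    """Text-style cleanup that drops every percent-encoded octet."""
    return PERCENT_OCTET.sub("", value).strip()

def sanitize_as_url(value: str) -> str:
    """URL-style cleanup that only trims whitespace and keeps %XX escapes."""
    return value.strip()

pasted = "https://www.linkedin.com/in/frank-r%C3%B6sner-83736/"

print(sanitize_as_plain_text(pasted))
# https://www.linkedin.com/in/frank-rsner-83736/

print(sanitize_as_url(pasted))
# https://www.linkedin.com/in/frank-r%C3%B6sner-83736/
```

The first output matches the broken URL reported above, and the second matches what the Website field keeps.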
