WordPress.org

Make WordPress Core

Opened 5 years ago

Last modified 4 weeks ago

#10690 reopened defect (bug)

WordPress does not support non-ascii characters in URLs

Reported by: paddya Owned by: markjaquith
Milestone: Awaiting Review Priority: normal
Severity: normal Version: 2.8.4
Component: Canonical Keywords: needs-patch needs-unit-tests
Focuses: Cc:

Description

WordPress' clean_url() strips out most characters, which are non-ascii for security reasons. This is causing problems, if you want to run a WordPress blog on a domain containing non-ascii-characters (e.g. müller.com), because WordPress generates wrong URLs on redirects.

Change History (37)

comment:1 paddya5 years ago

wp_sanitize_redirect() is affected by this issue, too.

comment:2 scribu5 years ago

  • Component changed from General to Canonical
  • Keywords needs-patch added; URLs removed
  • Owner set to markjaquith

comment:3 scribu5 years ago

  • Keywords changed from needs-patch, sanitizing to needs-patch sanitizing

comment:4 azaozz4 years ago

  • Milestone changed from 2.9 to Future Release

No patch.

comment:5 matt4 years ago

Part of the reason for being conservative in those functions when they were written is that we had security issues in the past with characters allowing for header splitting when included with redirects, and probably an assumption that all i18n stuff would be encoded and compatible with our current methods. Any fix for this should look at all the security implications.

comment:6 nbachiyski4 years ago

  • Status changed from new to reviewing

The problem is not in clean_url(), which works perfectly fine with unicode URLs (see TestCleanUrl::test_not_ascii() in http://svn.automattic.com/wordpress-tests/wp-testcase/test_includes_formatting.php).

The problem is in wp_sanitize_redirect(), which uses a lot more restrictive filter than clean_url().

@markjaquith is there a reason wp_sanitize_redirect() to be different from clean_url()?

comment:7 dwright4 years ago

  • Cc david_v_wright@… added

duplicate of http://core.trac.wordpress.org/ticket/10298
I posted a possible solution there. (patch) (it works but is not 'pretty')

comment:8 follow-up: hakre4 years ago

  • Keywords needs-patch sanitizing removed

URLs per definition can only conain ASCII chars. If clean_url() removes non ascii (7bit I hope) chars, that's fine. Please use the PUNYCODE notation of your domain instead of the UTF8-representation.

I suggest to close as invalid.

comment:9 in reply to: ↑ 8 dwright4 years ago

Replying to hakre:

URLs per definition can only conain ASCII chars. If clean_url() removes non ascii (7bit I hope) chars, that's fine. Please use the PUNYCODE notation of your domain instead of the UTF8-representation.

I suggest to close as invalid.

I believe that Wordpress has enough international usage to warrant proper IDN support.
It seems sub-optimal for Mr/Mrs. Müller to have to send users to their website: xn--mller-kva.com, rather than müller.com

An IDNA-enabled application is able to convert between the internationalised and ASCII representations of a domain name. It uses the ASCII form for DNS lookups but can present the internationalised form to users who presumably prefer to read and write domain names in non-ASCII scripts such as Arabic or Hiragana. Applications that do not support IDNA will not be able to handle domain names with non-ASCII characters, but will still be able to access such domains if given the (usually rather cryptic) ASCII equivalent. - http://en.wikipedia.org/wiki/Internationalized_domain_name

comment:10 scribu4 years ago

  • Keywords needs-patch added

xref: #10298

comment:11 scribu4 years ago

+1 on IDNA support.

comment:12 hakre4 years ago

Just some pseudocode and thoughts:

$url_unknown = getoption('siteurl');
list($prefix, $domain, $suffix) = extract_domain_from_url($url);
$domain_ascii = ( seems_utf8( $domain ) ?  utf8_to_punycode($domain) : $domain ) 
$url_ascii =  $prefix . $domain_ascii . $suffix;

It might be an option to hook the option siteurl when read but that might break the admin output. the rest should work then.

Or: once entered, the domain will automatically converted and the users do not need to do it on their own (Safest way I suppose).

comment:13 nbachiyski4 years ago

Most of the time we can use the UTF-8 representation of both the domain and the path. Only the headers impose an ASCII restriction, so we can run the domain through the punycode converter and urlencode the path.

Have you found other places, where the UTF-8 representation causes problems?

comment:14 hakre4 years ago

There is a problem where the punycode representation causes a problem, I tagged it as related because it somehow is: #11734

For the rest of your comment: I think that's a good way to go. We assume that the domain names are UTF-8 encoded. For URLs, as long as we do not read them URL encoded they are not punycode-encoded as well. Should pyncode-encoding be traded as URLencoded? URLs can be properly split anyway, so this can be decently applied. maybe a helper function that does the job (add a paramter for the from / to type).

comment:15 sirzooro4 years ago

  • Cc sirzooro added

comment:16 dwright4 years ago

  • Resolution set to worksforme
  • Status changed from reviewing to closed

I am closing this ticket as works-for-me as there is an now IDNA plugin which enables one to use IDN's (domain name's that contains non-ascii characters) in WordPress http://wordpress.org/extend/plugins/idna/

comment:17 scribu4 years ago

  • Milestone Future Release deleted

comment:18 follow-ups: codestyling3 years ago

  • Resolution worksforme deleted
  • Status changed from closed to reopened
  • Version changed from 2.8.4 to 3.1

IDN handling is different related to Browsers! WebKit based browser like Safari and Chrome work with PunyCode URL's but others like IE, Firefox and Opera doesn't.
This is a problem of Cross Site Scripting detection and can be realize and tested, if the Blog is configured to an PunyCode Url.

example out of a case I did investigate:
IDN: http://с-проект.рф
PunyCode: http://xn----jtbpoegeo.xn--p1ai

If you try to call a JSON request like this example with the generated admin_url() out of WordPress, which would become the PunyCode one:

	new Ajax.Request('http://xn----jtbpoegeo.xn--p1ai/wp-admin/admin-ajax.php' ?>', 
		{  
			parameters: {
				action: 'get_download_section'
			},
			onSuccess: function(transport) {		
				elem.title=transport.responseJSON.title;
			},
			onFailure: function(transport) {
				alert('JSON security bug')
			}
		}
	);

and the answer is correct 'application/json' with correct JSON content, than this fails on all browsers except WebKit based!
If you try it with the original IDN Url like:

	new Ajax.Request('http://с-проект.рф/wp-admin/admin-ajax.php' ?>', 

it works now for all other browsers but fails now on WebKit based.

My suggestion will be a conditional convertion back to IDN, if browser is not WebKit based.
I did this inside my WordPress plugin "Codestyling Localization" and it works now in any case. I did use the class idna_convert from Matthias Sommerfeld for easy decode of PunyCode admin url's in such a case.

Please check it also in relation to #11734 / #10690 / #14648 because this may also affect the flash uploader feeded with PunyCode url's instead of IDN for some browser!

comment:19 ocean903 years ago

  • Milestone set to Awaiting Review
  • Version changed from 3.1 to 2.8.4

comment:20 in reply to: ↑ 18 ; follow-up: hakre3 years ago

Replying to codestyling:

Can you specify Ajax.Request? Is that of some javascript library?

comment:21 toscho3 years ago

  • Cc info@… added

comment:22 in reply to: ↑ 20 ; follow-up: codestyling3 years ago

Replying to hakre:

Replying to codestyling:

Can you specify Ajax.Request? Is that of some javascript library?

Yes I can. This is belongs to Prototype.js library shipped with WordPress as one of the standard libraries. Documentation can be found here: http://api.prototypejs.org/ajax/Ajax/Request/

comment:23 codestyling3 years ago

  • Cc codestyling added

comment:24 in reply to: ↑ 22 ; follow-up: hakre3 years ago

Replying to codestyling:

Replying to hakre:

Replying to codestyling:

Can you specify Ajax.Request? Is that of some javascript library?

Yes I can. This is belongs to Prototype.js library shipped with WordPress as one of the standard libraries. Documentation can be found here: http://api.prototypejs.org/ajax/Ajax/Request/

Thanks for providing the information. Can you provide a link to the prototype documentation of that Ajax.Request that is specifying the encoding of the URL parameter? The link you provided does not say in which encoding/format the URL has to be provided. This might just be a bug in prototype, not Wordpress.

comment:25 in reply to: ↑ 24 ; follow-up: codestyling3 years ago

Replying to hakre:

Replying to codestyling:

Replying to hakre:

Replying to codestyling:

Can you specify Ajax.Request? Is that of some javascript library?

Yes I can. This is belongs to Prototype.js library shipped with WordPress as one of the standard libraries. Documentation can be found here: http://api.prototypejs.org/ajax/Ajax/Request/

Thanks for providing the information. Can you provide a link to the prototype documentation of that Ajax.Request that is specifying the encoding of the URL parameter? The link you provided does not say in which encoding/format the URL has to be provided. This might just be a bug in prototype, not Wordpress.

Here ist the default option description of Prototype.js Ajax classes: http://api.prototypejs.org/ajax/

Please look at the evalJS property set as default to true which results into automatic population of responseJSON instead of responseText if the returned content type is application/json.
As also noticed, this is strongly related to same origin policy and this will not work at all browsers the same way unsing IDN based URL's like the admin_url() do at WordPress for PunyCode URL's.

Even if this could be a bug at Prototype.js it affects function of WordPress plugins/themes using the onboard shipped Prototype.js functionality to solve the goals.

comment:26 in reply to: ↑ 25 hakre3 years ago

Replying to codestyling:

Replying to hakre:

Replying to codestyling:

Replying to hakre:

Replying to codestyling:

Can you specify Ajax.Request? Is that of some javascript library?

Yes I can. This is belongs to Prototype.js library shipped with WordPress as one of the standard libraries. Documentation can be found here: http://api.prototypejs.org/ajax/Ajax/Request/

Thanks for providing the information. Can you provide a link to the prototype documentation of that Ajax.Request that is specifying the encoding of the URL parameter? The link you provided does not say in which encoding/format the URL has to be provided. This might just be a bug in prototype, not Wordpress.

Here ist the default option description of Prototype.js Ajax classes: http://api.prototypejs.org/ajax/

The link you provided says nothing about which encoding/format is expected/supported by that function for which browser. Until this information is not provided I can not say here exactly the cause of the problem is. Normally I only fix problems if I know their cause, so that's why I was asking. Until provided I will step back from the discussion a bit.

Even if this could be a bug at Prototype.js it affects function of WordPress plugins/themes using the onboard shipped Prototype.js functionality to solve the goals.

That's the case for any library used: If the library has a bug, that bug might get transparently transported into the application making use of the library.

If you see a problem in using Prototype.js Ajax across different browsers, please report the problem back to the prototype library project so they can fix it and every user of the library can benefit from the fix. Thanks.

Last edited 3 years ago by hakre (previous) (diff)

comment:27 in reply to: ↑ 18 SergeyBiryukov2 years ago

Replying to codestyling:

IDN handling is different related to Browsers! WebKit based browser like Safari and Chrome work with PunyCode URL's but others like IE, Firefox and Opera doesn't.
This is a problem of Cross Site Scripting detection and can be realize and tested, if the Blog is configured to an PunyCode Url.

Related: #18952

comment:28 SergeyBiryukov2 years ago

  • Summary changed from Wordpress does not support non-ascii characters in URLs to WordPress does not support non-ascii characters in URLs

comment:29 SergeyBiryukov2 years ago

Closed #19279 as a duplicate.

comment:30 espsjurs15 months ago

Hello there,

This is still an issue :-(

Is someone working on this?

Espen

comment:31 SergeyBiryukov15 months ago

Now that AJAX for IDN is fixed in #18952, I don't see any other issues other than #11734 when using the Punycode representation of URLs.

We should probably convert home and siteurl to Punycode automatically, as mentioned in #19279.

Version 0, edited 15 months ago by SergeyBiryukov (next)

comment:32 espsjurs14 months ago

If you intend to use multisite you can forget special characters in the domain name. It will work as long as you use Punycode (the translated version illegible) of the domain name, but all the links generated from the WP will be 'puny'. Example url in registration mail and site generation page..

comment:33 espsjurs14 months ago

  • Cc espsjurs added

comment:34 mdawaffe9 months ago

New patch attached to #11734. We still need a patch here to convert home and siteurl to punycode/ascii.

comment:35 joostdevalk4 weeks ago

The issue with sanitize_redirect is actually rather annoying as well, so it's not just home and siteurl...

comment:36 nacin4 weeks ago

  • Keywords needs-unit-tests added
  • Summary changed from WordPress does not support non-ascii characters in URLs to WordPress does not support non-ascii characters in URLs

comment:37 szepe.viktor4 weeks ago

This small wanna-be-plugin works on production sites:
https://gist.github.com/szepeviktor/9682764
(you shouldn't name your menus with non-ascii characters, because they will go into a CSS class)

Last edited 4 weeks ago by szepe.viktor (previous) (diff)
Note: See TracTickets for help on using tickets.