Opened 15 years ago
Last modified 5 years ago
#10690 reopened defect (bug)
WordPress does not support non-ascii characters in the domain name
Reported by: | paddya | Owned by: | markjaquith |
---|---|---|---|
Milestone: | Priority: | normal | |
Severity: | normal | Version: | 2.8.4 |
Component: | Canonical | Keywords: | needs-unit-tests needs-patch |
Focuses: | Cc: |
Description
WordPress' clean_url() strips out most characters, which are non-ascii for security reasons. This is causing problems, if you want to run a WordPress blog on a domain containing non-ascii-characters (e.g. müller.com), because WordPress generates wrong URLs on redirects.
Attachments (1)
Change History (50)
#2
@
15 years ago
- Component changed from General to Canonical
- Keywords needs-patch added; URLs removed
- Owner set to markjaquith
#5
@
15 years ago
Part of the reason for being conservative in those functions when they were written is that we had security issues in the past with characters allowing for header splitting when included with redirects, and probably an assumption that all i18n stuff would be encoded and compatible with our current methods. Any fix for this should look at all the security implications.
#6
@
15 years ago
- Status changed from new to reviewing
The problem is not in clean_url()
, which works perfectly fine with unicode URLs (see TestCleanUrl::test_not_ascii() in http://svn.automattic.com/wordpress-tests/wp-testcase/test_includes_formatting.php).
The problem is in wp_sanitize_redirect()
, which uses a lot more restrictive filter than clean_url()
.
@markjaquith is there a reason wp_sanitize_redirect()
to be different from clean_url()
?
#7
@
15 years ago
- Cc david_v_wright@… added
duplicate of http://core.trac.wordpress.org/ticket/10298
I posted a possible solution there. (patch) (it works but is not 'pretty')
#8
follow-up:
↓ 9
@
15 years ago
- Keywords needs-patch sanitizing removed
URLs per definition can only conain ASCII chars. If clean_url() removes non ascii (7bit I hope) chars, that's fine. Please use the PUNYCODE notation of your domain instead of the UTF8-representation.
I suggest to close as invalid.
#9
in reply to:
↑ 8
@
15 years ago
Replying to hakre:
URLs per definition can only conain ASCII chars. If clean_url() removes non ascii (7bit I hope) chars, that's fine. Please use the PUNYCODE notation of your domain instead of the UTF8-representation.
I suggest to close as invalid.
I believe that Wordpress has enough international usage to warrant proper IDN support.
It seems sub-optimal for Mr/Mrs. Müller to have to send users to their website: xn--mller-kva.com, rather than müller.com
An IDNA-enabled application is able to convert between the internationalised and ASCII representations of a domain name. It uses the ASCII form for DNS lookups but can present the internationalised form to users who presumably prefer to read and write domain names in non-ASCII scripts such as Arabic or Hiragana. Applications that do not support IDNA will not be able to handle domain names with non-ASCII characters, but will still be able to access such domains if given the (usually rather cryptic) ASCII equivalent. - http://en.wikipedia.org/wiki/Internationalized_domain_name
#12
@
15 years ago
Just some pseudocode and thoughts:
$url_unknown = getoption('siteurl'); list($prefix, $domain, $suffix) = extract_domain_from_url($url); $domain_ascii = ( seems_utf8( $domain ) ? utf8_to_punycode($domain) : $domain ) $url_ascii = $prefix . $domain_ascii . $suffix;
It might be an option to hook the option siteurl when read but that might break the admin output. the rest should work then.
Or: once entered, the domain will automatically converted and the users do not need to do it on their own (Safest way I suppose).
#13
@
15 years ago
Most of the time we can use the UTF-8 representation of both the domain and the path. Only the headers impose an ASCII restriction, so we can run the domain through the punycode converter and urlencode the path.
Have you found other places, where the UTF-8 representation causes problems?
#14
@
15 years ago
There is a problem where the punycode representation causes a problem, I tagged it as related because it somehow is: #11734
For the rest of your comment: I think that's a good way to go. We assume that the domain names are UTF-8 encoded. For URLs, as long as we do not read them URL encoded they are not punycode-encoded as well. Should pyncode-encoding be traded as URLencoded? URLs can be properly split anyway, so this can be decently applied. maybe a helper function that does the job (add a paramter for the from / to type).
#16
@
15 years ago
- Resolution set to worksforme
- Status changed from reviewing to closed
I am closing this ticket as works-for-me as there is an now IDNA plugin which enables one to use IDN's (domain name's that contains non-ascii characters) in WordPress http://wordpress.org/extend/plugins/idna/
#18
follow-ups:
↓ 20
↓ 27
@
14 years ago
- Resolution worksforme deleted
- Status changed from closed to reopened
- Version changed from 2.8.4 to 3.1
IDN handling is different related to Browsers! WebKit based browser like Safari and Chrome work with PunyCode URL's but others like IE, Firefox and Opera doesn't.
This is a problem of Cross Site Scripting detection and can be realize and tested, if the Blog is configured to an PunyCode Url.
example out of a case I did investigate:
IDN: http://с-проект.рф
PunyCode: http://xn----jtbpoegeo.xn--p1ai
If you try to call a JSON request like this example with the generated admin_url() out of WordPress, which would become the PunyCode one:
new Ajax.Request('http://xn----jtbpoegeo.xn--p1ai/wp-admin/admin-ajax.php' ?>', { parameters: { action: 'get_download_section' }, onSuccess: function(transport) { elem.title=transport.responseJSON.title; }, onFailure: function(transport) { alert('JSON security bug') } } );
and the answer is correct 'application/json' with correct JSON content, than this fails on all browsers except WebKit based!
If you try it with the original IDN Url like:
new Ajax.Request('http://с-проект.рф/wp-admin/admin-ajax.php' ?>',
it works now for all other browsers but fails now on WebKit based.
My suggestion will be a conditional convertion back to IDN, if browser is not WebKit based.
I did this inside my WordPress plugin "Codestyling Localization" and it works now in any case. I did use the class idna_convert from Matthias Sommerfeld for easy decode of PunyCode admin url's in such a case.
Please check it also in relation to #11734 / #10690 / #14648 because this may also affect the flash uploader feeded with PunyCode url's instead of IDN for some browser!
#20
in reply to:
↑ 18
;
follow-up:
↓ 22
@
14 years ago
Replying to codestyling:
Can you specify Ajax.Request
? Is that of some javascript library?
#22
in reply to:
↑ 20
;
follow-up:
↓ 24
@
14 years ago
Replying to hakre:
Replying to codestyling:
Can you specify
Ajax.Request
? Is that of some javascript library?
Yes I can. This is belongs to Prototype.js library shipped with WordPress as one of the standard libraries. Documentation can be found here: http://api.prototypejs.org/ajax/Ajax/Request/
#24
in reply to:
↑ 22
;
follow-up:
↓ 25
@
14 years ago
Replying to codestyling:
Replying to hakre:
Replying to codestyling:
Can you specify
Ajax.Request
? Is that of some javascript library?
Yes I can. This is belongs to Prototype.js library shipped with WordPress as one of the standard libraries. Documentation can be found here: http://api.prototypejs.org/ajax/Ajax/Request/
Thanks for providing the information. Can you provide a link to the prototype documentation of that Ajax.Request
that is specifying the encoding of the URL parameter? The link you provided does not say in which encoding/format the URL has to be provided. This might just be a bug in prototype, not Wordpress.
#25
in reply to:
↑ 24
;
follow-up:
↓ 26
@
14 years ago
Replying to hakre:
Replying to codestyling:
Replying to hakre:
Replying to codestyling:
Can you specify
Ajax.Request
? Is that of some javascript library?
Yes I can. This is belongs to Prototype.js library shipped with WordPress as one of the standard libraries. Documentation can be found here: http://api.prototypejs.org/ajax/Ajax/Request/
Thanks for providing the information. Can you provide a link to the prototype documentation of that
Ajax.Request
that is specifying the encoding of the URL parameter? The link you provided does not say in which encoding/format the URL has to be provided. This might just be a bug in prototype, not Wordpress.
Here ist the default option description of Prototype.js Ajax classes: http://api.prototypejs.org/ajax/
Please look at the evalJS property set as default to true which results into automatic population of responseJSON instead of responseText if the returned content type is application/json.
As also noticed, this is strongly related to same origin policy and this will not work at all browsers the same way unsing IDN based URL's like the admin_url() do at WordPress for PunyCode URL's.
Even if this could be a bug at Prototype.js it affects function of WordPress plugins/themes using the onboard shipped Prototype.js functionality to solve the goals.
#26
in reply to:
↑ 25
@
14 years ago
Replying to codestyling:
Replying to hakre:
Replying to codestyling:
Replying to hakre:
Replying to codestyling:
Can you specify
Ajax.Request
? Is that of some javascript library?
Yes I can. This is belongs to Prototype.js library shipped with WordPress as one of the standard libraries. Documentation can be found here: http://api.prototypejs.org/ajax/Ajax/Request/
Thanks for providing the information. Can you provide a link to the prototype documentation of that
Ajax.Request
that is specifying the encoding of the URL parameter? The link you provided does not say in which encoding/format the URL has to be provided. This might just be a bug in prototype, not Wordpress.
Here ist the default option description of Prototype.js Ajax classes: http://api.prototypejs.org/ajax/
The link you provided says nothing about which encoding/format is expected/supported by that function for which browser. Until this information is not provided I can not say here exactly the cause of the problem is. Normally I only fix problems if I know their cause, so that's why I was asking. Until provided I will step back from the discussion a bit.
Even if this could be a bug at Prototype.js it affects function of WordPress plugins/themes using the onboard shipped Prototype.js functionality to solve the goals.
That's the case for any library used: If the library has a bug, that bug might get transparently transported into the application making use of the library.
If you see a problem in using Prototype.js Ajax across different browsers, please report the problem back to the prototype library project so they can fix it and every user of the library can benefit from the fix. Thanks.
#27
in reply to:
↑ 18
@
13 years ago
Replying to codestyling:
IDN handling is different related to Browsers! WebKit based browser like Safari and Chrome work with PunyCode URL's but others like IE, Firefox and Opera doesn't.
This is a problem of Cross Site Scripting detection and can be realize and tested, if the Blog is configured to an PunyCode Url.
Related: #18952
#28
@
13 years ago
- Summary changed from Wordpress does not support non-ascii characters in URLs to WordPress does not support non-ascii characters in URLs
#31
@
12 years ago
#32
@
12 years ago
If you intend to use multisite you can forget special characters in the domain name. It will work as long as you use Punycode (the translated version illegible) of the domain name, but all the links generated from the WP will be 'puny'. Example url in registration mail and site generation page..
#34
@
11 years ago
New patch attached to #11734. We still need a patch here to convert home and siteurl to punycode/ascii.
#35
@
11 years ago
The issue with sanitize_redirect is actually rather annoying as well, so it's not just home and siteurl...
#36
@
11 years ago
- Keywords needs-unit-tests added
- Summary changed from WordPress does not support non-ascii characters in URLs to WordPress does not support non-ascii characters in URLs
#37
@
11 years ago
This small wanna-be-plugin works on production sites:
https://gist.github.com/szepeviktor/9682764
(you shouldn't name your menus with non-ascii characters, because they will go into a CSS class)
#40
follow-up:
↓ 42
@
10 years ago
@extendwings As I see it is an IDN domain name patch.
My mu-plugin handles non-ascii characters in the post slug (the URL path)
https://github.com/szepeviktor/wordpress-plugin-construction/blob/master/mu-latin-accent-urls/latin-accent-urls.php
#42
in reply to:
↑ 40
@
10 years ago
Replying to szepe.viktor:
@extendwings As I see it is an IDN domain name patch.
Yes, and this ticket is about IDN domains (see the description).
My mu-plugin handles non-ascii characters in the post slug (the URL path)
Looks like your plugin just disables half of remove_accents()
for slugs. There are probably use cases for that, but the function is there for a reason, so I don't think we want to do that in core. It also doesn't appear to resolve the wp_sanitize_redirect()
issue mentioned here.
#43
@
10 years ago
Thank you.
This is the use case:
http://esküvői-videók.hu/esk%C3%BCv%C5%91i-id%C3%A9zetek-7-r%C3%A9sz/
IDN domain name + non-ascii URL path
#44
@
10 years ago
- Summary changed from WordPress does not support non-ascii characters in URLs to WordPress does not support non-ascii characters in the domain name
#45
@
10 years ago
Just noticed this ticket. Requests includes an IDNA encoder written from scratch by myself. It also has 97.9% test coverage; (relevant tests). Might be worth integrating?
wp_sanitize_redirect() is affected by this issue, too.