Opened 17 years ago
Closed 16 years ago
#4570 closed defect (bug) (fixed)
Comment link cannot contain IRIs
Reported by: | link92 | Owned by: | nbachiyski |
---|---|---|---|
Milestone: | 2.7 | Priority: | high |
Severity: | critical | Version: | 2.0.10 |
Component: | General | Keywords: | has-patch tested |
Focuses: | Cc: |
Description
If you try to create a comment with a link such as "http://www.詹姆斯.com/", the non-ASCII characters are stripped and the link is rewritten to "http://www..com/", which isn't overly useful.
Attachments (1)
Change History (20)
#5
@
17 years ago
The function that is eating this is clean_url() in wp-includes/formatting.php. It doesn't support characters outside of US-ASCII (as Ryan mentioned). I didn't see an obvious/easy/well-tested solution to this issue, and given the potential risks (we have to filter URLs), I think this will have to stay a 2.4 target at this point.
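To make the failure mode concrete, here is a minimal Python sketch of an ASCII-only whitelist filter. The pattern and the function name `strip_to_ascii_whitelist` are illustrative assumptions, not the actual code in clean_url(); the sketch only shows how dropping everything outside an ASCII whitelist yields the result in the ticket description.

```python
import re

# Hypothetical stand-in for an ASCII-only URL whitelist (an assumption for
# illustration, not WordPress's actual pattern). re.ASCII keeps the
# case-insensitive a-z range from matching any non-ASCII letters.
_DISALLOWED = re.compile(r"[^a-z0-9\-~+_.?#=!&;,/:%@$|*'()]", re.IGNORECASE | re.ASCII)


def strip_to_ascii_whitelist(url: str) -> str:
    """Drop every character that is not in the ASCII whitelist."""
    return _DISALLOWED.sub("", url)


print(strip_to_ascii_whitelist("http://www.詹姆斯.com/"))  # prints http://www..com/
```

The three ideographs in the host name are simply deleted, which leaves the empty label between the two dots.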
#6
@
17 years ago
I'll add a filter to clean_url() so that someone can write a plugin as a stop-gap while we figure out the best approach.
#8
@
17 years ago
Oops, accidentally committed a taxonomy change along with that. Ignore that; formatting.php has the change relevant to this ticket.
#9
@
17 years ago
Added a clean_url filter that accepts the cleaned URL and the original URL as args. A plugin can take the original URL, filter it in an IRI-friendly way, and pass it back.
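As a rough illustration of what such a plugin callback could do, the Python sketch below takes the (cleaned, original) pair described above and converts the original IRI to an ASCII-only URI: the host is IDNA-encoded and the remaining components are percent-encoded as UTF-8. The names `iri_to_uri` and `iri_friendly_filter` are hypothetical, and the conversion strategy is an assumption, not something the ticket prescribes.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Convert an IRI to an ASCII-only URI (sketch; userinfo is dropped)."""
    parts = urlsplit(iri)
    host = parts.hostname or ""
    # IDNA-encode the host label by label, e.g. a CJK label becomes "xn--...".
    netloc = host.encode("idna").decode("ascii")
    if parts.port is not None:
        netloc += f":{parts.port}"
    # Percent-encode non-ASCII as UTF-8; "%" stays safe so existing escapes
    # are not encoded twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%;+,/?:@")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


def iri_friendly_filter(cleaned_url: str, original_url: str) -> str:
    """Hypothetical callback: prefer the converted original, else the cleaned URL."""
    try:
        return iri_to_uri(original_url)
    except (UnicodeError, ValueError):
        # Host could not be IDNA-encoded or the port was malformed.
        return cleaned_url


print(iri_friendly_filter("http://www..com/", "http://www.詹姆斯.com/"))
```

This sketch does not validate the scheme or re-run the ASCII cleaning on the converted result, so a real plugin would still need to do both before trusting the URL.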
#11
@
17 years ago
- Resolution fixed deleted
- Status changed from closed to reopened
Not sure this is actually fixed; it's just filterable. Leaving open for now.
#12
@
17 years ago
Did we resolve this?
If not, do we want to create a function is_iri() to check the original URL before we take it and filter it?
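One possible reading of that check, sketched in Python: treat a URL as an IRI when it contains any character outside US-ASCII. Only the name is_iri comes from the comment above; the definition here is an assumption for illustration.

```python
def is_iri(url: str) -> bool:
    """Return True if the string contains any character outside US-ASCII."""
    return not url.isascii()


print(is_iri("http://www.詹姆斯.com/"))  # prints True
print(is_iri("http://example.com/"))    # prints False
```

A check like this would let plain ASCII URLs keep going through the existing cleaning path untouched.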
#14
@
17 years ago
- Keywords needs-patch added
- Milestone changed from 2.5 to 2.6
- Owner changed from anonymous to nbachiyski
- Status changed from reopened to new
#15
@
16 years ago
Per RFC 3987 (http://tools.ietf.org/html/rfc3987#section-2.1), the Unicode characters outside the US-ASCII repertoire are not reserved, and therefore cannot be used for syntactical purposes. This should make them suitable for any URL and shouldn't pose a security problem (at least I hope so!).
I don't see how WordPress can process these characters if it's not set up to use UTF-8, but anyway here's a patch. It should be thoroughly tested, though, because I had to modify the make_clickable() presentation filter as well (used to filter comment_text when displaying comments).
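For readers unfamiliar with what a presentation filter like make_clickable() has to handle, here is a small Python sketch of a linkifier whose URL pattern accepts non-ASCII characters. It is not the attached patch; the regex, the trailing-punctuation handling, and the name `make_links_clickable` are assumptions for illustration.

```python
import html
import re

# Match http(s) URLs that are not already inside an attribute or tag.
# The character class excludes only whitespace, quotes and angle brackets,
# so non-ASCII characters are accepted as part of the URL.
_URL = re.compile(r"(?<![\"'>=\w])(https?://[^\s<>\"']+)")


def make_links_clickable(text: str) -> str:
    """Wrap bare URLs in anchor tags, leaving trailing punctuation outside."""

    def wrap(match: re.Match) -> str:
        url = match.group(1)
        trailing = ""
        # Sentence punctuation at the end is usually not part of the link.
        while url and url[-1] in ".,;:!?)":
            trailing = url[-1] + trailing
            url = url[:-1]
        safe = html.escape(url, quote=True)
        return f'<a href="{safe}">{safe}</a>{trailing}'

    return _URL.sub(wrap, text)


print(make_links_clickable("See http://www.詹姆斯.com/ for details."))
```

The point of the sketch is only that the pattern must not be restricted to ASCII, otherwise the link would be cut at the first non-ASCII character.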
#18
@
16 years ago
- Keywords tested added; needs-testing removed
- Milestone changed from 2.9 to 2.7
I wrote a couple of tests (search for test_iri) and the proposed patch passes them and all the previous make_clickable() tests pass too.
I've confirmed in -trunk that if you put the example link above in the URL field of the comment form, the domain is stripped out. I'll see what I can do to track it down.