Make WordPress Core

Opened 17 years ago

Closed 16 years ago

#4570 closed defect (bug) (fixed)

Comment link cannot contain IRIs

Reported by: link92's profile link92 Owned by: nbachiyski's profile nbachiyski
Milestone: 2.7 Priority: high
Severity: critical Version: 2.0.10
Component: General Keywords: has-patch tested
Focuses: Cc:


If you try and create a comment with a link such as "http://www.詹姆斯.com/" it is rewritten to "", which isn't overly useful.

Attachments (1)

4570.diff (1.5 KB) - added by guillep2k 16 years ago.

Download all attachments as: .zip

Change History (20)

#1 @Nazgul
17 years ago

  • Milestone set to 2.4 (future)

#2 @josephscott
17 years ago

  • Cc josephscott added

I've confirmed in -trunk that if you put the example link above in the URL field of the comment form that strips the domain out. I'll see what I can do to track it down.

#3 @foolswisdom
17 years ago

  • Priority changed from normal to high
  • Severity changed from normal to critical

#4 @ryan
17 years ago

clean_url() doesn't handle IRIs.

#5 @josephscott
17 years ago

The function that is eating this is clean_url() in wp-includes/formatting.php. It doesn't support characters outside of US-ASCII (as Ryan mentioned). I didn't see and obvious/easy/well tested solution to this issue and given the potential risks (we have to filter URLs) this will have stay a 2.4 target at this point I think.

#6 @ryan
17 years ago

I'll add a filter to clean_url() so that some someone can write a plugin as a stop-gap while we figure out the best approach.

#7 @ryan
17 years ago

(In [5952]) Add clean_url filter. see #4570

#8 @ryan
17 years ago

Oops, accidentally committed a taxonomy change along with that. Ignore that, formatting.php has the change relevant to this ticket.

#9 @ryan
17 years ago

Added clean_url filter that accepts the cleaned url and the original url as args. A plugin can take the original url, filter it in an IRI friendly way, and pass it back.

#10 @darkfate
16 years ago

  • Resolution set to fixed
  • Status changed from new to closed

#11 @westi
16 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

Not sure this is currently fixed - just filterable leaving open for now.

#12 @yogipress
16 years ago

did we resolve this?

if not,
do we want to create a function is_iri( ) to check before we take original url then filter it?

#13 @nbachiyski
16 years ago

The problem is not only with IRIs, the current scheme doesn't allow other widely used characters, like parens (#5668). IMHO we should allow any characters and just escape them accordingly -- #5663.

#14 @lloydbudd
16 years ago

  • Keywords needs-patch added
  • Milestone changed from 2.5 to 2.6
  • Owner changed from anonymous to nbachiyski
  • Status changed from reopened to new

#15 @guillep2k
16 years ago

By RFC 3987 (, the unicode characters outside the US-ASCII repertoire are not reserved, and therefore cannot be used for syntactical purposes. This should make them suitable for any URL and shouldn't pose a security problem (at least I hope so!).

I don't see how Wordpress can process these characters if it's not set up for using UTF-8, but anyway here's a patch. It should be thoroughly tested, though, because I had to modify the make_clickable() presentation filter as well (used to filter comment_text when displaying comments).

16 years ago

#16 @guillep2k
16 years ago

  • Cc guillep2k added
  • Keywords has-patch added; needs-patch removed

#17 @guillep2k
16 years ago

  • Keywords needs-testing added

#18 @nbachiyski
16 years ago

  • Keywords tested added; needs-testing removed
  • Milestone changed from 2.9 to 2.7

I wrote a couple of tests (search for test_iri) and the proposed patch passes them and all the previous make_clickable() tests pass too.

#19 @ryan
16 years ago

  • Resolution set to fixed
  • Status changed from new to closed

(In [8525]) Allow IRIs. Props guillep2k. fixes #4570

Note: See TracTickets for help on using tickets.