Ticket #4570 (closed defect (bug): fixed)

Opened 5 years ago

Last modified 4 years ago

Comment link cannot contain IRIs

Reported by: link92 Owned by: nbachiyski
Priority: high Milestone: 2.7
Component: General Version: 2.0.10
Severity: critical Keywords: has-patch tested
Cc: josephscott, guillep2k

Description

If you try and create a comment with a link such as " http://www.詹姆斯.com/" it is rewritten to " http://www..com/", which isn't overly useful.

Attachments

4570.diff Download (1.5 KB) - added by guillep2k 4 years ago.

Change History

  • Milestone set to 2.4 (future)
  • Cc josephscott added

I've confirmed in -trunk that if you put the example link above in the URL field of the comment form that strips the domain out. I'll see what I can do to track it down.

  • Priority changed from normal to high
  • Severity changed from normal to critical

comment:4   ryan4 years ago

clean_url() doesn't handle IRIs.

The function that is eating this is clean_url() in wp-includes/formatting.php. It doesn't support characters outside of US-ASCII (as Ryan mentioned). I didn't see and obvious/easy/well tested solution to this issue and given the potential risks (we have to filter URLs) this will have stay a 2.4 target at this point I think.

comment:6   ryan4 years ago

I'll add a filter to clean_url() so that some someone can write a plugin as a stop-gap while we figure out the best approach.

comment:7   ryan4 years ago

(In [5952]) Add clean_url filter. see #4570

comment:8   ryan4 years ago

Oops, accidentally committed a taxonomy change along with that. Ignore that, formatting.php has the change relevant to this ticket.

comment:9   ryan4 years ago

Added clean_url filter that accepts the cleaned url and the original url as args. A plugin can take the original url, filter it in an IRI friendly way, and pass it back.

  • Status changed from new to closed
  • Resolution set to fixed
  • Status changed from closed to reopened
  • Resolution fixed deleted

Not sure this is currently fixed - just filterable leaving open for now.

did we resolve this?

if not, do we want to create a function is_iri( ) to check before we take original url then filter it?

The problem is not only with IRIs, the current scheme doesn't allow other widely used characters, like parens (#5668). IMHO we should allow any characters and just escape them accordingly -- #5663.

  • Keywords needs-patch added
  • Owner changed from anonymous to nbachiyski
  • Status changed from reopened to new
  • Milestone changed from 2.5 to 2.6

By RFC 3987 ( http://tools.ietf.org/html/rfc3987#section-2.1), the unicode characters outside the US-ASCII repertoire are not reserved, and therefore cannot be used for syntactical purposes. This should make them suitable for any URL and shouldn't pose a security problem (at least I hope so!).

I don't see how Wordpress can process these characters if it's not set up for using UTF-8, but anyway here's a patch. It should be thoroughly tested, though, because I had to modify the make_clickable() presentation filter as well (used to filter comment_text when displaying comments).

  • Cc guillep2k added
  • Keywords has-patch added; needs-patch removed
  • Keywords needs-testing added
  • Keywords tested added; needs-testing removed
  • Milestone changed from 2.9 to 2.7

I wrote a  couple of tests (search for test_iri) and the proposed patch passes them and all the previous make_clickable() tests pass too.

  • Status changed from new to closed
  • Resolution set to fixed

(In [8525]) Allow IRIs. Props guillep2k. fixes #4570

Note: See TracTickets for help on using tickets.