#14993 closed defect (bug) (fixed)
make_clickable issue with ! character in URL
Reported by: | lancewillett | Owned by: | westi |
---|---|---|---|
Milestone: | 3.1 | Priority: | high |
Severity: | critical | Version: | 3.0.1 |
Component: | Formatting | Keywords: | has-patch |
Focuses: | Cc: |
Description
This comes to light with recent Twitter changes to how tweet permalinks work. Example: http://twitter.com/#!/wordpress/status/25907440233
When filtered with make_clickable
the URL result stops at the # character. Resulting HTML:
<a href="http://twitter.com/#" rel="noreferrer nofollow">http://twitter.com/#</a> !/wordpress/status/25907440233
Adding ! to the pattern would avoid this.
Attachments (6)
Change History (51)
#2
@
14 years ago
This pattern will probably become increasingly popular: http://code.google.com/web/ajaxcrawling/docs/specification.html
#4
@
14 years ago
- Owner set to westi
- Status changed from new to accepted
We have some good test cases for this function so should be easy to test changes
I think we should require something after the ! though so as to not link in Vipers example
If only we could require email style url wrapping in <
and >
;-)
#6
follow-up:
↓ 7
@
14 years ago
This is a bug in wordpress. URL is defined by RFC3986 and the "!" character is part of the reserved group which makes it immune to transition. Simply said: It's part of the URL - in any case.
Space is not btw, which means, that if it's followed by space it can be both: an exclamation mark and the end of a sentence.
As we already do treat other chars of the reserved group as part of the URL (e.g. "?"), "!" should be treated the same.
If the discussion in this ticket changes some minds that treating characters of the reserved group differently as we currently do in make_clickable, then the behaviour should be changed for the other reserved characters as well.
For any decision, this will break backwards-compability.
Reserved characters are:
reserved = gen-delims / sub-delims gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@" sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
A nice test-text:
Did it appear on http://www.google.com/? Yes, it did appear on http://www.google.com/!
Example of the current implementation on wordpress.com: Link.
#8
@
14 years ago
About what the users are concerned is that ! is not linked (link ends prior to !) I think we should put that in because otherwise twitter users will spam trac all the time as they see the bug. See: #15070.
#9
@
14 years ago
- Cc dossy@… added
I think the behavior of least surprise is to treat "reserved" characters per RFC3986 as part of the URL *if and only if* it isn't followed by whitespace. This is necessary but not sufficient, as what would happen with this URL would be surprising:
The URL would be treated as "http://www.google.com/?" - since "!" is followed by whitespace. However, did the user intend the URL to be "http://www.google.com/" followed by "?!" as punctuation? Or, should both the "?!" have been part of the linked URL?
This kind of "intention inference" is really tough. I think there will always be cases where it "does the wrong thing" (not what the author intended) ...
#10
@
14 years ago
- Keywords needs-patch added
If someone wants to get this in for 3.1, please post a patch now, as freeze is right around the corner.
#11
@
14 years ago
- Priority changed from normal to high
- Severity changed from normal to major
With http://code.google.com/web/ajaxcrawling/docs/specification.html and adoption especially by Twitter, I'm upping priority and severity on this. Tempted to say this is a must for 3.1. Leaving it as a bug, cause that's what it is.
#14
@
14 years ago
Looked deeply into this issue as well as the standards and it's much more about not having the proper URL matching pattern in make_clickable() rather than Twitter's new URLs.
Currently everything following the protocol part is matched with a single expression which is not correct. We may want to open another issue for getting RFC 3986 compliant by matching part specific patterns. Once we are standards compliant, we can add the two additonal layers on the top of it that we have right now, specifically XSS protection and logic for trailing periods, commas etc.
For a quick fix, submitting a patch very soon (including test cases) that adds the missing exclamation mark to the current pattern. This will avoid Twitter specific bug reports in 3.1.
Will submit a ticket for proposing standards compliance as well.
#19
@
14 years ago
- Keywords has-patch added; needs-patch removed
make-clickable-twitter.14993.diff
makes those kinds of URLs clickable and passes the WP unit tests as well as some from the other times we've dealt with make_clickable
before (such as #10990).
#21
follow-up:
↓ 23
@
14 years ago
TestMakeClickable.php
is what I use to test make_clickable
, with tests from the WP test suite and some gleaned from those other tickets.
Note that it says it's testing make_clickable
but really it's just testing the making-clickableness for the http protocol, not the others.
You will need to change the include lines to point to your respective WP trunk files, then run:
phpunit --verbose TestMakeClickable
#23
in reply to:
↑ 21
@
14 years ago
Replying to filosofo:
TestMakeClickable.php
is what I use to testmake_clickable
, with tests from the WP test suite and some gleaned from those other tickets.
Note that it says it's testing
make_clickable
but really it's just testing the making-clickableness for the http protocol, not the others.
You will need to change the include lines to point to your respective WP trunk files, then run:
phpunit --verbose TestMakeClickable
Going to add the extra test cases to the WP suite thanks :-)
#24
@
14 years ago
Opened a ticket for merging these extra test cases and merged a few of them.
http://unit-tests.trac.wordpress.org/ticket/8
Patches for extra test cases always welcome :-)
#25
follow-up:
↓ 26
@
14 years ago
Committed filosofo's patch in [16111]
Added extra tests in http://unit-tests.trac.wordpress.org/changeset/314
It would still be nice to not link in the ! if after a url for emphasis as discussed above.
#26
in reply to:
↑ 25
@
14 years ago
Replying to westi:
Committed filosofo's patch in [16111]
Added extra tests in http://unit-tests.trac.wordpress.org/changeset/314
It would still be nice to not link in the ! if after a url for emphasis as discussed above.
OK, make-clickable-twitter.14993.2.diff
makes it so the RFC 3986 sub-delims are ignored on either end of a URL, such as
This is a really good tweet http://twitter.com/#!/wordpress/status/25907440233!
except for a closing parenthesis ")" on the end, because from the WP tests the following is supposed to be the entire URL:
http://en.wikipedia.org/wiki/PC_Tools_(Central_Point_Software)
This nods to common practice rather than specification, and changing it would break backwards-compatibility.
#28
@
14 years ago
- Resolution fixed deleted
- Status changed from closed to reopened
This chomps image urls and breaks shortcodes reverting.
#30
follow-up:
↓ 31
@
14 years ago
Can you make test cases for those failures, or at least paste examples here so I can see what you mean?
#31
in reply to:
↑ 30
@
14 years ago
Replying to filosofo:
Can you make test cases for those failures, or at least paste examples here so I can see what you mean?
Yes I am working on those.
#32
@
14 years ago
http://unit-tests.trac.wordpress.org/changeset/315
Summary - urls quoted with single quotes were being made clickable which broke use as attributes of existing HTML elements :-)
#35
in reply to:
↑ 34
@
14 years ago
- Resolution fixed deleted
- Status changed from closed to reopened
#39
in reply to:
↑ 38
@
14 years ago
Replying to filosofo:
Any more information available on this?
Sorry I forgot to come back and upload the test data :-(
Here it is
#40
@
14 years ago
gdb backtrace from the segfault:
Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0xb6f7f6d0 (LWP 8029)] match ( eptr=0x9e71014 "][/url][url=http://www.streamate.com/?DF=0&AFNO=1-0-605655-342720&UHNSMTY=303][img]http://won.images.streamray.com/images/streamray/won/jpg/jess19_150.jpg[/img][/url][url=http://www.streamate."..., ecode=0xa88e8dc "M", mstart=0x9e702c9 "http://www.streamate.com/?DF=0&AFNO=1-0-605655-342720&UHNSMTY=303][img]http://won.images.streamray.com/images/streamray/won/jpg/natalybaby_150.jpg[/img][/url][url=http://www.streamate.com/?DF="..., offset_top=2, md=0xbfb92df0, ims=5, eptrb=0x0, flags=0, rdepth=6795) at pcre_exec.c:405 405 pcre_exec.c: No such file or directory. in pcre_exec.c (gdb) bt #0 match ( eptr=0x9e71014 "][/url][url=http://www.streamate.com/?DF=0&AFNO=1-0-605655-342720&UHNSMTY=303][img]http://won.images.streamray.com/images/streamray/won/jpg/jess19_150.jpg[/img][/url][url=http://www.streamate."..., ecode=0xa88e8dc "M", mstart=0x9e702c9 "http://www.streamate.com/?DF=0&AFNO=1-0-605655-342720&UHNSMTY=303][img]http://won.images.streamray.com/images/streamray/won/jpg/natalybaby_150.jpg[/img][/url][url=http://www.streamate.com/?DF="..., offset_top=2, md=0xbfb92df0, ims=5, eptrb=0x0, flags=0, rdepth=6795) at pcre_exec.c:405 #1 0xb74dbfd8 in match ( eptr=0x9e71014 "][/url][url=http://www.streamate.com/?DF=0&AFNO=1-0-605655-342720&UHNSMTY=303][img]http://won.images.streamray.com/images/streamray/won/jpg/jess19_150.jpg[/img][/url][url=http://www.streamate."..., ecode=0xa88e8d9 "]", mstart=0x9e702c9 "http://www.streamate.com/?DF=0&AFNO=1-0-605655-342720&UHNSMTY=303][img]http://won.images.streamray.com/images/streamray/won/jpg/natalybaby_150.jpg[/img][/url][url=http://www.streamate.com/?DF="..., offset_top=2, md=0xbfb92df0, ims=5, eptrb=0x0, flags=0, rdepth=6794) at pcre_exec.c:749 #2 0xb74e0d22 in match ( eptr=0x9e71014 "][/url][url=http://www.streamate.com/?DF=0&AFNO=1-0-605655-342720&UHNSMTY=303][img]http://won.images.streamray.com/images/streamray/won/jpg/jess19_150.jpg[/img][/url][url=http://www.streamate."..., ecode=0xa88e9c1 "U", mstart=0x9e702c9 "http://www.streamate.com/?DF=0&AFNO=1-0-605655-342720&UHNSMTY=303][img]http://won.images.streamray.com/images/streamray/won/jpg/natalybaby_150.jpg[/img][/url][url=http://www.streamate.com/?DF="..., offset_top=2, md=0xbfb92df0, ims=5, eptrb=0x0, flags=0, rdepth=6793) at pcre_exec.c:1289 #3 0xb74dbfd8 in match ( eptr=0x9e71013 "g][/url][url=http://www.streamate.com/?DF=0&AFNO=1-0-605655-342720&UHNSMTY=303][img]http://won.images.streamray.com/images/streamray/won/jpg/jess19_150.jpg[/img][/url][url=http://www.streamate"..., ecode=0xa88e8d9 "]", mstart=0x9e702c9 "http://www.streamate.com/?DF=0&AFNO=1-0-605655-342720&UHNSMTY=303][img]http://won.images.streamray.com/images/streamray/won/jpg/natalybaby_150.jpg[/img][/url][url=http://www.streamate.com/?DF="..., offset_top=2, md=0xbfb92df0, ims=5, eptrb=0x0, flags=0, rdepth=6792) at pcre_exec.c:749
#41
@
14 years ago
- Keywords needs-patch added; has-patch removed
- Severity changed from major to critical
Bumping again.
#42
@
14 years ago
- Keywords has-patch added; needs-patch removed
make-clickable-ultimate.14993.diff
passes every clickable test that I know of, and it should not cause a segmentation fault for texts like westi's example.
#43
@
14 years ago
I forgot to mention that I added a test to my personal clickable test based on how it fails in this comment. (The commenter later complains because WordPress mishandles the closing parenthesis.)
array of original and expected output:
array( 'In his famous speech “You and Your research” (here: http://www.cs.virginia.edu/~robins/YouAndYourResearch.html) Richard Hamming wrote about people getting more done with their doors closed, but', 'In his famous speech “You and Your research” (here: <a href="http://www.cs.virginia.edu/~robins/YouAndYourResearch.html" rel="nofollow">http://www.cs.virginia.edu/~robins/YouAndYourResearch.html</a>) Richard Hamming wrote about people getting more done with their doors closed, but', ),
The drawback would be this:
Check out http://wordpress.org!
I'm not sure how the regex works, but it'd be nice if
!
was included only if it had URL characters after it.