Opened 19 years ago
Closed 12 years ago
#1597 closed defect (bug) (fixed)
balanceTags() doesn't filter self-closing tags which shouldn't be self-closed
Reported by: | frenzie | Owned by: | nacin |
---|---|---|---|
Milestone: | 3.5 | Priority: | normal |
Severity: | minor | Version: | 1.5.2 |
Component: | Formatting | Keywords: | has-patch 3.4-early commit |
Focuses: | Cc: |
Description
The current implementation leaves any tag which is self-closed through, of which <strong/>, <em/> and the like can do most harm for the eye, however it's not pleasant in any case.
It is unlikely that those who know these tags will make a mistake with it, but do not underestimate typos or people who wish to do harm.
As a quick and dirty fix, simply replace the following piece of code which leaves any self-closing tag through.
// If self-closing or '', don't do anything. if((substr($regex[2],-1) == '/') || ($tag == '')) { }
This is the replacement code, which will check wether it's one of the known empty tags.
// If self-closing, check wether it's allowed to be self-closing. if (substr($regex[2],-1) == '/') { //if not one of the self-closing tags, remove the '/' if ((substr($regex[2], 0, 2) != ('br' || 'hr')) || (substr($regex[2], 0, 3) != ('img')) || (substr($regex[2], 0, 5) != ('input'))) { $regex[2] = substr($regex[2], 0, -1); } } // As we just got rid of self-closing tags which shouldn't be allowed so, if self-closing or empty, do nothing. if ((substr($regex[2],-1) == '/') || ($tag == '')) { }
This code cannot be integrated in the if statement which does nothing, because then it will not further deal with it, leaving an open <strong>, which is arguably just as bad as the original situation.
Attachments (3)
Change History (29)
#2
@
19 years ago
I think that <p/> would be the same as <p></p>, which won't be bad on the eye (for sure) nor incorrect XHTML (not sure, you'd have to check the specs), however you wouldn't want to see that I guess. Either way, I see no reason to leave it through and using this code it would be filtered out.
Those who do wish to leave it through could make it an exception or something like that...
#8
@
18 years ago
- Keywords balance tags formatting needs-patch removed
- Milestone 2.2 deleted
- Resolution set to invalid
- Status changed from assigned to closed
<strong/>, <p/> and etc are *NOT* valid XHTML.
http://www.w3.org/TR/xhtml1/#guidelines
Marking as invalid.
#9
@
18 years ago
- Resolution invalid deleted
- Status changed from closed to reopened
Err, my bad. Misread the ticket.
#10
follow-up:
↓ 11
@
18 years ago
- Resolution set to wontfix
- Status changed from reopened to closed
Not WordPress' purview. It's up to the user, IMO.
#11
in reply to:
↑ 10
@
15 years ago
- Resolution wontfix deleted
- Status changed from closed to reopened
- Version changed from 1.5.2 to 2.9.1
Replying to rob1n:
Not WordPress' purview. It's up to the user, IMO.
In Opera it doesn't matter because the STRONG (or whatever) elements will be closed in the DOM when the containing element (i.e. typically LI for comments) is closed, but meanwhile I could still severely disrupt the display of comments for users of Firefox, IE or other browsers that parse things along the same lines.
Indeed if the user were only the author(s) I wouldn't disagree with you, but the "user" includes random, anonymous commenters. Should they really be allowed free reign, if only for the few days it takes before their comment is deleted and/or edited (depending on malicious vs. unintentional as a value judgment by the person who has to decide what to do). I think not.
The code still exists on line 1045 of formatting.php today, so the quick and dirty fix I wrote all these years ago could still be applied instantly to protect against such abuse.
#12
@
15 years ago
- Keywords needs-refresh added
- Version changed from 2.9.1 to 1.5.2
Would need a real patch, and one that accounts for all properly self-closing tags.
#14
@
14 years ago
- Keywords has-patch added; needs-refresh removed
By way of some research for self-closing tags:
For XHTML 1.0, see http://www.w3.org/TR/xhtml1/dtds.html#a_dtd_XHTML-1.0-Transitional and search for "EMPTY>".
area, base, basefont, br, col, frame, hr, img, input, isindex (deprecated), link, meta, param
For HTML5, see http://dev.w3.org/html5/html-author/ and search for "End tag: empty".
area, base, br, col, command, embed, hr, img, input, link, meta, param, source
For HTML4.01, see http://www.w3.org/TR/REC-html40/sgml/dtd.html and search for "EMPTY"
area, base, br, col, frame, hr, img, input, link, meta, param
Currently we only support/recognize:
br, hr, img, input
The included patch now supports the superset of all above-mentioned tags:
area, base, basefont, br, col, command, embed, frame, hr, img, input, isindex, link, meta, param, source
Here's a poor man's unit test. Sample post content:
Here are some samples: A <strong class="something" style="text-align:bold;" /> phrase which shouldn't become bold. An <em/> unspaced invalid single-tag, which shouldn't italicize anything. A known tag <br> that didn't close itself. Even with <br class="break"> attributes. Both cases should see line breaks. A known tag <hr/> and with space <hr /> and with attributes: <hr class="heavy"/> or <hr class="light" />. Spacing makes no difference here.
Without the patch, and with tag balancing enabled, the above content is considered perfectly balanced and is unmodified (despite invalid self-closing tags). Most browsers will interpret the 'strong' and 'em' as being open tags, and without any matching closing tags, visual havoc is wrecked on the rest of the page on the front-end, in addition to not validating.
After patching, however, the content is modified to:
Here are some samples: A <strong class="something" style="text-align:bold;" ></strong> phrase which shouldn't become bold. An <em ></em> unspaced invalid single-tag, which shouldn't italicize anything. A known tag <br /> that didn't close itself. Even with <br class="break"/> attributes. Both cases should see line breaks. A known tag <hr /> and with space <hr /> and with attributes: <hr class="heavy"/> or <hr class="light" />. Spacing makes no difference here.
Of particular note is that force_balance_tags()
is enabled by default for visitor comments, meaning for some themes (including Twenty Ten) in most browsers (at least Safari 4.1, FF 3.6+), a commenter could easily make a comment that would screw up the appearance of anything that came after it.
I know mucking with force_balance_tags()
can cause some unease, but as some background on the matter: I was responsible for a number of fix-ups to the function back in 2004 (for #53, which got committed in [2057]). This includes the implementation of the existing self-closing tag support. I don't think much has been done to the function since then.
That said, testers welcome!
(As an aside: I could have sworn I provided a ticket + patch at a later time to more thoroughly cover other self-closing tags, but it never got committed and I never pursued it -- can't find it though. Regardless, this is more thorough and current.)
#15
@
14 years ago
I don't think base, basefont, frame, isindex, link, and meta are really the kind of elements that this function typically presides over? All of those except the FRAME element are children of the HEAD element. Although of course if it did run over those elements for some reason you wouldn't want it to incorrectly turn them into non-self-closing tags... ah never mind, senseless criticism. It all seems to be working fine, but I'll see if I can come up with some better tests tomorrow or over the weekend.
#16
@
14 years ago
I wanted to err on the side of thoroughness. I don't believe it's the function's purpose to be aware of context or anything DTD-related except for tags that are allowed to be self-closing. If implemented as per the patch, someone could conceivable use it to properly tag-balance HTML intended to be output in the HEAD.
Encountering a HEAD element in the BODY is already invalid in and of itself, so tag-balancing those misplaced tags is at least one step in the direction of properness. At the very least it does no additional harm, but allows the function to be used for HTML intended for the HEAD.
#18
@
13 years ago
Patch still applies cleanly.
Incidentally, specifically with TwentyEleven, the tag balancing bug causes all subsequent comments to lose a majority of their styling (the background color, speech bubble effect, commenter avatar display) (at least in Chrome) in addition to being able to make everything that follows bold and/or italicized. (These specific lost stylings don't necessarily affect all themes, but as I stated earlier, all themes can be aesthetically mangled.)
#20
@
13 years ago
Attached 1597.2.diff which refreshes the previous patch so that it applies cleanly against trunk, in addition to a couple of minor tweaks to avoid an unnecessary space in the output under certain conditions.
Unit tests will be shortly forthcoming.
I've seen people use <p /> as self-closing before... though I don't know how correct that is.