Ticket #1597 (assigned defect (bug))

Opened 7 years ago

Last modified 4 months ago

balanceTags() doesn't filter self-closing tags which shouldn't be self-closed

Reported by: frenzie Owned by:
Priority: normal Milestone: Future Release
Component: Formatting Version: 1.5.2
Severity: minor Keywords: has-patch 3.4-early needs-unit-tests
Cc:

Description

The current implementation leaves any tag which is self-closed through, of which <strong/>, <em/> and the like can do most harm for the eye, however it's not pleasant in any case.

It is unlikely that those who know these tags will make a mistake with it, but do not underestimate typos or people who wish to do harm.

As a quick and dirty fix, simply replace the following piece of code which leaves any self-closing tag through.

			// If self-closing or '', don't do anything.
			if((substr($regex[2],-1) == '/') || ($tag == '')) {
			}

This is the replacement code, which will check wether it's one of the known empty tags.

			// If self-closing, check wether it's allowed to be self-closing.
			if (substr($regex[2],-1) == '/') {
				//if not one of the self-closing tags, remove the '/'
				if ((substr($regex[2], 0, 2) != ('br' || 'hr')) || (substr($regex[2], 0, 3) != ('img')) || (substr($regex[2], 0, 5) != ('input'))) {
					$regex[2] = substr($regex[2], 0, -1);
				}
			}
			// As we just got rid of self-closing tags which shouldn't be allowed so, if self-closing or empty, do nothing.
			if ((substr($regex[2],-1) == '/') || ($tag == '')) {
			}

This code cannot be integrated in the if statement which does nothing, because then it will not further deal with it, leaving an open <strong>, which is arguably just as bad as the original situation.

Attachments

1597.diff Download (2.4 KB) - added by coffee2code 17 months ago.
Aforementioned patch.

Change History

I've seen people use <p /> as self-closing before... though I don't know how correct that is.

I think that <p/> would be the same as <p></p>, which won't be bad on the eye (for sure) nor incorrect XHTML (not sure, you'd have to check the specs), however you wouldn't want to see that I guess. Either way, I see no reason to leave it through and using this code it would be filtered out.

Those who do wish to leave it through could make it an exception or something like that...

comment:3   matt6 years ago

  • Milestone changed from 2.0 to 2.1
  • Keywords has-patch added
  • Keywords needs-patch added; has-patch removed

comment:6   matt5 years ago

  • Milestone changed from 2.1 to 2.2
  • Owner changed from anonymous to rob1n
  • Status changed from new to assigned
  • Keywords balance tags formatting needs-patch removed
  • Status changed from assigned to closed
  • Resolution set to invalid
  • Milestone 2.2 deleted

<strong/>, <p/> and etc are *NOT* valid XHTML.

 http://www.w3.org/TR/xhtml1/#guidelines

Marking as invalid.

  • Status changed from closed to reopened
  • Resolution invalid deleted

Err, my bad. Misread the ticket.

comment:10 follow-up: ↓ 11   rob1n5 years ago

  • Status changed from reopened to closed
  • Resolution set to wontfix

Not WordPress' purview. It's up to the user, IMO.

comment:11 in reply to: ↑ 10   Frenzie2 years ago

  • Status changed from closed to reopened
  • Version changed from 1.5.2 to 2.9.1
  • Resolution wontfix deleted

Replying to rob1n:

Not WordPress' purview. It's up to the user, IMO.

In Opera it doesn't matter because the STRONG (or whatever) elements will be closed in the DOM when the containing element (i.e. typically LI for comments) is closed, but meanwhile I could still severely disrupt the display of comments for users of Firefox, IE or other browsers that parse things along the same lines.

Indeed if the user were only the author(s) I wouldn't disagree with you, but the "user" includes random, anonymous commenters. Should they really be allowed free reign, if only for the few days it takes before their comment is deleted and/or edited (depending on malicious vs. unintentional as a value judgment by the person who has to decide what to do). I think not.

The code still exists on line 1045 of formatting.php today, so the quick and dirty fix I wrote all these years ago could still be applied instantly to protect against such abuse.

  • Keywords needs-refresh added
  • Version changed from 2.9.1 to 1.5.2

Would need a real patch, and one that accounts for all properly self-closing tags.

  • Component changed from General to Formatting
  • Milestone set to Future Release
  • Keywords has-patch added; needs-refresh removed

By way of some research for self-closing tags:

For XHTML 1.0, see  http://www.w3.org/TR/xhtml1/dtds.html#a_dtd_XHTML-1.0-Transitional and search for "EMPTY>".

	area, base, basefont, br, col, frame, hr, img, input, isindex (deprecated), link, meta, param

For HTML5, see  http://dev.w3.org/html5/html-author/ and search for "End tag: empty".

	area, base, br, col, command, embed, hr, img, input, link, meta, param, source

For HTML4.01, see  http://www.w3.org/TR/REC-html40/sgml/dtd.html and search for "EMPTY"

	area, base, br, col, frame, hr, img, input, link, meta, param

Currently we only support/recognize:

	br, hr, img, input

The included patch now supports the superset of all above-mentioned tags:

	area, base, basefont, br, col, command, embed, frame, hr, img, input, isindex, link, meta, param, source

Here's a poor man's unit test. Sample post content:

Here are some samples:

A <strong class="something" style="text-align:bold;" /> phrase which shouldn't become bold.

An <em/> unspaced invalid single-tag, which shouldn't italicize anything.

A known tag <br> that didn't close itself.  Even with <br class="break"> attributes.  Both cases should see line breaks.

A known tag <hr/> and with space <hr /> and with attributes: <hr class="heavy"/> or <hr class="light" />. Spacing makes no difference here.

Without the patch, and with tag balancing enabled, the above content is considered perfectly balanced and is unmodified (despite invalid self-closing tags). Most browsers will interpret the 'strong' and 'em' as being open tags, and without any matching closing tags, visual havoc is wrecked on the rest of the page on the front-end, in addition to not validating.

After patching, however, the content is modified to:

Here are some samples:

A <strong class="something" style="text-align:bold;" ></strong> phrase which shouldn't become bold.

An <em ></em> unspaced invalid single-tag, which shouldn't italicize anything.

A known tag <br /> that didn't close itself.  Even with <br class="break"/> attributes.  Both cases should see line breaks.

A known tag <hr /> and with space <hr /> and with attributes: <hr class="heavy"/> or <hr class="light" />. Spacing makes no difference here.

Of particular note is that force_balance_tags() is enabled by default for visitor comments, meaning for some themes (including Twenty Ten) in most browsers (at least Safari 4.1, FF 3.6+), a commenter could easily make a comment that would screw up the appearance of anything that came after it.


I know mucking with force_balance_tags() can cause some unease, but as some background on the matter: I was responsible for a number of fix-ups to the function back in 2004 (for #53, which got committed in [2057]). This includes the implementation of the existing self-closing tag support. I don't think much has been done to the function since then.

That said, testers welcome!

(As an aside: I could have sworn I provided a ticket + patch at a later time to more thoroughly cover other self-closing tags, but it never got committed and I never pursued it -- can't find it though. Regardless, this is more thorough and current.)

Aforementioned patch.

I don't think base, basefont, frame, isindex, link, and meta are really the kind of elements that this function typically presides over? All of those except the FRAME element are children of the HEAD element. Although of course if it did run over those elements for some reason you wouldn't want it to incorrectly turn them into non-self-closing tags... ah never mind, senseless criticism. It all seems to be working fine, but I'll see if I can come up with some better tests tomorrow or over the weekend.

I wanted to err on the side of thoroughness. I don't believe it's the function's purpose to be aware of context or anything DTD-related except for tags that are allowed to be self-closing. If implemented as per the patch, someone could conceivable use it to properly tag-balance HTML intended to be output in the HEAD.

Encountering a HEAD element in the BODY is already invalid in and of itself, so tag-balancing those misplaced tags is at least one step in the direction of properness. At the very least it does no additional harm, but allows the function to be used for HTML intended for the HEAD.

  • Owner rob1n deleted
  • Status changed from reopened to assigned

Patch still applies cleanly.

Incidentally, specifically with TwentyEleven, the tag balancing bug causes all subsequent comments to lose a majority of their styling (the background color, speech bubble effect, commenter avatar display) (at least in Chrome) in addition to being able to make everything that follows bold and/or italicized. (These specific lost stylings don't necessarily affect all themes, but as I stated earlier, all themes can be aesthetically mangled.)

  • Keywords 3.4-early needs-unit-tests added
Note: See TracTickets for help on using tickets.