WordPress.org

Make WordPress Core

Opened 10 years ago

Closed 3 years ago

#1597 closed defect (bug) (fixed)

balanceTags() doesn't filter self-closing tags which shouldn't be self-closed

Reported by: frenzie Owned by: nacin
Milestone: 3.5 Priority: normal
Severity: minor Version: 1.5.2
Component: Formatting Keywords: has-patch 3.4-early commit
Focuses: Cc:

Description

The current implementation leaves any tag which is self-closed through, of which <strong/>, <em/> and the like can do most harm for the eye, however it's not pleasant in any case.

It is unlikely that those who know these tags will make a mistake with it, but do not underestimate typos or people who wish to do harm.

As a quick and dirty fix, simply replace the following piece of code which leaves any self-closing tag through.

			// If self-closing or '', don't do anything.
			if((substr($regex[2],-1) == '/') || ($tag == '')) {
			}

This is the replacement code, which will check wether it's one of the known empty tags.

			// If self-closing, check wether it's allowed to be self-closing.
			if (substr($regex[2],-1) == '/') {
				//if not one of the self-closing tags, remove the '/'
				if ((substr($regex[2], 0, 2) != ('br' || 'hr')) || (substr($regex[2], 0, 3) != ('img')) || (substr($regex[2], 0, 5) != ('input'))) {
					$regex[2] = substr($regex[2], 0, -1);
				}
			}
			// As we just got rid of self-closing tags which shouldn't be allowed so, if self-closing or empty, do nothing.
			if ((substr($regex[2],-1) == '/') || ($tag == '')) {
			}

This code cannot be integrated in the if statement which does nothing, because then it will not further deal with it, leaving an open <strong>, which is arguably just as bad as the original situation.

Attachments (3)

1597.diff (2.4 KB) - added by coffee2code 5 years ago.
Aforementioned patch.
1597.2.diff (2.4 KB) - added by coffee2code 3 years ago.
Refreshed patch to apply cleanly against trunk, plus minor tweaks
1597.3.diff (2.4 KB) - added by coffee2code 3 years ago.
Refreshed patch to apply cleanly against trunk.

Download all attachments as: .zip

Change History (29)

comment:1 @markjaquith10 years ago

I've seen people use <p /> as self-closing before... though I don't know how correct that is.

comment:2 @frenzie10 years ago

I think that <p/> would be the same as <p></p>, which won't be bad on the eye (for sure) nor incorrect XHTML (not sure, you'd have to check the specs), however you wouldn't want to see that I guess. Either way, I see no reason to leave it through and using this code it would be filtered out.

Those who do wish to leave it through could make it an exception or something like that...

comment:3 @matt10 years ago

  • Milestone changed from 2.0 to 2.1

comment:4 @rgovostes9 years ago

  • Keywords has-patch added

comment:5 @Nazgul9 years ago

  • Keywords needs-patch added; has-patch removed

comment:6 @matt9 years ago

  • Milestone changed from 2.1 to 2.2

comment:7 @rob1n8 years ago

  • Owner changed from anonymous to rob1n
  • Status changed from new to assigned

comment:8 @rob1n8 years ago

  • Keywords balance tags formatting needs-patch removed
  • Milestone 2.2 deleted
  • Resolution set to invalid
  • Status changed from assigned to closed

<strong/>, <p/> and etc are *NOT* valid XHTML.

http://www.w3.org/TR/xhtml1/#guidelines

Marking as invalid.

comment:9 @rob1n8 years ago

  • Resolution invalid deleted
  • Status changed from closed to reopened

Err, my bad. Misread the ticket.

comment:10 follow-up: @rob1n8 years ago

  • Resolution set to wontfix
  • Status changed from reopened to closed

Not WordPress' purview. It's up to the user, IMO.

comment:11 in reply to: ↑ 10 @Frenzie5 years ago

  • Resolution wontfix deleted
  • Status changed from closed to reopened
  • Version changed from 1.5.2 to 2.9.1

Replying to rob1n:

Not WordPress' purview. It's up to the user, IMO.

In Opera it doesn't matter because the STRONG (or whatever) elements will be closed in the DOM when the containing element (i.e. typically LI for comments) is closed, but meanwhile I could still severely disrupt the display of comments for users of Firefox, IE or other browsers that parse things along the same lines.

Indeed if the user were only the author(s) I wouldn't disagree with you, but the "user" includes random, anonymous commenters. Should they really be allowed free reign, if only for the few days it takes before their comment is deleted and/or edited (depending on malicious vs. unintentional as a value judgment by the person who has to decide what to do). I think not.

The code still exists on line 1045 of formatting.php today, so the quick and dirty fix I wrote all these years ago could still be applied instantly to protect against such abuse.

comment:12 @nacin5 years ago

  • Keywords needs-refresh added
  • Version changed from 2.9.1 to 1.5.2

Would need a real patch, and one that accounts for all properly self-closing tags.

comment:13 @nacin5 years ago

  • Component changed from General to Formatting
  • Milestone set to Future Release

comment:14 @coffee2code5 years ago

  • Keywords has-patch added; needs-refresh removed

By way of some research for self-closing tags:

For XHTML 1.0, see http://www.w3.org/TR/xhtml1/dtds.html#a_dtd_XHTML-1.0-Transitional and search for "EMPTY>".

	area, base, basefont, br, col, frame, hr, img, input, isindex (deprecated), link, meta, param

For HTML5, see http://dev.w3.org/html5/html-author/ and search for "End tag: empty".

	area, base, br, col, command, embed, hr, img, input, link, meta, param, source

For HTML4.01, see http://www.w3.org/TR/REC-html40/sgml/dtd.html and search for "EMPTY"

	area, base, br, col, frame, hr, img, input, link, meta, param

Currently we only support/recognize:

	br, hr, img, input

The included patch now supports the superset of all above-mentioned tags:

	area, base, basefont, br, col, command, embed, frame, hr, img, input, isindex, link, meta, param, source

Here's a poor man's unit test. Sample post content:

Here are some samples:

A <strong class="something" style="text-align:bold;" /> phrase which shouldn't become bold.

An <em/> unspaced invalid single-tag, which shouldn't italicize anything.

A known tag <br> that didn't close itself.  Even with <br class="break"> attributes.  Both cases should see line breaks.

A known tag <hr/> and with space <hr /> and with attributes: <hr class="heavy"/> or <hr class="light" />. Spacing makes no difference here.

Without the patch, and with tag balancing enabled, the above content is considered perfectly balanced and is unmodified (despite invalid self-closing tags). Most browsers will interpret the 'strong' and 'em' as being open tags, and without any matching closing tags, visual havoc is wrecked on the rest of the page on the front-end, in addition to not validating.

After patching, however, the content is modified to:

Here are some samples:

A <strong class="something" style="text-align:bold;" ></strong> phrase which shouldn't become bold.

An <em ></em> unspaced invalid single-tag, which shouldn't italicize anything.

A known tag <br /> that didn't close itself.  Even with <br class="break"/> attributes.  Both cases should see line breaks.

A known tag <hr /> and with space <hr /> and with attributes: <hr class="heavy"/> or <hr class="light" />. Spacing makes no difference here.

Of particular note is that force_balance_tags() is enabled by default for visitor comments, meaning for some themes (including Twenty Ten) in most browsers (at least Safari 4.1, FF 3.6+), a commenter could easily make a comment that would screw up the appearance of anything that came after it.


I know mucking with force_balance_tags() can cause some unease, but as some background on the matter: I was responsible for a number of fix-ups to the function back in 2004 (for #53, which got committed in [2057]). This includes the implementation of the existing self-closing tag support. I don't think much has been done to the function since then.

That said, testers welcome!

(As an aside: I could have sworn I provided a ticket + patch at a later time to more thoroughly cover other self-closing tags, but it never got committed and I never pursued it -- can't find it though. Regardless, this is more thorough and current.)

@coffee2code5 years ago

Aforementioned patch.

comment:15 @Frenzie5 years ago

I don't think base, basefont, frame, isindex, link, and meta are really the kind of elements that this function typically presides over? All of those except the FRAME element are children of the HEAD element. Although of course if it did run over those elements for some reason you wouldn't want it to incorrectly turn them into non-self-closing tags... ah never mind, senseless criticism. It all seems to be working fine, but I'll see if I can come up with some better tests tomorrow or over the weekend.

comment:16 @coffee2code5 years ago

I wanted to err on the side of thoroughness. I don't believe it's the function's purpose to be aware of context or anything DTD-related except for tags that are allowed to be self-closing. If implemented as per the patch, someone could conceivable use it to properly tag-balance HTML intended to be output in the HEAD.

Encountering a HEAD element in the BODY is already invalid in and of itself, so tag-balancing those misplaced tags is at least one step in the direction of properness. At the very least it does no additional harm, but allows the function to be used for HTML intended for the HEAD.

comment:17 @rob1n4 years ago

  • Owner rob1n deleted
  • Status changed from reopened to assigned

comment:18 @coffee2code4 years ago

Patch still applies cleanly.

Incidentally, specifically with TwentyEleven, the tag balancing bug causes all subsequent comments to lose a majority of their styling (the background color, speech bubble effect, commenter avatar display) (at least in Chrome) in addition to being able to make everything that follows bold and/or italicized. (These specific lost stylings don't necessarily affect all themes, but as I stated earlier, all themes can be aesthetically mangled.)

comment:19 @nacin4 years ago

  • Keywords 3.4-early needs-unit-tests added

@coffee2code3 years ago

Refreshed patch to apply cleanly against trunk, plus minor tweaks

comment:20 @coffee2code3 years ago

Attached 1597.2.diff which refreshes the previous patch so that it applies cleanly against trunk, in addition to a couple of minor tweaks to avoid an unnecessary space in the output under certain conditions.

Unit tests will be shortly forthcoming.

comment:21 @coffee2code3 years ago

  • Keywords needs-unit-tests removed

Unit tests patch submitted as #UT31.

comment:22 @nacin3 years ago

  • Milestone changed from Future Release to 3.5

Let's check this out for 3.5.

@coffee2code3 years ago

Refreshed patch to apply cleanly against trunk.

comment:23 @coffee2code3 years ago

Attached 1597.3.diff which refreshes the latest patch so that it applies cleanly against trunk (as of r21426).

comment:24 @nacin3 years ago

  • Keywords commit added

comment:25 @nacin3 years ago

I did some diligence on this in IRC with coffee2code. Looks good to me.

comment:26 @nacin3 years ago

  • Owner set to nacin
  • Resolution set to fixed
  • Status changed from assigned to closed

In [21828]:

When balancing tags, properly close tags that shouldn't be self-closed but are. Support all self-closing tags.

props coffee2code.
fixes #1597.

Note: See TracTickets for help on using tickets.