Make WordPress Core

Opened 12 years ago

Closed 5 years ago

#22625 closed defect (bug) (wontfix)

Lone less than (<) characters foul up balanceTags()

Reported by: markjaquith's profile markjaquith Owned by:
Milestone: Priority: normal
Severity: normal Version:
Component: Formatting Keywords: has-patch has-unit-tests
Focuses: Cc:

Description

  1. Turn on tag balancing.
  2. Try this text:
    This is < That.
    
    <blockquote>foo</blockquote>
    
    Text <a href="#">link</a>. Post-link text.
    
  3. On save, the ending blockquote tag is stripped, and the subsequent text is blockquoted.

We should recognize lone < characters and entity encode them like we do for <3.

Attachments (7)

balanceTags-23001.patch (1.9 KB) - added by devesine 12 years ago.
balanceTags-unit-1162.patch (951 bytes) - added by devesine 12 years ago.
balanceTags-unit-1162-2.patch (1.0 KB) - added by devesine 12 years ago.
balanceTags-23001-2.patch (958 bytes) - added by devesine 12 years ago.
balanceTags-unit-1162-3.patch (1.4 KB) - added by devesine 12 years ago.
-3 is a slightly more extensive unit test for dangling < handling.
balanceTags-25012.patch (958 bytes) - added by devesine 11 years ago.
22625.patch (2.4 KB) - added by chriscct7 9 years ago.

Download all attachments as: .zip

Change History (16)

#1 in reply to: ↑ description @devesine
12 years ago

Here's a unit test for this, as well as the start of a possible patch to provide this functionality. Seriously malformed things like

< blockquote > < foo < blockquote >

still get mangled to the amusing

&lt; &lt; foo < blockquote >

with that patch, which is obviously wrong - in part, because this patch only handles a single lone < leading a tag - but I wanted to put what I had done up.

I'm not sure what the expected behavior might be where floating '>' are concerned, either.

#2 follow-ups: @azaozz
12 years ago

Perhaps extend the <3 regex to search for any < that don't look like html tag start?

#3 in reply to: ↑ 2 @devesine
12 years ago

Replying to azaozz:

Perhaps extend the <3 regex to search for any < that don't look like html tag start?

Hm, that seems like it would be very inefficient, but that gives me an idea - it looks like we could change the tag regex, currently

"/<(\/?[\w:]*)\s*([^>]*)>/"

to require that a tag be present, like

"/<(\/?[\w:]+)\s*([^>]*)>/"

(and encode any < left sitting in the before-the-tag text). That seems to work fine (and is much simpler), demonstrated in the -2 patch and unit test.

#4 in reply to: ↑ 2 @devesine
12 years ago

As discussed in IRC, performance numbers over 100,000 trials (running against the HTML from http://wordpress.org/), using the following code:

$source = file_get_contents('http://wordpress.org/');
$i = 0;
$ts_arr = array();
while ($i < 100000) {
	unset($foo);
	$ts = microtime(true);
	$foo = balanceTags($source);
	$end_ts = microtime(true);
	$ts_arr[] = $end_ts - $ts;
	$i += 1;
}
echo "Min: " . min($ts_arr) . PHP_EOL;
echo "Max: " . max($ts_arr) . PHP_EOL;
echo "Mean: " . (array_sum($ts_arr) / count($ts_arr)) . PHP_EOL;

Original:

Min: 4.9829483032227E-5
Max: 0.0089290142059326
Mean: 5.6982216835022E-5

-2 patch:

Min: 4.9829483032227E-5
Max: 0.0017969608306885
Mean: 5.7010720252991E-5

Looks like the performance is pretty solidly comparable.

@devesine
12 years ago

-3 is a slightly more extensive unit test for dangling < handling.

#5 @devesine
12 years ago

  • Keywords has-patch added

#6 @devesine
11 years ago

Refreshed against trunk: attachment:balanceTags-25012.patch

attachment:balanceTags-unit-1162-3.patch (to add unit tests) still applies cleanly.

#7 @nacin
11 years ago

  • Component changed from General to Formatting
  • Keywords has-unit-tests added
  • Milestone changed from Awaiting Review to Future Release

@chriscct7
9 years ago

#8 @chriscct7
9 years ago

  • Keywords close added

balanceTags is now removed from new installs though, so marking for close consideration.

New patch refreshes and combines the two old ones.

#9 @pento
5 years ago

  • Keywords close removed
  • Resolution set to wontfix
  • Status changed from new to closed

Vale, balanceTags. You were a cute option, but have long been superseded by more accurate tag wrangling.

Note: See TracTickets for help on using tickets.