Opened 12 years ago
Closed 5 years ago
#22625 closed defect (bug) (wontfix)
Lone less than (<) characters foul up balanceTags()
Reported by: | markjaquith | Owned by: | |
---|---|---|---|
Milestone: | Priority: | normal | |
Severity: | normal | Version: | |
Component: | Formatting | Keywords: | has-patch has-unit-tests |
Focuses: | Cc: |
Description
- Turn on tag balancing.
- Try this text:
This is < That. <blockquote>foo</blockquote> Text <a href="#">link</a>. Post-link text.
- On save, the ending blockquote tag is stripped, and the subsequent text is blockquoted.
We should recognize lone <
characters and entity encode them like we do for <3
.
Attachments (7)
Change History (16)
#1
in reply to:
↑ description
@
12 years ago
#2
follow-ups:
↓ 3
↓ 4
@
12 years ago
Perhaps extend the <3
regex to search for any < that don't look like html tag start?
#3
in reply to:
↑ 2
@
12 years ago
Replying to azaozz:
Perhaps extend the
<3
regex to search for any < that don't look like html tag start?
Hm, that seems like it would be very inefficient, but that gives me an idea - it looks like we could change the tag regex, currently
"/<(\/?[\w:]*)\s*([^>]*)>/"
to require that a tag be present, like
"/<(\/?[\w:]+)\s*([^>]*)>/"
(and encode any < left sitting in the before-the-tag text). That seems to work fine (and is much simpler), demonstrated in the -2 patch and unit test.
#4
in reply to:
↑ 2
@
12 years ago
As discussed in IRC, performance numbers over 100,000 trials (running against the HTML from http://wordpress.org/), using the following code:
$source = file_get_contents('http://wordpress.org/'); $i = 0; $ts_arr = array(); while ($i < 100000) { unset($foo); $ts = microtime(true); $foo = balanceTags($source); $end_ts = microtime(true); $ts_arr[] = $end_ts - $ts; $i += 1; } echo "Min: " . min($ts_arr) . PHP_EOL; echo "Max: " . max($ts_arr) . PHP_EOL; echo "Mean: " . (array_sum($ts_arr) / count($ts_arr)) . PHP_EOL;
Original:
Min: 4.9829483032227E-5 Max: 0.0089290142059326 Mean: 5.6982216835022E-5
-2 patch:
Min: 4.9829483032227E-5 Max: 0.0017969608306885 Mean: 5.7010720252991E-5
Looks like the performance is pretty solidly comparable.
#6
@
11 years ago
Refreshed against trunk: attachment:balanceTags-25012.patch
attachment:balanceTags-unit-1162-3.patch (to add unit tests) still applies cleanly.
#7
@
11 years ago
- Component changed from General to Formatting
- Keywords has-unit-tests added
- Milestone changed from Awaiting Review to Future Release
Here's a unit test for this, as well as the start of a possible patch to provide this functionality. Seriously malformed things like
still get mangled to the amusing
with that patch, which is obviously wrong - in part, because this patch only handles a single lone < leading a tag - but I wanted to put what I had done up.
I'm not sure what the expected behavior might be where floating '>' are concerned, either.