Opened 13 years ago
Closed 7 years ago
#22625 closed defect (bug) (wontfix)
Lone less than (<) characters foul up balanceTags()
| Reported by: |
|
Owned by: | |
|---|---|---|---|
| Milestone: | Priority: | normal | |
| Severity: | normal | Version: | |
| Component: | Formatting | Keywords: | has-patch has-unit-tests |
| Focuses: | Cc: |
Description
- Turn on tag balancing.
- Try this text:
This is < That. <blockquote>foo</blockquote> Text <a href="#">link</a>. Post-link text.
- On save, the ending blockquote tag is stripped, and the subsequent text is blockquoted.
We should recognize lone < characters and entity encode them like we do for <3.
Attachments (7)
Change History (16)
#1
in reply to:
↑ description
@
13 years ago
#2
follow-ups:
↓ 3
↓ 4
@
13 years ago
Perhaps extend the <3 regex to search for any < that don't look like html tag start?
#3
in reply to:
↑ 2
@
13 years ago
Replying to azaozz:
Perhaps extend the
<3regex to search for any < that don't look like html tag start?
Hm, that seems like it would be very inefficient, but that gives me an idea - it looks like we could change the tag regex, currently
"/<(\/?[\w:]*)\s*([^>]*)>/"
to require that a tag be present, like
"/<(\/?[\w:]+)\s*([^>]*)>/"
(and encode any < left sitting in the before-the-tag text). That seems to work fine (and is much simpler), demonstrated in the -2 patch and unit test.
#4
in reply to:
↑ 2
@
13 years ago
As discussed in IRC, performance numbers over 100,000 trials (running against the HTML from http://wordpress.org/), using the following code:
$source = file_get_contents('http://wordpress.org/');
$i = 0;
$ts_arr = array();
while ($i < 100000) {
unset($foo);
$ts = microtime(true);
$foo = balanceTags($source);
$end_ts = microtime(true);
$ts_arr[] = $end_ts - $ts;
$i += 1;
}
echo "Min: " . min($ts_arr) . PHP_EOL;
echo "Max: " . max($ts_arr) . PHP_EOL;
echo "Mean: " . (array_sum($ts_arr) / count($ts_arr)) . PHP_EOL;
Original:
Min: 4.9829483032227E-5 Max: 0.0089290142059326 Mean: 5.6982216835022E-5
-2 patch:
Min: 4.9829483032227E-5 Max: 0.0017969608306885 Mean: 5.7010720252991E-5
Looks like the performance is pretty solidly comparable.
#6
@
13 years ago
Refreshed against trunk: attachment:balanceTags-25012.patch
attachment:balanceTags-unit-1162-3.patch (to add unit tests) still applies cleanly.
#7
@
12 years ago
- Component changed from General to Formatting
- Keywords has-unit-tests added
- Milestone changed from Awaiting Review to Future Release
Here's a unit test for this, as well as the start of a possible patch to provide this functionality. Seriously malformed things like
still get mangled to the amusing
with that patch, which is obviously wrong - in part, because this patch only handles a single lone < leading a tag - but I wanted to put what I had done up.
I'm not sure what the expected behavior might be where floating '>' are concerned, either.