#6562 closed defect (bug) (fixed)
Visual Editor preserves multiple sequential spaces, fouls up shortcode parsing
Reported by: |
|
Owned by: | |
---|---|---|---|
Milestone: | 3.7 | Priority: | normal |
Severity: | normal | Version: | 2.5 |
Component: | TinyMCE | Keywords: | commit has-patch |
Focuses: | Cc: |
Description
The Visual Editor (TinyMCE) preserves multiple sequential spaces by turning the second (and following) spaces into some sort of funky invisible hard-space character. Unfortunately, this character is NOT matched by ' ' (space) or '\s' (whitespace) in PCRE (preg_*() functions).
Example:
Put the following into the HTML editor on a post with uploaded images.
[gallery orderby= "RAND()"]
Save the post, then preview. Multiple refreshes should shuffle the images around randomly, because you're ordering by RAND().
Now switch to the visual editor, and insert the same thing, with all the superfluous spaces, and save. Preview the post -- you won't get random images, because the funky TinyMCE pseudo-space characters prevent the regex from parsing the attributes of your shortcode.
Proposed solution: turn off TinyMCE's whitespace preservation code when within square brackets.
Attachments (2)
Change History (22)
#2
@
17 years ago
- Keywords has-patch added
There are couple of characters that are valid whitespace in HTML, but are not allowed in PCRE patterns and not matched by \s. Normalizing the white space coming from HTML seems the better alternative.
#6
@
17 years ago
- Resolution fixed deleted
- Status changed from closed to reopened
I created a unit test for this and it fails. It could be an error in my test but I suspect the patch doesn't work (or at least doesn't work in some cases or some environments).
See test_utf8_whitespace() here: http://svn.automattic.com/wordpress-tests/wp-testcase/test_shortcode.php
I've tried with and without the ascii spaces in addition to utf-8 whitespace sequences. Either way the shortcode runs but the attribute array is empty. $text is correct entering shortcode_parse_atts() but blank after the preg_replace line added in the patch.
Can someone review the test please and either show me a correct input string if I have it wrong, or confirm that it fails?
#7
@
17 years ago
There's a known problem in PCRE (the library, not the php interface) with patterns containing 0x00 (\x00):
http://bugs.php.net/bug.php?id=16590
Perhaps that gets triggered when passing the string. Seems to work well when the string contains the actual UTF-8 characters.
#9
@
17 years ago
azaozz, that bug may well be relevant but it says it occurs when the pattern contains 0x00, not the subject string. To see if 0x00 was messing up the test I split the case in two, test_utf8_whitespace_1() and test_utf8_whitespace_2(). The second test doesn't contain 0x00 at all. Both fail, though in different ways.
The first (with the 00a0 sequence) produces empty attributes. The second has attributes, but the whole \x20\x0babc=\"def\"
string is considered a value, rather than parsed.
Can anyone confirm one way or the other? I'm still not sure if I'm testing it wrong or if the bug is still present.
#10
@
17 years ago
Just to clarify: I don't think this patch does any harm, but I'm not convinced that it reliably fixes the bug.
#11
@
17 years ago
I don't think the sequences are generating the proper UTF characters. For example the U+00A0 would be \xC2\xA0 (http://www.fileformat.info/info/unicode/char/00a0/index.htm), or better to type the actual character in the test string.
The patch does exactly what it says: normalizes the white space characters that are valid in HTML but not in PCRE patterns.
On the other hand since this could be typed by the users, we should probably filter it further, something like:
$text = preg_replace("/[^\x20-\x7f\t\n\r]+/", "", $text); // ascii only
This will sanitize the user input and remove the non-compatible white space characters too.
#15
@
16 years ago
- Resolution set to fixed
- Status changed from reopened to closed
I think this is working as expected.
#16
@
13 years ago
- Milestone changed from 2.9 to Awaiting Review
- Resolution fixed deleted
- Status changed from closed to reopened
I am re-opening this as the unit tests from tellyworth continue to fail. As he said above: "I don't think this patch does any harm, but I'm not convinced that it reliably fixes the bug."
We should come to a conclusion on the scope of this ticket and whether the tests are valid.
#17
@
12 years ago
I think this is fixed now due to the line
$text = preg_replace("/[\x{00a0}\x{200b}]+/u", " ", $text);
in shortcodes.php
#18
@
12 years ago
- Keywords commit has-patch added; needs-patch removed
Core is doing things correctly. The unit tests are wrong.
attachment:6562-ut.diff fixes the tests, both of which pass with current trunk.
Alternative solution:
str_replace()
that hard space character out inshortcode_parse_atts()
before parsing.