Make WordPress Core

Opened 16 years ago

Closed 11 years ago

Last modified 11 years ago

#6562 closed defect (bug) (fixed)

Visual Editor preserves multiple sequential spaces, fouls up shortcode parsing

Reported by: markjaquith's profile markjaquith Owned by:
Milestone: 3.7 Priority: normal
Severity: normal Version: 2.5
Component: TinyMCE Keywords: commit has-patch
Focuses: Cc:

Description

The Visual Editor (TinyMCE) preserves multiple sequential spaces by turning the second (and following) spaces into some sort of funky invisible hard-space character. Unfortunately, this character is NOT matched by ' ' (space) or '\s' (whitespace) in PCRE (preg_*() functions).

Example:

Put the following into the HTML editor on a post with uploaded images.

[gallery                 orderby=             "RAND()"]

Save the post, then preview. Multiple refreshes should shuffle the images around randomly, because you're ordering by RAND().

Now switch to the visual editor, and insert the same thing, with all the superfluous spaces, and save. Preview the post -- you won't get random images, because the funky TinyMCE pseudo-space characters prevent the regex from parsing the attributes of your shortcode.

Proposed solution: turn off TinyMCE's whitespace preservation code when within square brackets.

Attachments (2)

6562.patch (566 bytes) - added by azaozz 16 years ago.
6562-ut.diff (1.1 KB) - added by mdawaffe 11 years ago.

Download all attachments as: .zip

Change History (22)

#1 @markjaquith
16 years ago

Alternative solution: str_replace() that hard space character out in shortcode_parse_atts() before parsing.

#2 @azaozz
16 years ago

  • Keywords has-patch added

There are couple of characters that are valid whitespace in HTML, but are not allowed in PCRE patterns and not matched by \s. Normalizing the white space coming from HTML seems the better alternative.

@azaozz
16 years ago

#3 @ryan
16 years ago

  • Resolution set to fixed
  • Status changed from new to closed

(In [7768]) Normalize whitespace when parsing shortcodes. Props azaozz. fixes #6562 for 2.5

#4 @ryan
16 years ago

(In [7769]) Normalize whitespace when parsing shortcodes. Props azaozz. fixes #6562 for trunk

#5 @ryan
16 years ago

  • Milestone changed from 2.6 to 2.5.1

#6 @tellyworth
16 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

I created a unit test for this and it fails. It could be an error in my test but I suspect the patch doesn't work (or at least doesn't work in some cases or some environments).

See test_utf8_whitespace() here: http://svn.automattic.com/wordpress-tests/wp-testcase/test_shortcode.php

I've tried with and without the ascii spaces in addition to utf-8 whitespace sequences. Either way the shortcode runs but the attribute array is empty. $text is correct entering shortcode_parse_atts() but blank after the preg_replace line added in the patch.

Can someone review the test please and either show me a correct input string if I have it wrong, or confirm that it fails?

#7 @azaozz
16 years ago

There's a known problem in PCRE (the library, not the php interface) with patterns containing 0x00 (\x00):
http://bugs.php.net/bug.php?id=16590

Perhaps that gets triggered when passing the string. Seems to work well when the string contains the actual UTF-8 characters.

#8 @ryan
16 years ago

Is this okay as is for 2.5.1?

#9 @tellyworth
16 years ago

azaozz, that bug may well be relevant but it says it occurs when the pattern contains 0x00, not the subject string. To see if 0x00 was messing up the test I split the case in two, test_utf8_whitespace_1() and test_utf8_whitespace_2(). The second test doesn't contain 0x00 at all. Both fail, though in different ways.

The first (with the 00a0 sequence) produces empty attributes. The second has attributes, but the whole \x20\x0babc=\"def\" string is considered a value, rather than parsed.

Can anyone confirm one way or the other? I'm still not sure if I'm testing it wrong or if the bug is still present.

#10 @tellyworth
16 years ago

Just to clarify: I don't think this patch does any harm, but I'm not convinced that it reliably fixes the bug.

#11 @azaozz
16 years ago

I don't think the sequences are generating the proper UTF characters. For example the U+00A0 would be \xC2\xA0 (http://www.fileformat.info/info/unicode/char/00a0/index.htm), or better to type the actual character in the test string.

The patch does exactly what it says: normalizes the white space characters that are valid in HTML but not in PCRE patterns.

On the other hand since this could be typed by the users, we should probably filter it further, something like:

$text = preg_replace("/[^\x20-\x7f\t\n\r]+/", "", $text); // ascii only

This will sanitize the user input and remove the non-compatible white space characters too.

#12 @ryan
16 years ago

  • Milestone changed from 2.5.2 to 2.9

Milestone 2.5.2 deleted

#13 @ryan
15 years ago

  • Component changed from Administration to TinyMCE

#14 @Denis-de-Bernardy
15 years ago

  • Keywords needs-patch added; has-patch removed

#15 @azaozz
15 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

I think this is working as expected.

#16 @nacin
12 years ago

  • Milestone changed from 2.9 to Awaiting Review
  • Resolution fixed deleted
  • Status changed from closed to reopened

I am re-opening this as the unit tests from tellyworth continue to fail. As he said above: "I don't think this patch does any harm, but I'm not convinced that it reliably fixes the bug."

We should come to a conclusion on the scope of this ticket and whether the tests are valid.

#17 @daveagp
11 years ago

I think this is fixed now due to the line

$text = preg_replace("/[\x{00a0}\x{200b}]+/u", " ", $text);

in shortcodes.php

@mdawaffe
11 years ago

#18 @mdawaffe
11 years ago

  • Keywords commit has-patch added; needs-patch removed

Core is doing things correctly. The unit tests are wrong.

attachment:6562-ut.diff fixes the tests, both of which pass with current trunk.

#19 @mdawaffe
11 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 1321/tests:

Fix encoding of whitespace characters in shortcode tests for #6562.

Fixes #6562.

#20 @ocean90
11 years ago

  • Milestone changed from Awaiting Review to 3.7
Note: See TracTickets for help on using tickets.