WordPress.org

Make WordPress Core

Opened 6 years ago

Closed 9 months ago

Last modified 8 months ago

#6562 closed defect (bug) (fixed)

Visual Editor preserves multiple sequential spaces, fouls up shortcode parsing

Reported by: markjaquith Owned by:
Milestone: 3.7 Priority: normal
Severity: normal Version: 2.5
Component: TinyMCE Keywords: commit has-patch
Focuses: Cc:

Description

The Visual Editor (TinyMCE) preserves multiple sequential spaces by turning the second (and following) spaces into some sort of funky invisible hard-space character. Unfortunately, this character is NOT matched by ' ' (space) or '\s' (whitespace) in PCRE (preg_*() functions).

Example:

Put the following into the HTML editor on a post with uploaded images.

[gallery                 orderby=             "RAND()"]

Save the post, then preview. Multiple refreshes should shuffle the images around randomly, because you're ordering by RAND().

Now switch to the visual editor, and insert the same thing, with all the superfluous spaces, and save. Preview the post -- you won't get random images, because the funky TinyMCE pseudo-space characters prevent the regex from parsing the attributes of your shortcode.

Proposed solution: turn off TinyMCE's whitespace preservation code when within square brackets.

Attachments (2)

6562.patch (566 bytes) - added by azaozz 6 years ago.
6562-ut.diff (1.1 KB) - added by mdawaffe 9 months ago.

Download all attachments as: .zip

Change History (22)

comment:1 markjaquith6 years ago

Alternative solution: str_replace() that hard space character out in shortcode_parse_atts() before parsing.

comment:2 azaozz6 years ago

  • Keywords has-patch added

There are couple of characters that are valid whitespace in HTML, but are not allowed in PCRE patterns and not matched by \s. Normalizing the white space coming from HTML seems the better alternative.

azaozz6 years ago

comment:3 ryan6 years ago

  • Resolution set to fixed
  • Status changed from new to closed

(In [7768]) Normalize whitespace when parsing shortcodes. Props azaozz. fixes #6562 for 2.5

comment:4 ryan6 years ago

(In [7769]) Normalize whitespace when parsing shortcodes. Props azaozz. fixes #6562 for trunk

comment:5 ryan6 years ago

  • Milestone changed from 2.6 to 2.5.1

comment:6 tellyworth6 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

I created a unit test for this and it fails. It could be an error in my test but I suspect the patch doesn't work (or at least doesn't work in some cases or some environments).

See test_utf8_whitespace() here: http://svn.automattic.com/wordpress-tests/wp-testcase/test_shortcode.php

I've tried with and without the ascii spaces in addition to utf-8 whitespace sequences. Either way the shortcode runs but the attribute array is empty. $text is correct entering shortcode_parse_atts() but blank after the preg_replace line added in the patch.

Can someone review the test please and either show me a correct input string if I have it wrong, or confirm that it fails?

comment:7 azaozz6 years ago

There's a known problem in PCRE (the library, not the php interface) with patterns containing 0x00 (\x00):
http://bugs.php.net/bug.php?id=16590

Perhaps that gets triggered when passing the string. Seems to work well when the string contains the actual UTF-8 characters.

comment:8 ryan6 years ago

Is this okay as is for 2.5.1?

comment:9 tellyworth6 years ago

azaozz, that bug may well be relevant but it says it occurs when the pattern contains 0x00, not the subject string. To see if 0x00 was messing up the test I split the case in two, test_utf8_whitespace_1() and test_utf8_whitespace_2(). The second test doesn't contain 0x00 at all. Both fail, though in different ways.

The first (with the 00a0 sequence) produces empty attributes. The second has attributes, but the whole \x20\x0babc=\"def\" string is considered a value, rather than parsed.

Can anyone confirm one way or the other? I'm still not sure if I'm testing it wrong or if the bug is still present.

comment:10 tellyworth6 years ago

Just to clarify: I don't think this patch does any harm, but I'm not convinced that it reliably fixes the bug.

comment:11 azaozz6 years ago

I don't think the sequences are generating the proper UTF characters. For example the U+00A0 would be \xC2\xA0 (http://www.fileformat.info/info/unicode/char/00a0/index.htm), or better to type the actual character in the test string.

The patch does exactly what it says: normalizes the white space characters that are valid in HTML but not in PCRE patterns.

On the other hand since this could be typed by the users, we should probably filter it further, something like:

$text = preg_replace("/[^\x20-\x7f\t\n\r]+/", "", $text); // ascii only

This will sanitize the user input and remove the non-compatible white space characters too.

comment:12 ryan6 years ago

  • Milestone changed from 2.5.2 to 2.9

Milestone 2.5.2 deleted

comment:13 ryan5 years ago

  • Component changed from Administration to TinyMCE

comment:14 Denis-de-Bernardy5 years ago

  • Keywords needs-patch added; has-patch removed

comment:15 azaozz5 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

I think this is working as expected.

comment:16 nacin2 years ago

  • Milestone changed from 2.9 to Awaiting Review
  • Resolution fixed deleted
  • Status changed from closed to reopened

I am re-opening this as the unit tests from tellyworth continue to fail. As he said above: "I don't think this patch does any harm, but I'm not convinced that it reliably fixes the bug."

We should come to a conclusion on the scope of this ticket and whether the tests are valid.

comment:17 daveagp9 months ago

I think this is fixed now due to the line

$text = preg_replace("/[\x{00a0}\x{200b}]+/u", " ", $text);

in shortcodes.php

mdawaffe9 months ago

comment:18 mdawaffe9 months ago

  • Keywords commit has-patch added; needs-patch removed

Core is doing things correctly. The unit tests are wrong.

attachment:6562-ut.diff fixes the tests, both of which pass with current trunk.

comment:19 mdawaffe9 months ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 1321/tests:

Fix encoding of whitespace characters in shortcode tests for #6562.

Fixes #6562.

comment:20 ocean908 months ago

  • Milestone changed from Awaiting Review to 3.7
Note: See TracTickets for help on using tickets.