Make WordPress Core

Opened 9 years ago

Closed 9 years ago

#33106 closed defect (bug) (fixed)

Removal of newlines in CDATA breaks inline scripts

Reported by: jeremyfelt's profile jeremyfelt Owned by: miqrogroove's profile miqrogroove
Milestone: 4.2.4 Priority: highest omg bbq
Severity: major Version: 4.2.3
Component: Formatting Keywords: has-patch fixed-major
Focuses: Cc:

Description

As of 4.2.3, depending on how a CDATA block is used, the stripping of new lines in this content may break when displayed on the front end, even when published with unfiltered_html capabilities.

Intended content:

<script>// <![CDATA[
_my_function('data');
// ]]>
</script>

In 4.2.2, this content is left as is and _my_function() fires as expected. In 4.2.3, the content is manipulated as such:

<script>// <![CDATA[ _my_function('data'); // ]]></script>

This results in the script being commented out by the // and it will not fire.

A workaround for this is to use /* for commenting.

<script> /* <![CDATA[ */ _my_function('data'); /* ]]> */ </script>

Attachments (17)

33106-testcase.html (211 bytes) - added by miqrogroove 9 years ago.
33106-idea.patch (1.2 KB) - added by miqrogroove 9 years ago.
patch ideas.txt (1.2 KB) - added by miqrogroove 9 years ago.
Excuse the format; I wrote this in Notepad at a hotel.
33106.patch (1.3 KB) - added by miqrogroove 9 years ago.
33106.2.patch (3.5 KB) - added by miqrogroove 9 years ago.
33106.3.patch (1.3 KB) - added by azaozz 9 years ago.
Alt patch for autoembed
33106.4.patch (4.3 KB) - added by miqrogroove 9 years ago.
33106.5.patch (6.0 KB) - added by miqrogroove 9 years ago.
33106.tests.media.patch (942 bytes) - added by kitchin 9 years ago.
33106.tests.media.2.patch (926 bytes) - added by kitchin 9 years ago.
33106.tests.media.3.patch (1.7 KB) - added by kitchin 9 years ago.
Maybe too much. Third additional test to show the_content application.
33106.6.patch (5.9 KB) - added by azaozz 9 years ago.
33106.7.patch (7.8 KB) - added by miqrogroove 9 years ago.
33106.tests.media.mod-hook.4.patch (2.7 KB) - added by kitchin 9 years ago.
33106.5.mod-hook.patch (6.4 KB) - added by kitchin 9 years ago.
33106.7.mod-hook.patch (8.2 KB) - added by kitchin 9 years ago.
33106.8.patch (11.1 KB) - added by miqrogroove 9 years ago.
Tests from kitchin added with some tweaks. Fixes from azaozz still included.

Download all attachments as: .zip

Change History (71)

#1 @miqrogroove
9 years ago

Attached patch is a rough draft and does not work. Just brainstorming.

#2 @iseulde
9 years ago

Duplicate? #33099

#3 @chriscct7
9 years ago

With 33106-testcase.html

Tested in latest Chrome, FireFox, and Opera by hand, and then emulated 101 other browser/browser version/OS combos.

The ones that failed:

  • Konqueror 4.4
  • Konqueror 4.8
  • Konqueror 4.9
  • Opera 10.60
  • Opera 10.53

None of major browser/browser version/os combos failed emulation testing.

@miqrogroove
9 years ago

Excuse the format; I wrote this in Notepad at a hotel.

@miqrogroove
9 years ago

#4 @dd32
9 years ago

Seemingly related is #33099 which uses a similar format:

<script><!--
my_function();
\\ --> </script>

@miqrogroove
9 years ago

#5 follow-ups: @chriscct7
9 years ago

Do we care about nested CDATA entries like: <![CDATA[]]]]><![CDATA[>]]>

#6 in reply to: ↑ 5 @miqrogroove
9 years ago

Replying to chriscct7:

Do we care about nested CDATA entries like: <![CDATA[]]]]><![CDATA[>]]>

No problems there.

33106.2.patch should be very close to what we need now.

@azaozz
9 years ago

Alt patch for autoembed

#7 @azaozz
9 years ago

33106.3.patch is an alternative for autoembed(). Instead of removing line breaks it doesn't look inside of HTML tags (including comments and CDATA). It changes the behavior a bit and fixes #25387.

#8 @chriscct7
9 years ago

#33099 was marked as a duplicate.

@miqrogroove
9 years ago

#9 @miqrogroove
9 years ago

  • Keywords has-patch added; needs-patch removed
  • Priority changed from normal to highest omg bbq
  • Severity changed from normal to major

33106.4.patch works.

#10 @SergeyBiryukov
9 years ago

  • Component changed from General to Formatting
  • Milestone changed from Awaiting Review to 4.2.4

@miqrogroove
9 years ago

#11 @miqrogroove
9 years ago

33106.5.patch Adds a few unit tests for wpautop only. The tests/media.php file was returning nothing but false positives so I didn't add anything in there. The oEmbed system calls get_post and I wasn't sure how to initialize it or why none of the other tests are doing it.

This ticket was mentioned in Slack in #core by kraft. View the logs.


9 years ago

#13 @kitchin
9 years ago

Results with only the tests in 33106.5.patch and 33106.tests.media.patch:

phpunit tests_phpunit_tests_formatting_Autop
Tests: 14, Assertions: 19, Failures: 5.

phpunit tests_phpunit_tests_media
Tests: 32, Assertions: 82, Failures: 2.

Results with all of 33106.5.patch and 33106.tests.media.patch:

phpunit tests_phpunit_tests_formatting_Autop
OK (14 tests, 19 assertions)

phpunit tests_phpunit_tests_media
OK (32 tests, 82 assertions)

#14 @kitchin
9 years ago

Fixed, 33106.tests.media.2.patch has same results as previous.
It tests the original CDATA case in this bug and the case mention in comment:4, <!-- ... //-->.

Last edited 9 years ago by kitchin (previous) (diff)

#15 @miqrogroove
9 years ago

kitchin, I'm not sure if those autoembed tests are valid. When I add a bare URL as a baseline test, the autoembed function still does nothing in the tests. It runs the regexp but the ::shortcode() is returning input with no changes. Could you try that?

Last edited 9 years ago by miqrogroove (previous) (diff)

#16 follow-up: @kitchin
9 years ago

Appears that ->autoembed does not do the embedding, instead it canonicalizes the text for later work by the oEmbed service, and for the [embed] shortcode.

I'm attaching my patch with a third test case, using apply_filters('the_content', ....) to get a bigger picture:

  • If youtube link: any [embed] is removed from around the link.
  • If youtube link: canonicalized from http://.../embed/... to https://.../watch?v=....
  • Other links: [embed] becomes <a href=....
  • Other links without [embed]: no change.

Not sure the third test should be checked in, it's not targeted to a single issue.

All 3 tests I've written fail before your patch.5 and succeed after.

Last edited 9 years ago by kitchin (previous) (diff)

@kitchin
9 years ago

Maybe too much. Third additional test to show the_content application.

#17 @azaozz
9 years ago

The autoembed() part of the 33106.5.patch doesn't work here. It replaces embeddable URLs that are wrapped in a link. These should not be embedded. To reproduce:

  1. Paste this in the Text editor:
<a href="https://www.youtube.com/watch?v=BZL3CSUwoTc">https://www.youtube.com/watch?v=BZL3CSUwoTc</a>
  1. Preview the post. You should see a link to that video, instead it is embedded (and the <a> tag is broken).

To fix that we either need to revert to removing line breaks in autoembed() or do something like 33106.3.patch.

@azaozz
9 years ago

#18 @azaozz
9 years ago

In 33106.6.patch: revert how autoembed() works. Replace line breaks with placeholders, search for embeddable URLs on their own line, then put the line breaks back (similarly to how wpautop() works).

#19 @kitchin
9 years ago

33106.6.patch passes my three unit tests.

#20 @cooperator_JR
9 years ago

Same problem Here: https://wordpress.org/support/topic/simple-javascript-to-show-daymonth-stop-working-after-423-update-1

ORIGINAL CODE:
http://snag.gy/1nzdz.jpg

Altered CODE "automaticaly", after 4.2.3 UPDATE:
http://snag.gy/4AgUt.jpg

The 2 codes works equally on an external test environment. ( not posts or Wordpress)
Well, lets spend an overnight to remove ...

// <![CDATA[

and then this, right before the final close script tag

// ]]>

...from 175 posts that use the code...

It´s the life! :-)
Happy to know that you´re working to solve.

Last edited 9 years ago by cooperator_JR (previous) (diff)

#21 in reply to: ↑ 16 ; follow-up: @miqrogroove
9 years ago

Replying to kitchin:

Appears that ->autoembed does not do the embedding, instead it canonicalizes the text for later work by the oEmbed service, and for the [embed] shortcode.

So how do we test the oEmbed service? How do we make sure to not break this in the future?

#22 @miqrogroove
9 years ago

  • Summary changed from Automatic removal of new lines in CDATA blocks may break content with unfiltered_html to Removal of newlines in CDATA breaks inline scripts

@miqrogroove
9 years ago

#23 @miqrogroove
9 years ago

33106.7.patch expands the wp_html_split() idea, making $regex static and implementing the new function in do_shortcode().

#24 @schtief
9 years ago

Hi Wordpress-Team,

please just replace

// <![CDATA[
by
/* <![CDATA[ */

and

// ]]>
by
/* ]]> */

With this CDATA-definition also inline JS are supported:

/* <![CDATA[ */document.write('hello');/* ]]> */

As a work around you can also define a filter in the functions.php for this task, but it would be great if native WP would support "one-line" (minified) javascript:

<?php
// somewhere in the functions.php
function my_filter_cdata( $content ) {
  $content = str_replace( '// <![CDATA[', '/* <![CDATA[ */', $content );
  $content = str_replace( '// ]]>', '/* ]]> */', $content );
  return $content;
}
add_filter( 'content_save_pre', 'my_filter_cdata', 9, 1 );

Thank you

#25 in reply to: ↑ 21 @kitchin
9 years ago

Replying to miqrogroove:

Replying to kitchin:

Appears that ->autoembed does not do the embedding, instead it canonicalizes the text for later work by the oEmbed service, and for the [embed] shortcode.

So how do we test the oEmbed service? How do we make sure to not break this in the future?

Grepping I do not see a test harness for the service. Do you know if round-tripping TinyMCE is also not covered? I mean, emulating a user throwing some html into TinyMCE.

#26 @obenland
9 years ago

  • Owner set to miqrogroove
  • Status changed from new to assigned

#27 @kitchin
9 years ago

New tests coming up. These tests require a new hook in WP_Embed::shortcode() so they may not be accepted. The new tests detect the bug in patch.5 found by azaozz in comment:17.

Three patches coming up. Only use two of them.

  • 33106.tests.media.mod-hook.4.patch
    The new tests. They require one of the following two patches to trunk:
  • 33106.5.mod-hook.patch
    Same as 33106.5.patch, plus the new hook in shortcode().
    Will get one failure in:
    $ phpunit tests_phpunit_tests_media
  • 33106.7.mod-hook.patch
    Same as 33106.7.patch, plus the new hook in shortcode().
    Will get no failures in:
    $ phpunit tests_phpunit_tests_media

#28 @miqrogroove
9 years ago

Cool ideas. Thank you kitchin. Just need to add one or two tests where a CDATA block contains a URL on its own line. The idea is to make sure we are not now or in the future going to autoembed anywhere in the middle of an inline script. Then combine the tests with 33106.7.mod-hook.patch so we have it all in one patch.

@miqrogroove
9 years ago

Tests from kitchin added with some tweaks. Fixes from azaozz still included.

#29 @miqrogroove
9 years ago

Latest patch adds all the tests we came up with. While it doesn't fully process oEmbed, I am confident that it now covers relevant changes in WP_Embed::autoembed().

This ticket was mentioned in Slack in #core by jorbin. View the logs.


9 years ago

#31 in reply to: ↑ 5 @stevenkword
9 years ago

Replying to chriscct7:

Do we care about nested CDATA entries like: <![CDATA[]]]]><![CDATA[>]]>

I believe nested CDATA like this breaks RSS, but that's a little outside the point of the ticket.

This ticket was mentioned in Slack in #core by sam. View the logs.


9 years ago

#33 @markjaquith
9 years ago

#33159 was marked as a duplicate.

#34 @millertimesites
9 years ago

Looks like this CDATA problem with 4.2.3 is affecting all Google Adsense code.
If the code below was my prior code, what should I change it to in order to get it working again?

<script src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js" async=""></script><!-- 336x280 Content --> <ins class="adsbygoogle" style="display: inline; float: left; margin: 5px; width: 336px; height: 280px;" data-ad-client="ca-pub-3788205869703165" data-ad-slot="5395550042"></ins><script>// <![CDATA[
(adsbygoogle = window.adsbygoogle || []).push({});
// ]]></script>


Thanks

Last edited 9 years ago by millertimesites (previous) (diff)

#35 @miqrogroove
9 years ago

Hi @millertimesites , until the patch is installed, one of the workarounds for this bug is to replace // <![CDATA[ with /* <![CDATA[ */

#36 @millertimesites
9 years ago

I tried that by changing

<script src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js" async=""></script><!-- 336x280 Content --> <ins class="adsbygoogle" style="display: inline; float: left; margin: 5px; width: 336px; height: 280px;" data-ad-client="ca-pub-3788205869703165" data-ad-slot="5395550042"></ins><script>// <![CDATA[
(adsbygoogle = window.adsbygoogle || []).push({});
// ]]></script>

to

<script src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js" async=""></script><!-- 336x280 Content --> <ins class="adsbygoogle" style="display: inline; float: left; margin: 5px; width: 336px; height: 280px;" data-ad-client="ca-pub-3788205869703165" data-ad-slot="5395550042"></ins><script>/* <![CDATA[
(adsbygoogle = window.adsbygoogle || []).push({});
*/ ]]></script>

and it didn't fix anything.
I assume that the 1,000 of us using AdSense on WordPress should just stand pat and deal with it until whenever a patch is installed?

#37 @miqrogroove
9 years ago

Yeah that won't fix anything because that's the wrong code.

#38 @millertimesites
9 years ago

If it is the wrong code, then what should it be?
The code I was using always worked great until the 4.2.3 release and was from the
official AdSense code
The code from Google:

<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<!-- 336x280 Content -->
<ins class="adsbygoogle"
     style="display: inline; float: left; margin: 5px; width: 336px; height: 280px;"
     data-ad-client="ca-pub-3788205869703165"
     data-ad-slot="5395550042"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>

Is automatically changed to

<script src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js" async=""></script><!-- 336x280 Content --> <ins class="adsbygoogle" style="display: inline; float: left; margin: 5px; width: 336px; height: 280px;" data-ad-client="ca-pub-3788205869703165" data-ad-slot="5395550042"></ins><script>// <![CDATA[
(adsbygoogle = window.adsbygoogle || []).push({});
// ]]></script>

by WordPress when you save the changes

Last edited 9 years ago by millertimesites (previous) (diff)

#39 @azaozz
9 years ago

@millertimesites try adding this to a plugin:

add_filter( 'the_content', '_my_fix_cdata', 20 );
function _my_fix_cdata( $content ) {
	if ( strpos( $content, '// <![CDATA[' ) !== false ) {
		$content = preg_replace( '%(<script[^>]*>)// <!\[CDATA\[(?![\r\n])%', '$1/* <![CDATA[ */', $content );
	}

	return $content;
}

#40 follow-up: @millertimesites
9 years ago

@azaozz
Thanks for the suggestion.

I don't think that is going to work as I am not using a plugin or widget for AdSense and am using the raw code provided by AdSense like most AdSense users.

The code needs to be able to work inline with the content/text of each page.
Maybe there is a solution that can be added to the theme's Quick CSS?

#41 @wonderboymusic
9 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 33469:

Protect newlines inside of CDATA. This was breaking things, notably inline JS that used comments for HTML standards compat.

  • Tokenize newlines in WP_Embed::autoembed() before running ->autoembed_callback()
  • Tokenize newlines with placeholders in wpautop()
  • Introduce wp_html_split() to DRY the RegEx from wp_replace_in_html_tags() and do_shortcodes_in_html_tags()

Adds unit tests.

Props miqrogroove, kitchin, azaozz.
Fixes #33106.

#42 @wonderboymusic
9 years ago

  • Keywords fixed-major added
  • Resolution fixed deleted
  • Status changed from closed to reopened

for 4.2.4

#43 @azaozz
9 years ago

There is no need to add a new filter in WP_Embed::shortcode() just to be able to test part of it. The embed_maybe_make_link filter can be used instead.

#44 @azaozz
9 years ago

In 33470:

Use the embed_maybe_make_link filter to test WP_Embed::autoembed().
See #33106.

#45 in reply to: ↑ 40 @SergeyBiryukov
9 years ago

Replying to millertimesites:

The code needs to be able to work inline with the content/text of each page.
Maybe there is a solution that can be added to the theme's Quick CSS?

Until 4.2.4 is out, try adding the code from comment:39 to the theme's function.php file, it should work for all posts and pages (just tested).

#46 @azaozz
9 years ago

In 33518:

Backport r33469 and r33470 to 4.2.
See #33106.

#47 @millertimesites
9 years ago

@SergeyBiryukov I added it to the function.php file of my child theme, however it did not change anything for me. None of the pages updated to make AdSense visible again.

When I put clean AdSense code back into a page I could see the ad in the backend editor, but once I clicked save it revert to a blank space again.

Jon

#48 @millertimesites
9 years ago

I actually switched to using shortcodes for each Adsense unit/style and for the first couple pages it seems to be working great. Since the code isn't handled at a page level this way it isn't broken by WordPress when you save it.
Jon

#49 @azaozz
9 years ago

In 33521:

Backport r33469 and r33470 to 4.1.
See #33106.

#50 @azaozz
9 years ago

In 33522:

Backport r33469 and r33470 to 4.0.
See #33106.

#51 @azaozz
9 years ago

In 33523:

Backport r33469 and r33470 to 3.9.
See #33106.

#52 @azaozz
9 years ago

In 33524:

Backport r33469 and r33470 to 3.8.
See #33106.

#53 @azaozz
9 years ago

In 33525:

Backport r33469 and r33470 to 3.7.
See #33106.

#54 @miqrogroove
9 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed
Note: See TracTickets for help on using tickets.