Make WordPress Core

Opened 10 months ago

Closed 3 months ago

Last modified 3 months ago

#62797 closed defect (bug) (fixed)

wp_add_inline_script does not properly escape '<!-- <script>' in contents

Reported by: artpi's profile artpi Owned by: jonsurrell's profile jonsurrell
Milestone: 6.9 Priority: normal
Severity: normal Version: 5.0
Component: Editor Keywords: has-patch has-unit-tests dev-feedback
Focuses: administration Cc:

Description

I am adding this issue in the "Editor" because this is how it's easiest to reproduce and how I stumbled onto it, but it is possible to replace the issue outside of editor context.

Reproduction path

  1. Start writing a post in Gutenberg
  2. Enter code editor
  3. Paste following code

<p><!-- <script> </p>

   </p>

  1. Save
  2. Exit code editor into block editor
  3. Reload the page
  4. Observe the editor crash and burn

Here is video of a reproduction on a fresh site:

https://screen.studio/share/ELc9d6zJ

What is going on?

block_editor_rest_api_preload is using wp_add_inline_script to preload the currently edited block content into the page
https://github.com/WordPress/WordPress/blob/0086f4ba4080285c90c08854a5d085c76b6d5109/wp-includes/block-editor.php#L767

Because the <script> tag is hidden in an comment that has not been properly terminated, it is impossible to get out of the script tag now, and nothing gets rendered afterwards.

Tested up to WordPress version 6.7.1

Attachments (2)

testing-without-patch.mp4 (1.2 MB) - added by shanemuir 10 months ago.
Testing without patch applied.
testing-with-patch.mp4 (3.6 MB) - added by shanemuir 10 months ago.
Testing with patch applied.

Change History (29)

#1 @ankitkumarshah
10 months ago

Reproduction Report

Description

This report validates whether the issue can be reproduced.

Environment

  • WordPress: 6.8-alpha-59593
  • PHP: 8.1.29
  • Server: nginx/1.16.0
  • Database: mysqli (Server: 8.0.16 / Client: mysqlnd 8.1.29)
  • Browser: Chrome 131.0.0.0
  • OS: macOS
  • Theme: Twenty Seventeen 3.8
  • MU Plugins: None activated
  • Plugins:
    • Gutenslider — The last WordPress slider you will ever need. 6.1.0
    • Test Plugin
    • Test Reports 1.2.0
    • WordPress Beta Tester 3.6.2

Actual Results

✅ Error condition occurs (reproduced).
Hi @artpi,

Thank you for bringing this up. By following the provided steps, I reproduced the issue successfully.

Screencast

https://ppgtp1rtd3.ufs.sh/f/TnWMEUzoUd85Fb83cfTuejSJhDdioPMH6NAwRtYmfvQBT3W9

#2 @abcd95
10 months ago

Hey @artpi, Thanks for bringing this up.

I was able to reproduce the issue - Screencast

The problem occurs because wp_add_inline_script() injects the block content inside a <script> tag for preloading. When there's an unclosed HTML comment containing <script> in the content, it breaks out of the string context since browsers parse this as the start of a new script tag, causing the editor to crash.

I will work on this to find a fix.

Additional Note: This happens with all variants of the script like <!-- <SCRIPT>

This ticket was mentioned in PR #8106 on WordPress/wordpress-develop by @dilipbheda.


10 months ago
#3

  • Keywords has-patch added

#4 @dilipbheda
10 months ago

  • Focuses administration added
  • Keywords needs-testing added

@artpi, thank you for the report. I've addressed it with the attached PR.

@abcd95 @ankitkumarshah, could you please review the attached PR and let me know if you find any other issues?

Thanks!

#5 @sainathpoojary
10 months ago

Hey @dilipbheda,

I have tested the PR. While trying to reproduce the issue, I encountered this warning. It might be related to a regex issue in this line

https://rioudcpuyg.ufs.sh/f/PL8E4NiPUWyO9PFMNgfEFACY0dej4LmhwrRp2qTtuWnBUoas

Last edited 10 months ago by sainathpoojary (previous) (diff)

#7 @jonsurrell
10 months ago

A fix for all cases from the wp_add_inline_script side will be difficult. It may be something the HTML API could handle.

The problem in this case is with the JSON encoding. Use of the correct flags should fix the problem, namely JSON_HEX_TAG.

A good example to follow is this code used by script modules.

This ticket was mentioned in PR #8145 on WordPress/wordpress-develop by @jonsurrell.


10 months ago
#8

Fixes #62797 by correctly escaping JSON included inside JavaScript in a script tag.

wp_add_inline_script will correctly escape closing script tags </script> which prevents script contents from breaking out of the script, it does not handle cases where some content may enter the HTML script data double escaped state and prevent the real script closing tag from closing the script tag as expected, leading to a broken document where the script tag remains open.

This can be tested by applying the patch and following the reproduction steps in #62797.

Trac ticket: https://core.trac.wordpress.org/ticket/62797

@shanemuir
10 months ago

Testing without patch applied.

@shanemuir
10 months ago

Testing with patch applied.

#9 @shanemuir
10 months ago

  • Keywords needs-testing removed

Test Report

Description

This report validates whether the indicated patch works as expected.

Patch tested: https://patch-diff.githubusercontent.com/raw/WordPress/wordpress-develop/pull/8145.diff

Environment

  • WordPress: 6.8-alpha-59274-src
  • PHP: 8.3.15
  • Server: Apache/2.4.62 (Unix) OpenSSL/3.4.0 PHP/8.3.15
  • Database: mysqli (Server: 8.0.39 / Client: mysqlnd 8.3.15)
  • Browser: Chrome 132.0.0.0
  • OS: macOS
  • Theme: Twenty Twenty-Five 1.0
  • MU Plugins: None activated
  • Plugins:
    • Test Reports 1.2.0

Actual Results

  1. ✅ Issue resolved with patch.

Additional Notes

Only testing using @jonsurrell patch, let me know if we need to test the other patch too?

Supplemental Artifacts

Added videos of testing with and without patch applied.

This ticket was mentioned in PR #8856 on WordPress/wordpress-develop by @jonsurrell.


6 months ago
#10

  • Keywords has-unit-tests added
  • Fix json encode that may break enclosing script tag
  • Update or fix tests
  • Try encoding '<' in inline script tags

Trac ticket:

@jonsurrell commented on PR #8856:


6 months ago
#11

This won't work, SCRIPT tags have special parsing rules and HTML entities are not escaped.

@jonsurrell commented on PR #8106:


6 months ago
#12

Will you explain the intention of this change and how it addresses the issue?

@jonsurrell commented on PR #8145:


5 months ago
#13

I plan to add tests to this using assertEqualMarkup when it lands in https://github.com/WordPress/wordpress-develop/pull/8882.

@jonsurrell commented on PR #8145:


5 months ago
#14

I'm concerned this pattern is very common. I wonder if there should be some dedicated functions.

Here are some examples of this exact pattern in the site editor that are susceptible to the same problem:

https://github.com/WordPress/wordpress-develop/blob/f83e66389dbe3164ccf0fb2a1132064fb161b8e4/src/wp-admin/site-editor.php#L254-L292

@dmsnell commented on PR #8145:


5 months ago
#15

I'm concerned this pattern is very common. I wonder if there should be some dedicated functions.

What’s your concern? that it will contain </script> tag closers?

@jonsurrell commented on PR #8145:


5 months ago
#16

What’s your concern? that it will contain </script> tag closers?

That's the problem here that may produce a broken page. I'm unaware of others, but wp_add_inline_script + wp_json_encode seem common to use together but risky with the default arguments.

@jonsurrell commented on PR #8145:


5 months ago
#17

I've pushed a test leveraging assertEqualHTML with the fix reverted. That test will fail on CI, then I'll push the fix again.

The failure appears as Error: Paused at incomplete token. because the <script> tag remains unclosed.

@jonsurrell commented on PR #8145:


5 months ago
#18

Example failures on CI can be observed here:

1) Tests_Blocks_Editor::test_preload_closes_script_tag
Error: Paused at incomplete token.

(I'm updating the test name after the CI run, but the result is the same).

@dmsnell commented on PR #8145:


5 months ago
#19

That's the problem here that may produce a broken page. I'm unaware of others, but wp_add_inline_script + wp_json_encode seem common to use together but risky with the default arguments.

Could we perhaps create a new Issue or document where we can start enumerating the situations in which this can go bad? perhaps a new test suite with no new code, in a PR.

I find that it’s hard for me to grasp unless it’s a particularly good day and I have my full concentration.

There are differences, aren’t there, between printing these three situations?

<script type="application/json">
{ "some": "json" }
</script>
<script>
const data = { "some": "json" };
</script>
<script>
callMeMaybe( { "some": "json" } );
</script>

I thought these were distinct cases with different edges but I can’t remember how.

@jonsurrell commented on PR #8145:


4 months ago
#20

Could we perhaps create a new Issue or document where we can start enumerating the situations in which this can go bad? perhaps a new test suite with no new code, in a PR.

Will you describe the goal of that? Is this to find some a more general fix or explore alternatives?

There are differences, aren’t there, between printing these three situations?

I don't think there are differences today.

I did a deep dive and there used to be a difference. It's the JavaScript is a superset of JSON proposal that was addressed in es2019. Platform support seems ubiquitous.

Note that this is only relevant when using `JSON_UNESCAPED_UNICODE` (_and_ `JSON_UNESCAPED_LINE_TERMINATORS` in PHP>=7.1), otherwise the relevant characters are always escaped.

<details>

<summary>More details and demo</summary>

In short, there were situations where serializing JSON (to read as JSON) and serializing JSON (inside of a JavaScript context) needed to be treated slightly differently.

In browsers without es2019 _JavaScript is a superset of JSON_ support (I tried Chrome 65, the latest Chrome without support) and if there are strings with U+2028 or U+2029, JSON allows those characters in the string, but JavaScript does not.

This demo shows the differences (only in older browsers, e.g. Chrome <= 65).

This code will error with SyntaxError: Invalid or unexpected token because of the unescaped U+2028 and U+2029 that were invalid in JavaScript strings prior to es2019:

<!DOCTYPE html>
<body>
<script defer>
const x = <?php echo json_encode(
        [ 'test' => "x\u{2028}\u{2029}y\n" ],
        JSON_UNESCAPED_SLASHES | JSON_HEX_TAG | JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_LINE_TERMINATORS
); ?>;
document.body.textContent = [...x.test].map(c => c.toString(16) ).join('');
</script>

This, however, is fine because the characters are not evaluated as JavaScript. They're read as JSON text, where they've always been allowed:

<!DOCTYPE html>
<body>
<script type="application/json" id="x">
<?php echo json_encode(
        [ 'test' => "x\u{2028}\u{2029}y\n" ],
        JSON_UNESCAPED_SLASHES | JSON_HEX_TAG | JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_LINE_TERMINATORS
); ?>
</script>
<script defer>
const x = JSON.parse( document.getElementById( 'x' ).textContent );
document.body.textContent = [...x.test].map(c => c.toString(16) ).join('');
</script>

Again, in today's JavaScript all of these cases are fine. And this only applies when certain JSON flags are applied that are not used here.

</details>

In all the situations you list we need have the same requirements of the JSON:

  • JSON must be valid for interpretation in JavaScript.
  • JSON must not close the script tag.
  • JSON must not modify the script escaping state (avoid entering double escaped script state) so that the </script> close tag closes the script as intended.

I believe the JSON_HEX_TAG | JSON_UNESCAPED_SLASHES flags proposed here cover all of these. For HTML parsing, when a <script> tag is opened, the parser transitions into script data state. The only ways to exit that state are with a U+003C LESS-THAN SIGN (<) or end-of-file. Escaping < (and >, although this is not necessary) with `JSON_HEX_TAG` should be sufficient to prevent the script tag from being closed or from transitioning into a script data escape state.

From the JavaScript/JSON perspective, the emitted JSON should be valid, the question of line terminators is not a concern today and the relevant flags are not proposed in this PR.

#21 @jonsurrell
4 months ago

  • Keywords dev-feedback added
  • Milestone changed from Awaiting Review to 6.9

@dmsnell commented on PR #8145:


4 months ago
#22

This is really helpful, @sirreal — thank you for verifying with the older browser.

#23 @jonsurrell
3 months ago

  • Owner set to jonsurrell
  • Status changed from new to accepted

This is a complex problem that I investigated and wrote about in detail here for folks that are interested in understanding exactly what's happening and the background.

@jonsurrell commented on PR #8145:


3 months ago
#24

There's one additional wrinkle I considered: XHTML. None of the XHTML-related behavior changes in this PR.

---

XHTML is likely not used today and it would be good to remove support (https://core.trac.wordpress.org/ticket/59883).

The good news is that themes that don't declare HTML5 script support (potentially "supporting XHTML") should work correctly with the encoding as proposed thanks to the CDATA wrappers used.

Also note that this does not change with this PR as the escaping of & remains unchanged. Consider the string <>& (where the characters < and & appear to be invalid):

Invalid XHTML (< and & both problematic):

<?php header('Content-Type: application/xhtml+xml; charset=UTF-8'); ?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
        <body>
                <script>
                        alert('Invalid: <>&');
                </script>
        </body>
</html>

The following would be valid with the < and & JSON hex escaped. This would require `JSON_HEX_AMP`. This is _not_ required for HTML5, only for XHTML support without CDATA:

<script>
        alert('escaped: \x3C>\x26');
</script>

WordPress will wrap inline script tag contents with CDATA if HTML5 support is not declared. That looks like this (with the proposed <> hex escape). This is _also valid XHTML_ and the escaping is correct:

<script>
        /* <![CDATA[ */
        alert('CDATA escaped: \u003C\u003E&');
        /* ]]> */
</script>

@jonsurrell commented on PR #8145:


3 months ago
#25

I'd like to move ahead with this PR, I believe it's the correct behavior and this same change should apply to many other places where data is JSON encoded for use in script tags throughout Core.

#26 @jonsurrell
3 months ago

  • Resolution set to fixed
  • Status changed from accepted to closed

In 60648:

Editor: Ensure preloading middleware JSON is correctly encoded.

Adds the appropriate JSON flags to wp_json_encode() to safely encode data for use in script tags.

Developed in https://github.com/WordPress/wordpress-develop/pull/8145.

Props jonsurrell, bernhard-reiter, dmsnell, artpi, ankitkumarshah, abcd95, dilipbheda, sainathpoojary, shanemuir.
Fixes #62797.

#27 @jonsurrell
3 months ago

I've created #63851 to audit JSON encoding generally as a follow-up.

Note: See TracTickets for help on using tickets.