Make WordPress Core

Opened 4 weeks ago

Closed 7 days ago

Last modified 7 days ago

#64609 closed defect (bug) (fixed)

HTML API: set_modifiable_text() ignores leading newlines in TEXTAREA

Reported by: jonsurrell Owned by: jonsurrell
Milestone: 7.0 Priority: normal
Severity: minor Version: 6.7
Component: HTML API Keywords: has-patch has-unit-tests
Focuses: Cc:

Description (last modified by jonsurrell)

The ::set_modifiable_text() method on WP_HTML_Tag_Processor and WP_HTML_Processor fails to preserve a leading newline in the provided plaintext content when it is set as the text of a TEXTAREA element.

This is due to special rules for TEXTAREA elements that cause a single leading newline to be ignored immediately following the open tag.

<?php
$html_processor = WP_HTML_Processor::create_fragment('<textarea></textarea>');
$html_processor->next_token();
$html_processor->set_modifiable_text( "\nAFTER NEWLINE" );
var_dump( $html_processor->get_modifiable_text() );
// string(13) "AFTER NEWLINE"
echo $html_processor->get_updated_html();
/* Prints:
<textarea>
AFTER NEWLINE</textarea>
*/

Note that the newline is present in the updated HTML; however, the HTML parsing rules cause the leading newline to be ignored.

When rendered in the browser, this TEXTAREA element will report its .textContent as AFTER NEWLINE (with no leading newline) and the rendered element shows AFTER NEWLINE on the first line of the input box.

In order for the TEXTAREA to begin with a newline in its content, an additional newline must be included.
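The rule can be modelled without WordPress. The following is a minimal sketch, assuming newlines are already normalized to U+000A and ignoring character escaping; parse_textarea_contents() and serialize_textarea_contents() are hypothetical names used for illustration and are not part of the HTML API.

```php
<?php
/**
 * Models how an HTML parser reads the raw contents of a TEXTAREA:
 * a single line feed immediately after the open tag is dropped.
 * Hypothetical helper, not part of WordPress.
 */
function parse_textarea_contents( string $raw ): string {
    $starts_with_newline = '' !== $raw && "\n" === $raw[0];
    return $starts_with_newline ? substr( $raw, 1 ) : $raw;
}

/**
 * Models a serializer that compensates for the rule above by
 * doubling a leading line feed. Hypothetical helper.
 */
function serialize_textarea_contents( string $text ): string {
    $starts_with_newline = '' !== $text && "\n" === $text[0];
    return $starts_with_newline ? "\n" . $text : $text;
}

$text = "\nAFTER NEWLINE";

// Writing the text as-is loses the leading newline once it is parsed.
var_dump( parse_textarea_contents( $text ) === $text );

// Writing it with the extra newline survives the round trip.
var_dump( parse_textarea_contents( serialize_textarea_contents( $text ) ) === $text );
```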

This is closely related to #64607.

Change History (17)

This ticket was mentioned in PR #10879 on WordPress/wordpress-develop by @jonsurrell.


4 weeks ago
#1

  • Keywords has-patch has-unit-tests added

The ::set_modifiable_text() method on WP_HTML_Tag_Processor and WP_HTML_Processor fails to preserve a leading newline in the provided plaintext content when it is set as the first text of a TEXTAREA, PRE, or LISTING element.

This is due to special rules for these elements that cause a single leading newline to be ignored immediately following the open tag.

$html_processor = WP_HTML_Processor::create_fragment('<textarea></textarea>');
$html_processor->next_token();
$html_processor->set_modifiable_text( "\nAFTER NEWLINE" );
var_dump( $html_processor->get_modifiable_text() );
// string(13) "AFTER NEWLINE"
echo $html_processor->get_updated_html();
/* Prints:
<textarea>
AFTER NEWLINE</textarea>
*/

Note that the newline is present in the updated HTML; however, the HTML parsing rules cause the leading newline to be ignored.

When rendered in the browser, this TEXTAREA element will report its .textContent as AFTER NEWLINE (with no leading newline) and the rendered element shows AFTER NEWLINE on the first line of the input box.

In order for the TEXTAREA to begin with a newline in its content, an additional newline must be included.

This is closely related to #64607.

Trac ticket: https://core.trac.wordpress.org/ticket/64609

@jonsurrell commented on PR #10879:


2 weeks ago
#2

> the dependence on $this->skip_newline_at means that the last-parsed PRE _must_ be the one immediately before the text node with set_modifiable_text().
> … this isn’t _currently_ a problem because all seeks move forward from the start, but that’s not spec; that’s just a coincidence with the current implementation.

Very good point, this is a difficult problem.

What do you think of landing just the TEXTAREA part of this fix, which seems clear, and leaving PRE/LISTING for now?

This ticket was mentioned in PR #11062 on WordPress/wordpress-develop by @jonsurrell.


11 days ago
#3

Detect cases where ::set_modifiable_text() would omit a leading newline from its input and adjust accordingly. When the input begins with a newline, an additional leading newline (which HTML parsers ignore) is prepended.

This follows the HTML parsing rule for TEXTAREA elements, which ignores a single U+000A LINE FEED character immediately following the open tag. It also respects the guidelines on newline normalization, so a leading U+000D CARRIAGE RETURN also triggers the extra newline.
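As a rough illustration of why a leading carriage return needs the same treatment, here is a self-contained sketch. It is not the code from this PR; guard_leading_newline() and model_parse_textarea() are hypothetical names, and the model covers only newline normalization and the leading line feed rule.

```php
<?php
/**
 * Prepends a line feed when the text starts with LF or CR, so the
 * parser drops the added character instead of the author's own.
 * Hypothetical helper, not the merged implementation.
 */
function guard_leading_newline( string $plaintext ): string {
    if ( '' !== $plaintext && ( "\n" === $plaintext[0] || "\r" === $plaintext[0] ) ) {
        return "\n" . $plaintext;
    }
    return $plaintext;
}

/**
 * Simplified parser model: normalize CRLF and lone CR to LF,
 * then drop one leading LF.
 */
function model_parse_textarea( string $raw ): string {
    $normalized = str_replace( array( "\r\n", "\r" ), "\n", $raw );
    if ( '' !== $normalized && "\n" === $normalized[0] ) {
        return substr( $normalized, 1 );
    }
    return $normalized;
}

// Without the guard, a leading CRLF is normalized to LF and then dropped.
var_dump( model_parse_textarea( "\r\nAFTER NEWLINE" ) );

// With the guard, one newline remains after parsing.
var_dump( model_parse_textarea( guard_leading_newline( "\r\nAFTER NEWLINE" ) ) );
```

Checking for both characters before normalization keeps the guard independent of whether the caller has already converted line endings.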

This PR *only handles TEXTAREA*, where the ::set_modifiable_text() behavior is simpler. See https://github.com/WordPress/wordpress-develop/pull/10879#pullrequestreview-3833676403.

This is a partial fix for the ticket and is extracted from https://github.com/WordPress/wordpress-develop/pull/10879.

@jonsurrell commented on PR #10879:


11 days ago
#4

I extracted just the TEXTAREA handling to https://github.com/WordPress/wordpress-develop/pull/11062 which has a clear path forward.

#5 @ozgursar
11 days ago

Patch Testing Report

Patch Tested: https://github.com/WordPress/wordpress-develop/pull/11062

Environment

  • WordPress: 7.0-beta1-61709-src
  • PHP: 8.2.29
  • Server: nginx/1.29.4
  • Database: mysqli (Server: 8.4.7 / Client: mysqlnd 8.2.29)
  • Browser: Chrome 145.0.0.0
  • OS: macOS
  • Theme: Twenty Twenty-Five 1.4
  • MU Plugins: None activated
  • Plugins:
    • Code Snippets 3.9.5
    • Test Reports 1.2.1

Steps taken

  1. Add the following test snippet via functions.php or Code Snippets plugin to create a shortcode
add_shortcode( 'patch_test_64609', function() {
    $out = '';

    // textarea via WP_HTML_Processor
    $p = WP_HTML_Processor::create_fragment( '<textarea></textarea>' );
    $p->next_token();
    $p->set_modifiable_text( "\nAFTER NEWLINE" );
    $out .= $p->get_updated_html();

    // pre and listing via WP_HTML_Tag_Processor directly
    foreach ( [ 'pre', 'listing' ] as $tag ) {
        $p = new WP_HTML_Tag_Processor( "<$tag>existing text</$tag>" );
        $p->next_tag( $tag );
        $p->next_token();
        $p->set_modifiable_text( "\nAFTER NEWLINE" );
        $out .= $p->get_updated_html();
    }

    return $out;
} );
  2. Add the shortcode [patch_test_64609] to any page/post and view the generated HTML
  3. ✅ Patch is solving the problem

Expected result

  • For the TEXTAREA element, the leading \n is preserved as expected.
  • For PRE and LISTING elements there is no change (as also expected by comment 4)

Screenshots/Screencast with results

Before
https://i.imgur.com/bkkCAm3.png

After
https://i.imgur.com/fQcNNQh.png

Last edited 11 days ago by jonsurrell

@dmsnell commented on PR #11062:


10 days ago
#6

Technically I think we should only do this if the parsing namespace is HTML, but part of me thinks there are probably a lot of places we are missing that check. You could add it, that would be worthwhile. Otherwise, I think it’s okay to merge as-is.

@jonsurrell commented on PR #11062:


10 days ago
#7

> Technically I think we should only do this if the parsing namespace is HTML, but part of me thinks there are probably a lot of places we are missing that check. You could add it, that would be worthwhile. Otherwise, I think it’s okay to merge as-is.

That _is_ correct: <svg><textarea> is a TEXTAREA element in the SVG namespace. Because it's not in the HTML namespace, it has none of the special TEXTAREA behavior and should not be treated as a special atomic element. Setting modifiable text in that case requires a text node.

That's another small bug: ::set_modifiable_text() should check for the HTML namespace for the special atomic elements.
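The check being described could look roughly like the sketch below. The function name, its parameters, and the lowercase namespace strings are assumptions made for illustration; this is not the HTML API's internal logic.

```php
<?php
/**
 * Decides whether the special TEXTAREA handling (including the
 * ignored leading newline) applies to an element.
 * Hypothetical helper: only TEXTAREA in the HTML namespace qualifies.
 */
function textarea_rules_apply( string $namespace, string $tag_name ): bool {
    return 'html' === $namespace && 'TEXTAREA' === strtoupper( $tag_name );
}

// <textarea> in ordinary HTML content.
var_dump( textarea_rules_apply( 'html', 'textarea' ) );

// <svg><textarea>: same tag name, but in the SVG namespace.
var_dump( textarea_rules_apply( 'svg', 'textarea' ) );
```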

@jonsurrell commented on PR #11062:


10 days ago
#9

https://core.trac.wordpress.org/ticket/64751 tracks the atomic element namespace issue.

#10 @jonsurrell
10 days ago

In 61754:

HTML API: Preserve ::set_modifiable_text() TEXTAREA leading newlines.

HTML specifies that a single newline is ignored at the start of a TEXTAREA. If ::set_modifiable_text() is called with a leading newline, ensure it is preserved in the resulting HTML.

Developed in https://github.com/WordPress/wordpress-develop/pull/11062.

Props jonsurrell, dmsnell.
See #64609.

@jonsurrell commented on PR #11062:


10 days ago
#11

Merged in r61754.

#12 @jonsurrell
10 days ago

@ozgursar I appreciate your testing and I apologize for not correctly giving props in [61754]!

I have corrected this with the props tool, so you should be credited for it.

#13 @ozgursar
10 days ago

@jonsurrell thank you

@jonsurrell commented on PR #11062:


10 days ago
#14

https://github.com/WordPress/wordpress-develop/pull/11083 addresses the foreign content atomic element modifiable text behavior.

@dmsnell commented on PR #10879:


10 days ago
#15

There's a relevant comment

well how about that

#16 @jonsurrell
7 days ago

  • Description modified (diff)
  • Resolution set to fixed
  • Status changed from assigned to closed
  • Summary changed from HTML API: set_modifiable_text() ignores leading newlines in PRE, LISTING, TEXTAREA to HTML API: set_modifiable_text() ignores leading newlines in TEXTAREA

I've created #64776 to track the issue for PRE and LISTING. I don't expect those to be addressed in 7.0.

This ticket now focuses specifically on TEXTAREA.

@jonsurrell commented on PR #10879:


7 days ago
#17

The main issue here is determining whether the current position is a text node that is the first child of PRE (or LISTING). In that position, it's likely possible to always add a newline.

The tag processor has very limited information in that regard. No stack of elements and no awareness of siblings.

The HTML processor has a stack of open elements to inspect, but it's still difficult to determine whether it's the first text node child or not: <pre>find-me<hr>not-me</pre>.
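To make the target concrete, here is a toy model over a flat token list. Neither processor exposes tokens in this shape; the array layout and find_leading_text_token() are assumptions for illustration only. The sketch shows which node is affected, not how either processor could locate it from its current position.

```php
<?php
/**
 * Returns the index of the text token that immediately follows a
 * PRE or LISTING open tag, or null when no such token exists.
 * Toy model; the token shape is hypothetical.
 */
function find_leading_text_token( array $tokens ): ?int {
    foreach ( $tokens as $i => $token ) {
        $is_opener = 'open' === $token['type']
            && in_array( $token['name'], array( 'PRE', 'LISTING' ), true );
        $next      = $tokens[ $i + 1 ] ?? null;

        if ( $is_opener && null !== $next && 'text' === $next['type'] ) {
            return $i + 1;
        }
    }
    return null;
}

// Tokens for <pre>find-me<hr>not-me</pre>.
$tokens = array(
    array( 'type' => 'open', 'name' => 'PRE' ),
    array( 'type' => 'text', 'text' => 'find-me' ),
    array( 'type' => 'open', 'name' => 'HR' ),
    array( 'type' => 'text', 'text' => 'not-me' ),
    array( 'type' => 'close', 'name' => 'PRE' ),
);

// Only "find-me" sits where a leading newline would be ignored.
$index = find_leading_text_token( $tokens );
var_dump( null !== $index ? $tokens[ $index ]['text'] : null );
```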
