Make WordPress Core

Opened 2 months ago

Closed 4 weeks ago

#63854 closed defect (bug) (fixed)

Handle a non-string passed to WP_HTML_Tag_Processor gracefully

Reported by: kraftbj's profile kraftbj Owned by: kraftbj's profile kraftbj
Milestone: 6.9 Priority: normal
Severity: normal Version: 6.2
Component: HTML API Keywords: has-unit-tests has-patch commit
Focuses: Cc:

Description

The WP_HTML_Tag_Processor constructor assigns the passed value directly to $this->html. The method get_updated_html() is typed only to return a string, resulting in a runtime fatal if null is passed to the constructor.

I proposed adding a doing_it_wrong and setting the value to an empty string when a non-string is passed to it.

Discovered by a filter hooked onto the_content that passed the value to the processor. Earlier code must have nulled out the value or something similar.

wp> $html = new WP_HTML_Tag_Processor(null);
=> object(WP_HTML_Tag_Processor)#3418 (26) {
  ["html":protected]=>
  NULL
  ["last_query":"WP_HTML_Tag_Processor":private]=>
  NULL
  ["sought_tag_name":"WP_HTML_Tag_Processor":private]=>
  NULL
  ["sought_class_name":"WP_HTML_Tag_Processor":private]=>
  NULL
  ["sought_match_offset":"WP_HTML_Tag_Processor":private]=>
  NULL
  ["stop_on_tag_closers":"WP_HTML_Tag_Processor":private]=>
  NULL
  ["parser_state":protected]=>
  string(11) "STATE_READY"
  ["compat_mode":protected]=>
  string(14) "no-quirks-mode"
  ["parsing_namespace":"WP_HTML_Tag_Processor":private]=>
  string(4) "html"
  ["comment_type":protected]=>
  NULL
  ["text_node_classification":protected]=>
  string(15) "TEXT_IS_GENERIC"
  ["bytes_already_parsed":"WP_HTML_Tag_Processor":private]=>
  int(0)
  ["token_starts_at":"WP_HTML_Tag_Processor":private]=>
  NULL
  ["token_length":"WP_HTML_Tag_Processor":private]=>
  NULL
  ["tag_name_starts_at":"WP_HTML_Tag_Processor":private]=>
  NULL
  ["tag_name_length":"WP_HTML_Tag_Processor":private]=>
  NULL
  ["text_starts_at":"WP_HTML_Tag_Processor":private]=>
  NULL
  ["text_length":"WP_HTML_Tag_Processor":private]=>
  NULL
  ["is_closing_tag":"WP_HTML_Tag_Processor":private]=>
  NULL
  ["attributes":"WP_HTML_Tag_Processor":private]=>
  array(0) {
  }
  ["duplicate_attributes":"WP_HTML_Tag_Processor":private]=>
  NULL
  ["classname_updates":"WP_HTML_Tag_Processor":private]=>
  array(0) {
  }
  ["bookmarks":protected]=>
  array(0) {
  }
  ["lexical_updates":protected]=>
  array(0) {
  }
  ["seek_count":protected]=>
  int(0)
  ["skip_newline_at":"WP_HTML_Tag_Processor":private]=>
  NULL
}
wp> $html->get_updated_html();
Fatal error: Uncaught TypeError: WP_HTML_Tag_Processor::get_updated_html(): Return value must be of type string, null returned in /srv/htdocs/__wp__/wp-includes/html-api/class-wp-html-tag-processor.php:4146
Stack trace:
#0 phar:///usr/local/bin/wp-cli/vendor/wp-cli/shell-command/src/WP_CLI/Shell/REPL.php(46) : eval()'d code(1): WP_HTML_Tag_Processor->get_updated_html()
#1 phar:///usr/local/bin/wp-cli/vendor/wp-cli/shell-command/src/WP_CLI/Shell/REPL.php(46): eval()
#2 phar:///usr/local/bin/wp-cli/vendor/wp-cli/shell-command/src/Shell_Command.php(52): WP_CLI\Shell\REPL->start()
#3 [internal function]: Shell_Command->__invoke(Array, Array)
#4 phar:///usr/local/bin/wp-cli/vendor/wp-cli/wp-cli/php/WP_CLI/Dispatcher/CommandFactory.php(100): call_user_func(Array, Array, Array)
#5 [internal function]: WP_CLI\Dispatcher\CommandFactory::WP_CLI\Dispatcher\{closure}(Array, Array)
#6 phar:///usr/local/bin/wp-cli/vendor/wp-cli/wp-cli/php/WP_CLI/Dispatcher/Subcommand.php(497): call_user_func(Object(Closure), Array, Array)
#7 phar:///usr/local/bin/wp-cli/vendor/wp-cli/wp-cli/php/WP_CLI/Runner.php(470): WP_CLI\Dispatcher\Subcommand->invoke(Array, Array, Array)
#8 phar:///usr/local/bin/wp-cli/vendor/wp-cli/wp-cli/php/WP_CLI/Runner.php(493): WP_CLI\Runner->run_command(Array, Array)
#9 phar:///usr/local/bin/wp-cli/vendor/wp-cli/wp-cli/php/WP_CLI/Runner.php(1295): WP_CLI\Runner->run_command_and_exit()
#10 phar:///usr/local/bin/wp-cli/vendor/wp-cli/wp-cli/php/WP_CLI/Bootstrap/LaunchRunner.php(28): WP_CLI\Runner->start()
#11 phar:///usr/local/bin/wp-cli/vendor/wp-cli/wp-cli/php/bootstrap.php(84): WP_CLI\Bootstrap\LaunchRunner->process(Object(WP_CLI\Bootstrap\BootstrapState))
#12 phar:///usr/local/bin/wp-cli/vendor/wp-cli/wp-cli/php/wp-cli.php(35): WP_CLI\bootstrap()
#13 phar:///usr/local/bin/wp-cli/php/boot-phar.php(20): include('phar:///usr/loc...')
#14 /usr/local/bin/wp-cli(4): include('phar:///usr/loc...')
#15 {main}
  thrown in /srv/htdocs/__wp__/wp-includes/html-api/class-wp-html-tag-processor.php on line 4146

Change History (11)

This ticket was mentioned in PR #9545 on WordPress/wordpress-develop by @kraftbj.


2 months ago
#1

  • Keywords has-patch has-unit-tests added

Adds handling of non-string values passed to the WP_HTML_Tag_Processor by throwing a _doing_it_wrong and setting it to an empty string to avoid further issues.

#2 @westonruter
2 months ago

What is the function/method that is constructing WP_HTML_Tag_Processor which is passing null to begin with?

Personally I advocate to always start filter code with a type check since plugins can do bad things with violating the "contract" for the filtered value's type. In other words, I do the same thing as you propose, but in the filter callback:

<?php
add_filter( 'the_content', function ( $value ): string {
    if ( ! is_string( $value ) ) {
        $value = '';
    }
    // ...
    return $value;
} );

That's not to exclude what you're proposing in your PR.

Version 1, edited 2 months ago by westonruter (previous) (next) (diff)

#3 @kraftbj
2 months ago

@westonruter I agree with you ( https://github.com/Automattic/jetpack/pull/44874 to pass through the value when it's unexpected instead of trying to process it ).

I'm thinking of a change in Core since there is a strict type on the method to return a value out of it, but constructing (and interacting, at least how we are in the Jetpack CDN code) with the tag processor with an invalid construction arg occurs without complaint.

So, I agree that the filtering function should be defensive, and Core could be more defensive too.

Last edited 2 months ago by kraftbj (previous) (diff)

#4 @jonsurrell
2 months ago

The change seems like a good direction.

I think ‎WP_HTML_Processor::create_fragment() and WP_HTML_Processor::create_full_parser() are also susceptible. Those can also _doing_it_wrong(), but they're simpler in that they can return null. Do you mind including that in your PR?

@jonsurrell commented on PR #9545:


2 months ago
#5

I think ‎WP_HTML_Processor::create_fragment() and WP_HTML_Processor::create_full_parser() are also susceptible. Those can also _doing_it_wrong(), but they're simpler in that they can return null. Do you mind adding those changes here?

#6 @kraftbj
2 months ago

  • Keywords has-patch removed
  • Owner set to kraftbj
  • Status changed from new to accepted

Will do! Some personal things came up, so I'm delayed on circling back on this. I'll update the other methods and review the rest. Will re-add the has-patch keyword once it's been refreshed.

@jonsurrell commented on PR #9545:


5 weeks ago
#7

@kraftbj Just checking in, I'd love to get this in 6.9. Are you able to get back to it or would you like help to finish up?

@kraftbj commented on PR #9545:


5 weeks ago
#8

Apologies—death in the family after the last comment and it fell off my radar. I'll refresh the branch and make the updates to the other functions now.

#9 @kraftbj
5 weeks ago

  • Keywords has-patch added

#10 @jonsurrell
4 weeks ago

  • Keywords commit added

#11 @jonsurrell
4 weeks ago

  • Resolution set to fixed
  • Status changed from accepted to closed

In 60887:

HTML API: Ensure non-string HTML input is safely handled.

Prevents an issue where passing null to HTML API constructors could result in runtime errors.

Developed in https://github.com/WordPress/wordpress-develop/pull/9545.

Props kraftbj, jonsurrell, westonruter.
Fixes #63854.

Note: See TracTickets for help on using tickets.