Make WordPress Core

Opened 6 months ago

Closed 4 months ago

Last modified 4 months ago

#63891 closed defect (bug) (fixed)

Processing directives function in interactivity API class may cause fatal error

Reported by: hugosolar's profile hugosolar Owned by: luisherranz's profile luisherranz
Milestone: 6.9 Priority: normal
Severity: normal Version: 6.5
Component: Interactivity API Keywords: has-patch has-unit-tests commit
Focuses: php-compatibility Cc:

Description

This may be an edge case for posts created with Classic editor or classic editor block.
_process_directives function within WP_Interactivity API class will process html and check for closer tags.
if there's a wrong tag like </br> (in my case) will result in a TypeError:

Fatal error: Uncaught TypeError: count(): Argument #1 ($value) must be of type Countable|array, null given in wp-includes/interactivity-api/class-wp-interactivity-api.php on line 442

by checking the line of code, count() function is applied to get_attribute_names_with_prefix() method from WP_HTML_Tag_Processor which can eventually return null

for newer PHP version (8.0.0 and later) this will throw a fatal error.
probably worth getting an extra check if the funcion is returning a countable variable

Change History (21)

This ticket was mentioned in PR #9641 on WordPress/wordpress-develop by @hugosolar.


6 months ago
#1

  • Keywords has-patch added; needs-patch removed

This may be an edge case for posts created with Classic editor or classic editor block.
_process_directives function within WP_Interactivity_API class will process html and check for closer tags.
if there's a wrong tag like </br> (in my case) will result in a TypeError:

Fatal error: Uncaught TypeError: count(): Argument #1 ($value) must be of type Countable|array, null given in wp-includes/interactivity-api/class-wp-interactivity-api.php on line 442

by checking the line of code, count() function is applied to get_attribute_names_with_prefix() method from WP_HTML_Tag_Processor which can eventually return null

for newer PHP version (8.0.0 and later) this will throw a fatal error.
probably worth getting an extra check if the funcion is returning a countable variable

This ticket was mentioned in PR #9657 on WordPress/wordpress-develop by @jonsurrell.


6 months ago
#2

  • Keywords has-unit-tests added

</br> gets special treatment in the HTML standard. It creates a BR element with no attributes (regardless of whether it _appears_ to have attributes in the HTML like </br attr-name>).

  • Stop on </br> in ::next_tag() like other openers.
  • Return array() from ::get_attribute_names_with_prefix() on </br> instead of null like true close tags.
  • Add tests.

Trac ticket: https://core.trac.wordpress.org/ticket/63891

#3 @jonsurrell
6 months ago

Checking for null as proposed in PR 9641 is a good idea. I have some other thoughts as this is related to HTML API behavior.


The behavior may be exceptional with </br>, I don't believe there are other cases where the tag processor will report an open tag and not return an array of attributes.

This stems from how </br> is a bit unusual. It appears to be a close tag, but produces a BR element with no attributes (</br class="c"> does not have a class attribute).

This is handled in WP_HTML_Tag_Processor::is_tag_closer(), but :: get_attribute_names_with_prefix() checks is_closing_tag directly.

<?php
public function get_attribute_names_with_prefix( $prefix ): ?array {
        if (
                self::STATE_MATCHED_TAG !== $this->parser_state ||
                $this->is_closing_tag
        ) {
                return null;
        }

        $comparable = strtolower( $prefix );

        $matches = array();
        foreach ( array_keys( $this->attributes ) as $attr_name ) {
                if ( str_starts_with( $attr_name, $comparable ) ) {
                        $matches[] = $attr_name;
                }
        }
        return $matches;
}

The behavior around </br> is somewhat inconsistent in that regard.

::get_attribute_names_with_prefix() could return an empty array in this case, making it consistent with other open tags. Swapping $this->is_closing_tag for $this->is_tag_closer() in the snippet above seems sufficient in that regard.

Other methods like ::set_attribute() continue to check $this->is_closing_tag and always return false. HTML modification methods would require more involved HTML changes (eliminate the closing / from </br>), the type of modification that the HTML API has avoided so far although it doesn't seem unreasonable.

I've proposed PR 9657 to address some of these issues.

#4 @jonsurrell
6 months ago

  • Version changed from 6.8.2 to 6.5

#5 @dmsnell
6 months ago

Here are a couple of notes about the history of this usage, as the issue at hand lies with the Interactivity API’s use of the HTML API, not with the defined and commented and annotated behaviors of the HTML API.

::get_attribute_names_with_prefix() may return null, so it’s always the case that we need to assume it may do that and write our code accordingly. it essentially indicates the difference between this has no attributes and this cannot have any attributes in any situation. So for opening tags we find an empty set of arrays, but HTML comments, text nodes, etc…, which cannot have attributes — those return null. That nuance is not well documented in the function interface, and we could also go about updating it to be clearer. Regardless, null is still a valid return type.

The issue with </br> is identical to with an unexpected </p>. These tags cannot contain HTML attributes and so return null. We discussed the role of the Tag Processor in these cases of whether they should report as closing tags or as opening tags and we decided at the time to try and keep the Tag Processor focused as a lexer and the HTML Processor as the machine of imbuing semantic interpretation to the tokens (@zieladam was part of these discussions and I can’t remember his thoughts on the matter).

It’s probably fine/safe to consider </br> an opening tag because it’s atomic, but if we do that it does then raise concerns about inconsistency with </p> because _sometimes_ the </p> is an opening tag and _sometimes_ it’s a closing tag, but the Tag Processor lacks the knowledge to make that decision. In the same way there are a plethora of cases where the low-level tokenization remains a bit less expected because the nuance is if you want to know semantic information about the HTML you need to use the HTML Processor. Here are some other examples:

  • in the Tag Processor, <image> is an opening tag whose name is IMAGE. In the HTML Processor it’s a void element whose name is IMG, unless it’s in the SVG or MathML context and then it’s still an IMAGE. get_tag() reports the name difference.
  • in the Tag Processor, </sarcasm> is a closing tag whose name is SARCASM. In the HTML Processor it’s non-existent; completely skipped-over.
  • in the Tag Processor, </p><p> contains a closing P tag and then an opening P tag. In the HTML Processor it contains an opening P tag, a closing P tag, a second opening P tag, and a second closing P tag, in that order. the Tag Processor has no way of knowing whether </p> should open or close a P element.

</br> is one of very few exceptions I would feel comfortable incorporating into the HTML API, but I am also very reluctant to move some semantic exceptions into the Tag Processor when we know that someone needs to rely on the HTML Processor if they want that. we might already make some exceptions like this and this may all seem duplicitous, but I want to stress that just because something is unexpected or because very-low-level primitives are severely delegated between the two classes, that does not imply we necessarily have to change the interface.


My recommendation is that code calling get_attribute_names_with_prefix() which does not check for null simply do this for convenience.

<?php
$count = count( $processor->get_attribute_names_with_prefix() ?? array() );

We can step back and note that this reported problem does not manifest when using the HTML Processor.

#6 @jonsurrell
6 months ago

To clarify, special </br> handling already exists in the Tag Processor, which is why this is exceptional. I believe it's the only case where you can do this:

<?php
$p = new WP_HTML_Tag_Processor('</br>');
$p->next_tag( array( 'tag_closers' => 'visit' ) );
var_dump($p->get_tag());
if ( ! $p->is_tag_closer() ) {
  assert( null !== $p->get_attribute_names_with_prefix('') ); // Throws with </br>, not with </p>
}

It seems inconsistent to

  • Not stop on the </br> if not visiting closers
  • On the </br> return false from ::is_tag_closer().
  • Return null after reporting that it's a tag opener.

The appropriate way to handle this and what it means to be consistent is debatable.


The HTML Processor uses "virtual tokens" to address this for things like </p> where a P element may be pushed to the stack and then popped. In that case, there's a "virtual" <p> followed by a "real" </p> and everything works out. </br> is exceptional for a few reasons:

  • It's not virtual, that token really behaves like a tag opener.
  • It's propagated all the way from the "before html" insertion mode.

An end tag whose tag name is "br"
Drop the attributes from the token, and act as described in the next entry; i.e. act as if this was a "br" start tag token with no attributes, rather than the end tag token that it actually is.


What's absolutely clear is that get_attribute_names_with_prefix() should have a null check on its return value. The rest of this is nuance around HTML that has some issues to begin with. I don't feel strongly about the best path forward, but I do see inconsistencies in the way it's currently handled.

#7 follow-up: @dmsnell
6 months ago

agreed @jonsurrell we should make the Tag Processor consistent. I don’t know what that implicates for these specific instances.

I propose we focus this ticket on updating the Interactivity API calling code, which is actually where the bug is in place, since the HTML API is weirdly upholding its contract, even if in an unexpected way.

#8 in reply to: ↑ 7 @jonsurrell
6 months ago

  • Milestone changed from Awaiting Review to 6.9

Replying to dmsnell:

I propose we focus this ticket on updating the Interactivity API calling code, which is actually where the bug is in place, since the HTML API is weirdly upholding its contract, even if in an unexpected way.

Agreed.

#9 @hugosolar
6 months ago

@jonsurrell @dmsnell thanks for all the context here!
I agree with addressing Interactivity API issue to avoid the fatal error
I'll get this updated in my PR ASAP, and also add unit tests

#10 @zieladam
6 months ago

The issue with </br> is identical to with an unexpected </p>. These tags cannot contain HTML attributes and so return null.

AFAIR closing tags can contain attributes. The HTML spec has this parsing path:

Tag open state -> End tag open state -> Tag Name State -> before attribute name state -> attribute name state.

I remember we've discussed that in context of the inline token / funky comments and decided not to expose them.

@jonsurrell commented on PR #9641:


5 months ago
#11

👋 @hugosolar Just checking in, are you able to finish this up or would you like some help? The next release is approaching and it would be nice to get this fix in place.

#12 @wildworks
4 months ago

  • Milestone changed from 6.9 to 7.0

The RC1 release is approaching, and since the feedback on the pull request has not yet been incorporated, I'm punting this ticket to 7.0.

If this issue significantly impacts the 6.9 release, please submit an updated pull request, and I would appreciate it if it could be resolved before RC1.

#13 @hugosolar
4 months ago

@wildworks yes! sorry for the delay
a lot going on here
I'l make sure to update my PR before EOW

@hugosolar commented on PR #9641:


4 months ago
#15

@sirreal I've pushed the fix that seems to work better regarding the tests I've made here
Also, added one test to verify if </br> is breaking _process_directive function
I tested it locally and it's passing but not sure why it's failing in the pipelines
would you help me to figure what's the issue here?

The error is throwing this

Parse error: syntax error, unexpected ')' in /var/www/tests/phpunit/tests/interactivity-api/wpInteractivityAPI.php on line 1272

checked that line but can't see anything wrong with it

@hugosolar commented on PR #9641:


4 months ago
#16

Thanks for the help @sirreal !
I think this is ready for a CR

This ticket was mentioned in Slack in #core by welcher. View the logs.


4 months ago

#18 @welcher
4 months ago

  • Milestone changed from 7.0 to 6.9

@jonsurrell @luisherranz if this can be committed before RC 1 tomorrow, we can get it into WordPress 6.9. I'll move it back into the milestone and follow up then.

#19 @welcher
4 months ago

  • Keywords commit added

#20 @luisherranz
4 months ago

  • Owner set to luisherranz
  • Resolution set to fixed
  • Status changed from new to closed

In 61197:

Interactivity API: Fatal error processing incorrect closed tags.

Fix for fatal errors caused by incorrect closing tags in HTML, such as </br>. In these cases, the get_attribute_names_with_prefix method of the WP_HTML_Tag_Processor returns null, and the Interactivity API was not handling this situation correctly.

Props hugosolar, jonsurrell, darerodz.
Fixes #63891.

@dmsnell commented on PR #9657:


4 months ago
#21

@desrosj looks like this is another victim of over-eager bots closing work that wasn’t merged. this patch was related to the Core ticket mentioned in the SVN commit, but the actual changeset was orthogonal to it.

Note: See TracTickets for help on using tickets.