Make WordPress Core

Opened 5 months ago

Closed 5 months ago

Last modified 5 months ago

#58637 closed defect (bug) (fixed)

HTML API: Fatal error processing document with unclosed attribute

Reported by: dlh's profile dlh Owned by: azaozz's profile azaozz
Milestone: 6.3 Priority: normal
Severity: normal Version: 6.2
Component: HTML API Keywords: has-unit-tests has-patch
Focuses: Cc:

Description (last modified by dlh)

The HTML tag processor triggers a fatal error (in PHP 8+) when attempting to process a HTML string that is malformed because it ends in an unclosed attribute.

To replicate:

$html = '<iframe width="640" height="400" src="https://www.example.com/embed/abcdef';
$proc = new \WP_HTML_Tag_Processor( $html );
$proc->next_tag( 'iframe' );

Leads to:

ValueError: strpos(): Argument #3 ($offset) must be contained in argument #1 ($haystack)

in WP_HTML_Tag_Processor::next_tag(): https://github.com/WordPress/wordpress-develop/blob/12ed5ccb0a61cf684682a94e9b9c02dd11bb7d75/src/wp-includes/html-api/class-wp-html-tag-processor.php#L549

I've added a test case in the linked Pull Request. I think I can see that the error occurs because WP_HTML_Tag_Processor::parse_next_attribute() sets $bytes_already_parsed to one byte after the end of the document, representing the missing closing quote of the attribute. But I'm less sure about where in the processor a fix for the problem might go, so I've left that open for comment for now.

I encountered a string like this as part of a content migration over other rows of well-formed HTML. In this scenario, I wouldn't expect the tag processor to be able to tell me anything about the string, but it would be helpful to migration scripts like mine for the processor to handle the bad string gracefully.

Attachments (1)

58637.patch (2.8 KB) - added by dmsnell 5 months ago.

Download all attachments as: .zip

Change History (14)

This ticket was mentioned in PR #4711 on WordPress/wordpress-develop by dlh01.


5 months ago
#1

  • Keywords has-patch has-unit-tests added

#2 @dlh
5 months ago

  • Keywords needs-patch added; has-patch removed

#3 @dlh
5 months ago

  • Description modified (diff)

#4 @costdev
5 months ago

  • Keywords dev-feedback added

A quick test shows that the new test and the existing tests pass if you change this line

from:

$attribute_end              = $value_start + $value_length + 1;

to:

$attribute_end              = min( strlen( $this->html ), $value_start + $value_length + 1 );

Pinging @dmsnell for insight into the issue and whether the above (or a variation of it) is an appropriate fix.

#5 @dmsnell
5 months ago

Thanks for the ping @costdev - I'll examine this. I thought we caught this early on before 6.2, and we solved it not by adjusting the index (something done to help avoid infinite loops), but by checking if the indices are valid numbers before using them.

@dmsnell
5 months ago

#6 @dmsnell
5 months ago

I've uploaded a new patch with the fix that checks the index before use and adds a number of other tests to the suite. Without the patch to the tag processor the <iframe src="… fails, but with the patch it succeeds.

Not sure how we missed this because it's the exact same kind of situation that we found on WordPress.com before the merge (and we fixed it), but I reviewed the other existing uses of strpos() and it looks like they are all guarded.

@dlh I've uploaded this patch because your PR is on your own git fork, which is fine, but I can't push to it. If you want you can adopt it in the PR.

#7 @dlh
5 months ago

  • Keywords has-patch added; needs-patch dev-feedback removed

Thanks, @costdev and @dmsnell! I've updated my PR with 58637.patch.

#8 @dlh
5 months ago

@dmsnell Would it be OK with you if I added this ticket to the 6.4 milestone?

#9 follow-up: @dmsnell
5 months ago

@dlh not only that, but I think it's fair to ask someone about adding this into 6.3 as a bug fix, because it is a real bug and this is a pointed bug fix.

@bernhard-reiter @azaozz thoughts here? if you're catching up, I'm sorry and not sure how we missed this one since it's essentially a replay of what we saw on WordPress.com before it was merged into Core, but I think this was the only place where such an oversight was made, or at least, on inspection this is the only place where we weren't checking for index-out-of-bounds before calling strpos() (that is, after that earlier fix months ago).

#10 in reply to: ↑ 9 @azaozz
5 months ago

  • Milestone changed from Awaiting Review to 6.3

Replying to dmsnell:

Sure. It's a bugfix with a patch that fixes a (very?) rare fatal error. The patch looks good here, will test a bit more and commit.

#11 @azaozz
5 months ago

  • Owner set to azaozz
  • Resolution set to fixed
  • Status changed from new to closed

In 56133:

HTML API: Fix a fatal error when processing malformed document with unclosed attribute.

Props: dlh, costdev, dmsnell.
Fixes: #58637.

#13 @dmsnell
5 months ago

Thank you all!

Note: See TracTickets for help on using tickets.