#58637 closed defect (bug) (fixed)
HTML API: Fatal error processing document with unclosed attribute
Reported by: | dlh | Owned by: | azaozz |
---|---|---|---|
Milestone: | 6.3 | Priority: | normal |
Severity: | normal | Version: | 6.2 |
Component: | HTML API | Keywords: | has-unit-tests has-patch |
Focuses: | Cc: |
Description (last modified by )
The HTML tag processor triggers a fatal error (in PHP 8+) when attempting to process a HTML string that is malformed because it ends in an unclosed attribute.
To replicate:
$html = '<iframe width="640" height="400" src="https://www.example.com/embed/abcdef'; $proc = new \WP_HTML_Tag_Processor( $html ); $proc->next_tag( 'iframe' );
Leads to:
ValueError: strpos(): Argument #3 ($offset) must be contained in argument #1 ($haystack)
in WP_HTML_Tag_Processor::next_tag()
: https://github.com/WordPress/wordpress-develop/blob/12ed5ccb0a61cf684682a94e9b9c02dd11bb7d75/src/wp-includes/html-api/class-wp-html-tag-processor.php#L549
I've added a test case in the linked Pull Request. I think I can see that the error occurs because WP_HTML_Tag_Processor::parse_next_attribute()
sets $bytes_already_parsed
to one byte after the end of the document, representing the missing closing quote of the attribute. But I'm less sure about where in the processor a fix for the problem might go, so I've left that open for comment for now.
I encountered a string like this as part of a content migration over other rows of well-formed HTML. In this scenario, I wouldn't expect the tag processor to be able to tell me anything about the string, but it would be helpful to migration scripts like mine for the processor to handle the bad string gracefully.
Attachments (1)
Change History (14)
This ticket was mentioned in PR #4711 on WordPress/wordpress-develop by dlh01.
15 months ago
#1
- Keywords has-patch has-unit-tests added
#5
@
15 months ago
Thanks for the ping @costdev - I'll examine this. I thought we caught this early on before 6.2, and we solved it not by adjusting the index (something done to help avoid infinite loops), but by checking if the indices are valid numbers before using them.
#6
@
15 months ago
I've uploaded a new patch with the fix that checks the index before use and adds a number of other tests to the suite. Without the patch to the tag processor the <iframe src="…
fails, but with the patch it succeeds.
Not sure how we missed this because it's the exact same kind of situation that we found on WordPress.com before the merge (and we fixed it), but I reviewed the other existing uses of strpos()
and it looks like they are all guarded.
@dlh I've uploaded this patch because your PR is on your own git fork, which is fine, but I can't push to it. If you want you can adopt it in the PR.
#7
@
15 months ago
- Keywords has-patch added; needs-patch dev-feedback removed
Thanks, @costdev and @dmsnell! I've updated my PR with 58637.patch.
#9
follow-up:
↓ 10
@
15 months ago
@dlh not only that, but I think it's fair to ask someone about adding this into 6.3 as a bug fix, because it is a real bug and this is a pointed bug fix.
@bernhard-reiter @azaozz thoughts here? if you're catching up, I'm sorry and not sure how we missed this one since it's essentially a replay of what we saw on WordPress.com before it was merged into Core, but I think this was the only place where such an oversight was made, or at least, on inspection this is the only place where we weren't checking for index-out-of-bounds before calling strpos()
(that is, after that earlier fix months ago).
#10
in reply to:
↑ 9
@
15 months ago
- Milestone changed from Awaiting Review to 6.3
Replying to dmsnell:
Sure. It's a bugfix with a patch that fixes a (very?) rare fatal error. The patch looks good here, will test a bit more and commit.
#11
@
15 months ago
- Owner set to azaozz
- Resolution set to fixed
- Status changed from new to closed
In 56133:
15 months ago
#12
Committed in https://core.trac.wordpress.org/changeset/56133.
A quick test shows that the new test and the existing tests pass if you change this line
from:
$attribute_end = $value_start + $value_length + 1;
to:
$attribute_end = min( strlen( $this->html ), $value_start + $value_length + 1 );
Pinging @dmsnell for insight into the issue and whether the above (or a variation of it) is an appropriate fix.