#63738 closed enhancement (fixed)
HTML API: Internal updates in 6.9
| Reported by: |
|
Owned by: |
|
|---|---|---|---|
| Milestone: | 6.9 | Priority: | normal |
| Severity: | normal | Version: | 6.9 |
| Component: | HTML API | Keywords: | has-patch has-unit-tests |
| Focuses: | Cc: |
Description
Tracking ticket for updates to the HTML API itself in 6.9.
- Interface enhancements
- Performance improvements
- Spec-compatability improvements
- Other internal details.
Change History (36)
This ticket was mentioned in PR #9231 on WordPress/wordpress-develop by @jonsurrell.
6 months ago
#1
- Keywords has-patch has-unit-tests added
This ticket was mentioned in PR #9301 on WordPress/wordpress-develop by @jonsurrell.
6 months ago
#2
Trac ticket: Core-63738
This ticket was mentioned in PR #9230 on WordPress/wordpress-develop by @jonsurrell.
6 months ago
#3
Trac ticket: Core-63738
Reduce the number of length checks in WP_HTML_Tag_Processor::skip_script_data().
@jonsurrell commented on PR #9301:
6 months ago
#6
Merged in [60501].
@jonsurrell commented on PR #9231:
6 months ago
#8
Merged in [60502].
This ticket was mentioned in PR #9318 on WordPress/wordpress-develop by @jonsurrell.
6 months ago
#9
Trac ticket: Core-63738
Follow-up to https://core.trac.wordpress.org/changeset/60502 / #9231
Flagged by @mukeshpanchal27 https://github.com/WordPress/wordpress-develop/pull/9231#discussion_r2227595923
@mukesh27 commented on PR #9318:
6 months ago
#10
Thanks for quick PR 🚀
@jonsurrell commented on PR #9318:
6 months ago
#12
Merged in [60503].
@jonsurrell commented on PR #9230:
6 months ago
#13
I've updated the comment and added a number of tests for different script data states. @dmsnell I'd appreciate if you could take another quick look.
This ticket was mentioned in PR #9397 on WordPress/wordpress-develop by @jonsurrell.
6 months ago
#14
This includes https://github.com/WordPress/wordpress-develop/pull/9230 (merged here).
Trac ticket: https://core.trac.wordpress.org/ticket/63738
Address two small HTML API mis-parses of script contents.
---
<script> `. This abruptly closed empty comment does not transition to _escaped_, but remains in the _unescaped_ state. In the example, the _double-escaped_ state should not be reached on `<script>` (because the processor should not be in the _escaped_ state) and the script tag should close correctly at `</script>`. [https://playground.wordpress.net/?plugin=html-api-debugger&url=%2Fwp-admin%2Fadmin.php%3Fpage%3Dhtml-api-debugger%26html%3D%253Cscript%253E%250A%253C%2521--%253E%250A%253Cscript%253E%250A%253C%252Fscript%253E%25F0%259F%258E%2589&wp=6.8.2 before] / [https://playground.wordpress.net/?plugin=html-api-debugger&url=%2Fwp-admin%2Fadmin.php%3Fpage%3Dhtml-api-debugger%26html%3D%253Cscript%253E%250A%253C%2521--%253E%250A%253Cscript%253E%250A%253C%252Fscript%253E%25F0%259F%258E%2589&core-pr=9397 after] --- ```html <script> <!-- <script</script>🎉
In this case <script< is correctly recognized as _not_ a sequence that should transition from _escaped_ to _double-escaped_, however it incorrectly advances beyond the following < character that starts the script close tag and does not close correctly at </script>.
@jonsurrell commented on PR #9230:
6 months ago
#15
I've pulled the updated comment and moved the test adjacent to the other script parsing test, things which were noted in https://github.com/WordPress/wordpress-develop/pull/9397.
@jonsurrell commented on PR #9230:
6 months ago
#17
Merged in [60617].
@jonsurrell commented on PR #9397:
6 months ago
#18
Good reminder on the coverage reports. I've added more tests and all the lines in skip_script_data are now hit with the exception of this condition, the body of this is never entered:
I think that it's impossible to hit now (after [60617] / https://github.com/WordPress/wordpress-develop/pull/9230). It may always have been impossible. However, I don't mind leaving that check even if it's redundant.
I believe the condition would be hit on inputs like <script></script that end right after the closing script tag name. With the early length check it shouldn't be possible to hit this condition because the function would already have returned false here:
This ticket was mentioned in PR #9402 on WordPress/wordpress-develop by @jonsurrell.
6 months ago
#19
Trac ticket: https://core.trac.wordpress.org/ticket/63738
Address a minor HTML API mis-parse of script contents, where \f is not recognized as a valid trailing/termination character of SCRIPT tag names.
---
In this case, the \f _form feed_ should be recognized as the end of the script tag name and close the script:
<script></script␌>🎉
---
This is another example of the same issue. Again, \f _form feed_ should be recognized as the end of the script tag name and enter the double-escaped state (this script tag is not closed and consumes the rest of the document):
<script><!--<script␌</script>🎉
In this case <script< is correctly recognized as _not_ a sequence that should transition from _escaped_ to _double-escaped_, however it incorrectly advances beyond the following < character that starts the script close tag and does not close correctly at </script>.
6 months ago
#20
Is it possible to have Copilot not pollute the PR with its comments? Can it not run locally, or at least leave comments as the result of some command you run? or is it only designed to run from the Github interface and leave erroneous comments?
@jonsurrell commented on PR #9397:
6 months ago
#21
Thanks for diving in and doing your own investigation, I'm happy to have my research confirmed ❤️
The blog post we're referencing is available here: Safe JSON in script tags: How not to break a site
On that note, the way I read your diagram, the
</scripttransitions intoClose SCRIPT tagare misleading, because the only exit out of the element is a complete closing tag whose name is SCRIPT. This means that if you encounter</script-in those places, we stay inscript dataor inscript data escaped.
That's a good point, it was a bit ambiguous. I alluded to the trailing character in a foot note but it wasn't sufficiently clear. I added a note after the diagram to help clarify because this is an important point.
6 months ago
#22
It’s fine to keep these separate, but would we want to combine this with #9402? Maybe merge it into here?
This is mostly about accounting and not about code.
@jonsurrell commented on PR #9402:
5 months ago
#23
@jonsurrell commented on PR #9397:
5 months ago
#24
I've merged https://github.com/WordPress/wordpress-develop/pull/9402 here.
@jonsurrell commented on PR #9397:
5 months ago
#26
Merged in [60649].
This ticket was mentioned in PR #9560 on WordPress/wordpress-develop by @jonsurrell.
5 months ago
#27
The WP_Tag_Processor::set_modifiable_text() method should treat <script the same way as it treats </script. Either of these sequences may affect the script elements close.
Trac ticket: https://core.trac.wordpress.org/ticket/63738
@jonsurrell commented on PR #9560:
5 months ago
#28
This could allow some more safe inputs with a regular expression based check. There may also be ways to escape the script tag contents.
Before exploring those improvements, I'd like to address this issue where some potentially dangerous script tag contents are allowed.
@jonsurrell commented on PR #9560:
5 months ago
#30
Merged in [60706].
This ticket was mentioned in Slack in #core by welcher. View the logs.
4 months ago
#33
@
3 months ago
@dmsnell The 6.9 Beta1 release is coming soon, can this ticket be closed now? Are there any remaining tasks that need to be addressed on this ticket?
#34
@
3 months ago
@wildworks I’m still working on these; I don’t know what the practice is, but if we could wait to close them until the Beta is out that would be a big help to me (keep me from having to address the same comment or closure multiple times).
Either way, if you need to close them I won’t stand in your way, but I will continue to try and merge work from them before the Beta.
#35
@
3 months ago
- Resolution set to fixed
- Status changed from assigned to closed
As the Beta1 release begins, I will close this ticket. If there are any other issues that need to be addressed, please leave a comment.
Trac ticket: Core-63738
Simplify META tag handling with regards to encoding confidence.
With any of the standard HTML Processor creation paths (
WP_HTML_Processor::create_fragment()andWP_HTML_Processor::create_full_parser()), the encoding is never set totentative. It is eitherirrelevant(on fragments) orcertainon full parsers. In both cases, only theUTF-8encoding is supported.This is an optimization to lift a necessary condition to the top level and avoid processing in the case we expect to encounter in almost every case.
---
Fragments follow this path to set
irrelevant:https://github.com/WordPress/wordpress-develop/blob/1c8185e12dd9fb1606eb22118f09aef46e6737d0/src/wp-includes/html-api/class-wp-html-processor.php#L319
https://github.com/WordPress/wordpress-develop/blob/1c8185e12dd9fb1606eb22118f09aef46e6737d0/src/wp-includes/html-api/class-wp-html-processor.php#L547
And full parsers set
certainhere:https://github.com/WordPress/wordpress-develop/blob/1c8185e12dd9fb1606eb22118f09aef46e6737d0/src/wp-includes/html-api/class-wp-html-processor.php#L345
Trac ticket: