Make WordPress Core

Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#54261 closed enhancement (fixed)

KSES: Allow PDFs to be embeded as objects

Reported by: pento's profile pento Owned by: pento's profile pento
Milestone: 5.9 Priority: normal
Severity: normal Version:
Component: Formatting Keywords: 2nd-opinion needs-testing has-patch has-dev-note
Focuses: Cc:

Description

โ€‹Gutenberg 10.5 added the ability to display PDFs as embeds, but made use of the <object> tag, which KSES doesn't allow.

It's certainly not feasible to allow <object> tags without limitations: while most of the original problematic uses of it are no longer supported in browsers (Java applets, ActiveX, Flash, etc), it would be challenging to prove that there are no potential security issues with allowing it for all object types.

Instead, this change allows the <object> tag only when it has a type attribute set to application/pdf.

Change History (33)

This ticket was mentioned in โ€‹PR #1757 on โ€‹WordPress/wordpress-develop by โ€‹pento.


3 years ago
#1

  • Keywords has-unit-tests added

#2 @pento
3 years ago

  • Keywords has-unit-tests removed
  • Summary changed from KSES: Allow PDFs to be embed as objects to KSES: Allow PDFs to be embeded as objects

#3 @pento
3 years ago

  • Keywords has-unit-tests added

โ€‹pento commented on โ€‹PR #1757:


3 years ago
#4

Okay, I think this is pretty close. It could to with a solid review, including efforts to break it, given how important KSES is.

One particular behaviour I want to point out is the difference between self-closing tags, and non-self-closing tags. As KSES doesn't keep track of open/close tag pairs, it's not possible to reliably remove a tag and its innerHTML. So, rather than removing the opening tag (and leaving the closing tag), I think the safest option is to just strip all attributes from the opening tag.

eg:

<object type="application/exe" /> is removed entirely.
<object type="application/exe"></object> becomes <object></object>.

I'm not aware of any tags that could cause safety issues if _all_ attributes are removed (for example, <iframe> could become unsafe if just the sandbox attribute is removed, since src would be removed as well, it would be safe), but I'd appreciate some additional thoughts on this.

โ€‹dd32 commented on โ€‹PR #1757:


3 years ago
#5

While I haven't executed the code, the direction and implementation seems correct to me after reading through the code flow of the KSES branch affected.

The only edge-cases I could think of are covered in the tests (I read the tests last).

I guess someone could be using $allowedposttags in another way and the new required-attribute branch could break that, but I don't see how or why someone would do that.. at most they'd only care about the tag keys.. surely..

There's also the potential that a plugin that adds the <object> tag to it could break or break this, but I think the plugin is likely broken already if that's the case.

#6 follow-up: @pento
3 years ago

  • Keywords needs-dev-note added

Thanks for the testing and feedback!

@peterwilsoncc: I've implemented the changes you recommended.

@dd32: I agree with your points on potential breakage, there's a small risk, but I think it's highly unlikely.

If someone is substantially changing the structure of $allowedposttags (and then restoring it in wp_kses_allowed_html, perhaps), that would be a totally unsupported use case. Same for if they're defining the required and values properties to mean something else.

If a plugin is adding the <object> tag to $allowedposttags, the only supported use case would be to work with KSES' existing behaviour, so the plugin should continue to function as is.

Just in case, however, I've flagged this ticket as needing a dev note, to ensure the changes are more widely disseminated.

#7 in reply to: โ†‘ย 6 @dd32
3 years ago

Replying to pento:

I agree with your points on potential breakage, there's a small risk, but I think it's highly unlikely.

If someone is substantially changing the structure of $allowedposttags (and then restoring it in wp_kses_allowed_html, perhaps), that would be a totally unsupported use case.

Agreed, I mostly mentioned it out of completeness of looking for breakage points.

I've flagged this ticket as needing a dev note, to ensure the changes are more widely disseminated.

The most relevant thing to flag in a dev-note would be that "KSES can now have required attributes for tags, and whitelisted values for those attributes", and that a use-case is PDF <object>'s.

โ€‹pento commented on โ€‹PR #1757:


3 years ago
#8

In addition to the one case I suggest, I wonder if it would also be good to add a test for if a filter is already adding object to wp_kses_allowed_html without requiring this specific type.

cf63ee7 adds a test for what I think would be the likely usage of this filter, but it can be tweaked further if you have particular use cases in mind. ๐Ÿ™‚

โ€‹peterwilsoncc commented on โ€‹PR #1757:


3 years ago
#9

โ€‹cf63ee7 adds a test for what I think would be the likely usage of this filter,

Does it need to account for modifications to $allowedposttags directly or do you think that will be dandy?

I don't know the answer to this, so it's a genuine question rather than a passive aggressive change request.

โ€‹pento commented on โ€‹PR #1757:


3 years ago
#10

Does it need to account for modifications to $allowedposttags directly or do you think that will be dandy?

I don't think it needs to test for that as well: KSES runs $allowedposttags through the wp_kses_allowed_html filter before using it, so whilst the method could be different, the outcome is the same.

โ€‹pento commented on โ€‹PR #1757:


3 years ago
#11

Does it need to account for modifications to $allowedposttags directly or do you think that will be dandy?

I don't think it needs to test for that as well: KSES runs $allowedposttags through the wp_kses_allowed_html filter before using it, so whilst the method could be different, the outcome is the same.

#12 @pento
3 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 51963:

KSES: Add options for restricting tags based upon their attributes.

This change adds two now attribute-related config options to KSES:

  • An array of allowed values can be defined for attributes. If the attribute value doesn't fall into the list, the attribute will be removed from the tag.
  • Attributes can be marked as required. If a required attribute is not present, KSES will remove all attributes from the tag. As KSES doesn't match opening and closing tags, it's not possible to safely remove the tag itself, the safest fallback is to strip all attributes from the tag, instead.

Included with this change is an implementation of these options, allowing the <object> tag to be stored in posts, but only when it has a type attribute set to application/pdf.

Props pento, swissspidy, peterwilsoncc, dd32, jorbin.
Fixes #54261.

#13 @SergeyBiryukov
3 years ago

In 52234:

Docs: Add a @since note and description to wp_kses_attr() for new attribute-related KSES options:

  • Support for an array of allowed values for attributes.
  • Support for required attributes.

Follow-up to [51963].

See #54261.

#14 @peterwilsoncc
3 years ago

In 52304:

KSES: Allow attributes to be restricted via callbacks.

Add callback validation to HTML tag attributes for increased flexibility over an array of values only.

In object tags, validate the data attribute via a callback to ensure it is a PDF and matches the type attribute. This prevents mime type mismatches in browsers.

Follow up to [51963].

Props Pento, dd32, swissspidy, xknown, peterwilsoncc.
Fixes #54261.

#15 follow-up: @ocean90
3 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

[52304] assumes that the upload URL has no port so the check will fail for something like localhost:8888 due to the trailing slash in "http://$upload_host/". This can be seen in one of โ€‹Gutenberg's check runs as reported by @youknowriad.

It's also odd that there seems to remain an empty object element.

This ticket was mentioned in โ€‹Slack in #core by ocean90. โ€‹View the logs.


3 years ago

This ticket was mentioned in โ€‹Slack in #core by audrasjb. โ€‹View the logs.


3 years ago

This ticket was mentioned in โ€‹PR #1998 on โ€‹WordPress/wordpress-develop by โ€‹ramonjd.


3 years ago
#18

[WIP]

Related to https://core.trac.wordpress.org/ticket/54261#comment:15

Testing a way to cater for port numbers in pdf $url argument.
Renaming $value to $url to match PHP doc comment.

Trac ticket:

โ€‹ramonjd commented on โ€‹PR #1998:


3 years ago
#19

Tests are passing now ๐Ÿ˜„

Are there any further tests required for this change (assuming the change itself is sound)?

#20 in reply to: โ†‘ย 15 ; follow-up: @peterwilsoncc
3 years ago

โ€‹PR 1998 covers the port oversight causing the tests to fail on the Gutenberg repo, it also adds some tests to for sites running on a port.

Props @ramonopoly


Replying to ocean90:

It's also odd that there seems to remain an empty object element.

KSES can't match opening to closing tags (and to do so would be a major refactor) so following a discussion on the PR, it was decided the empty object tag was the best solution available.

As this only becomes an issue if a user with unfiltered_html tries to mess with the input, I'm happy enough with that.

#21 @ramonopoly
3 years ago

Thanks for your help on this one @peterwilsoncc

#22 in reply to: โ†‘ย 20 @peterwilsoncc
3 years ago

Replying to peterwilsoncc:

As this only becomes an issue if a user with unfiltered_html tries to mess with the input, I'm happy enough with that.

without unfiltered_html, rather.

โ€‹peterwilsoncc commented on โ€‹PR #1998:


3 years ago
#23

It'd be good to put some unhappy paths in to make sure they fail with the ports too, for example example.org should be removed if the upload url is example.org:8888

I don't think the new test will need as many scenarios as the other data provider as it covers most of the unrelated to port unhappy paths.

According to the WordPress Slack, your on vacation at the moment so I can look in to that and push to your repo if you do not mind.

โ€‹ramonjd commented on โ€‹PR #1998:


3 years ago
#24

It'd be good to put some unhappy paths in to make sure they fail with the ports too, for example example.org should be removed if the upload url is example.org:8888

Great call. I've added a couple more, but feel free to edit/delete/change as you like.

According to the WordPress Slack, your on vacation at the moment so I can look in to that and push to your repo if you do not mind.

Hah, I'm mostly around. Thanks for offering!! I just like trees ๐ŸŒด

#25 @peterwilsoncc
3 years ago

In 52309:

KSES: Accept port number in PDF upload paths.

Improves the URL validation in _wp_kses_allow_pdf_objects() to account for sites using an upload path that contains a port, for example wp.org:8080.

Follow up to [51963], [52304].

Props ocean90, ramonopoly, talldanwp.
See #54261.

#26 @peterwilsoncc
3 years ago

  • Keywords has-patch has-unit-tests removed

Fix to account for upload paths with port numbers committed above.

I've left the ticket open to address a question @swissspidy asked about supporting fragments. Chrome allows links to certain pages using fragments, eg document.pdf#page=10, so it's worth testing whether this works for embeds too.

If it's a standard or a faux-standard a majority of browsers support then I'm happy to account for it if it's possible to do so safely. If it's a Chrome only feature that's not part of a standard then I think the ticket can be closed.

#28 @TobiasBg
3 years ago

Suggestion: As it's a new function in WP 5.9, _wp_kses_allow_pdf_objects() could make use of the PHP 8 (but polyfilled via [52039] and [52040] in WP 5.9) str_contains() and str_ends_with() functions.

This ticket was mentioned in โ€‹PR #2010 on โ€‹WordPress/wordpress-develop by โ€‹ramonjd.


3 years ago
#29

  • Keywords has-patch added

Implementing @TobiasBg's suggestion:

As it's a new function in WP 5.9, _wp_kses_allow_pdf_objects() could make use of the PHP 8 (but polyfilled via [52039] and [52040] in WP 5.9) str_contains() and str_ends_with() functions.

Trac ticket (comment): https://core.trac.wordpress.org/ticket/54261#comment:28

The tests in โ€‹tests/kses.php should pass.

#30 @SergeyBiryukov
3 years ago

In 52326:

KSES: Use the polyfilled PHP 8 string functions in _wp_kses_allow_pdf_objects():

  • str_contains()
  • str_ends_with()
  • str_starts_with()

Additionally, include a test for a PDF file in an <object> tag with an unsupported protocol.

Follow-up to [51963], [52039], [52040], [52304], [52309].

Props TobiasBg, ramonopoly.
See #54261.

#32 @peterwilsoncc
3 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

I'm closing this fixed on the commits above.

The UI for the PDF embed block doesn't support altering the page displayed so allowing for that via fragments can be done in a follow-up ticket if the feature is added later.

#33 @audrasjb
3 years ago

  • Keywords has-dev-note added; needs-dev-note removed
Note: See TracTickets for help on using tickets.