WordPress.org

Make WordPress Core

Opened 5 years ago

Last modified 9 months ago

#29807 new enhancement

add support for picture element and srcset attribute on img in wp_kses

Reported by: mattheu Owned by:
Milestone: Future Release Priority: normal
Severity: normal Version:
Component: Formatting Keywords: has-patch has-unit-tests needs-refresh
Focuses: Cc:
PR Number:

Description

Related to #28993 - (<img> srcset attribute is stripped when switching from text to visual tab in the editor)

If support for srcset attribute and picture element (and source element) is to be added to tinyMCE, they should also be allowed in wp_kses.

Attachments (10)

29807.2.diff (3.0 KB) - added by mattheu 5 years ago.
29807.3.diff (2.9 KB) - added by mattheu 5 years ago.
Remove accidental § character
29807.4.diff (2.6 KB) - added by mattheu 5 years ago.
wp_kses_hair.patch (5.7 KB) - added by drrobotnik 3 years ago.
wp_kses_hair add srcset and sizes
kses.patch (563 bytes) - added by drrobotnik 3 years ago.
add srcset and sizes to uri array
29807.diff (3.8 KB) - added by desrosj 2 years ago.
29807.5.diff (3.8 KB) - added by desrosj 2 years ago.
29807.6.diff (4.8 KB) - added by peterwilsoncc 2 years ago.
29807.7.diff (6.7 KB) - added by 1000camels 14 months ago.
Adds support for processing multiple uris in srcset attribute
29807.8.diff (9.0 KB) - added by 1000camels 14 months ago.
Creates specific function to handle this case of potential uri candidates in certain attributes

Download all attachments as: .zip

Change History (41)

#1 @joehoyle
5 years ago

  • Keywords has-patch needs-unit-tests added

@mattheu
5 years ago

@mattheu
5 years ago

Remove accidental § character

#2 @mattheu
5 years ago

Added unit tests

#3 @joehoyle
5 years ago

  • Keywords dev-feedback added; needs-unit-tests removed

#4 @DrewAPicture
5 years ago

  • Component changed from General to Formatting

#5 @helen
5 years ago

  • Milestone changed from Awaiting Review to Future Release

#6 @rmccue
5 years ago

This likely needs URL validation applied to it as well in wp_kses_hair. This might be complicated, seeing as it's not just a straight up URI attribute.

#7 @johnbillion
5 years ago

  • Version trunk deleted

#8 @miqrogroove
5 years ago

  • Keywords kses added

@mattheu
5 years ago

#9 @mattheu
5 years ago

I've refreshed this patch to merge cleanly.

As there is now a feature plugin project for high res images, I think this would be needed.

@rmccue I've looked into wp_kses_hair. You're right that its not so easy to try and validate the URLs in the srcset attribute. I'm not sure what to do here. The only thing I can think is that we could take srcset as a special case and then - knowing the format, we can get parse the string to grab the URLs and check for bad protocols. This is not easy to do though.

Last edited 5 years ago by mattheu (previous) (diff)

#10 @miqrogroove
4 years ago

I think there are security issues with srcset, not just because it require more parsing but also because it allows the author to specify content that would be displayed selectively, potentially hiding its true content from an editor.

Version 0, edited 4 years ago by miqrogroove (next)

#11 @miqrogroove
4 years ago

Suggest making this a duplicate of #17105. Any additional thoughts?

#12 @littler.chicken
4 years ago

I was going to open a new ticket on this but was pointed here--since this is still open, although old, I'm leaving a comment here instead of doing a new ticket. Using trunk and trying to output an image using wp_kses_post( $image ) ... if I do this, the srcset attributes are removed from the image. I'd like to have them, but not at the cost of sanitizing the output.

#13 @ocean90
3 years ago

#39455 was marked as a duplicate.

#14 follow-up: @wilto
3 years ago

Yo—I’m one of the RICG folks that worked on the responsive images spec.

Just ran into this issue myself. Anything I can do to help expedite things?

#15 in reply to: ↑ 14 ; follow-up: @ocean90
3 years ago

  • Keywords needs-patch needs-unit-tests added; has-patch dev-feedback kses removed

Replying to wilto:

comment:6 is still valid so a new patch that addresses this issue would be helpful.

Last edited 3 years ago by ocean90 (previous) (diff)

#16 in reply to: ↑ 15 @drrobotnik
3 years ago

Replying to ocean90:

Replying to wilto:

comment:6 is still valid so a new patch that addresses this issue would be helpful.

wp_kses_hair is a lot to digest... attached is a patch that adds srcset and sizes to the $uris array. Needs testing. Also includes formatting and a slight refactor to case 2 in the switch statement. All of the scenarios in case 2 return the same way, the only thing that changes is the regex.

@drrobotnik
3 years ago

wp_kses_hair add srcset and sizes

#17 follow-up: @ocean90
3 years ago

Thanks @drrobotnik! Could you please submit a patch that doesn't include code styling changes? They just make it harder to review the patch and is usually discouraged, see https://make.wordpress.org/core/handbook/contribute/code-refactoring/.

Tests can be added to trunk/tests/phpunit/tests/kses.php.

@drrobotnik
3 years ago

add srcset and sizes to uri array

#18 in reply to: ↑ 17 @drrobotnik
3 years ago

Replying to ocean90:

Thanks @drrobotnik! Could you please submit a patch that doesn't include code styling changes? They just make it harder to review the patch and is usually discouraged, see https://make.wordpress.org/core/handbook/contribute/code-refactoring/.

Tests can be added to trunk/tests/phpunit/tests/kses.php.

Oh, I think i misread one of the (wontfix) tickets about reformatting. I thought it said reformatting is only accepted with a refactor. No worries.

Outside of the formatting/refactor, I've essentially only added the new attributes to the uri array (no additional validation). I was hoping this would be a kickoff to getting the issue resolved. Additionally, hoping someone else could contribute tests.

@desrosj
2 years ago

@desrosj
2 years ago

#19 @desrosj
2 years ago

https://core.trac.wordpress.org/attachment/ticket/29807/29807.5.diff refreshes and combines the previous patches. @mattheu your original test function had ismap as an allowed property on <img /> tags. I couldn't that in the history of kses. After removing that your tests pass properly.

I also added a test to ensure a bad protocol is properly stripped out of the srcset attribute.

#20 @desrosj
2 years ago

  • Keywords has-patch has-unit-tests added; needs-patch needs-unit-tests removed

#21 follow-ups: @peterwilsoncc
2 years ago

  • Keywords needs-refresh added

The combined patch misses the changes required for processing URLs in srcset correctly. In 29807.6.diff I've modified added some tests for comma separated srcset values and the tests begin failing.

The refactor of wp_kses_hair() remains a gift for someone suitably enthusiastic.

#22 in reply to: ↑ 21 ; follow-up: @drrobotnik
2 years ago

Replying to peterwilsoncc:

The refactor of wp_kses_hair() remains a gift for someone suitably enthusiastic.

Ref patch:
https://core.trac.wordpress.org/ticket/29807?replyto=21#comment:16

What's the right approach to refactor? My patch was a small refactor on wp_kses_hair. I started with formatting but stopped because I didn't want to go down the rabbit hole and not have the patch accepted.

#23 in reply to: ↑ 22 @peterwilsoncc
2 years ago

Replying to drrobotnik:

What's the right approach to refactor? My patch was a small refactor on wp_kses_hair. I started with formatting but stopped because I didn't want to go down the rabbit hole and not have the patch accepted.

Apologies for missing your question earlier, the full details can be found in the srcset spec but it's a little dry.

A srcset is made up of one or more image candidates separated by commas, each image candidate needs to be validated separately srcset='candidate 1, candidate 2, candidate 3'.

To validate each candidate:

  • split into sections on whitespace
  • discard image candidate if there are three or more sections
  • the first section should validate as a URL
  • the second section is optional and should validate as a number followed by a letter.
  • if either section is invalid, the entire image candidate should be discarded

Unfortunately it's a rather complicated attribute. The validation of the URL matches the same validation used for the href, src and other URL attributes.

This ticket was mentioned in Slack in #core-media by joemcgill. View the logs.


22 months ago

#25 @peterwilsoncc
14 months ago

#45011 was marked as a duplicate.

#26 in reply to: ↑ 21 ; follow-up: @1000camels
14 months ago

Replying to peterwilsoncc:

The refactor of wp_kses_hair() remains a gift for someone suitably enthusiastic.

I am enthusiastic, but might not be skilled enough.

I wanted to discuss some approaches to dealing with the weird case of srcset.

It strikes me that the first major issue is there are multiple values for srcset, separated by a comma. This is similar to the style attribute, which is delimited by ; and which has its own function to test it (safecss_filter_attr). Are there any other html attributes that function like this? If there are, it might make sense to develop an approach to exploding multi-valued attributes, possibly putting them into their own version of $attrarr element, and collapse these again, once they pass.

Of course, that doesn't deal with the additional 'width descriptor', so perhaps the real way to approach this is to create a condition for srcset in wp_kses_hair().

I've made a stab at this and I have allowed for the wp_kses_bad_protocol to process any number of urls in srcset (or rather any attribute - is this a problem?). It does nothing to process the width descriptor, which is optional. It also does not check that there is a sizes attribute, which is required. However, my sense is that this code is not validating the HTML (which is fine according to Postel). It is rather just making sure we don't allow bad markup, which in the case of picture, img and source, it is primarily checking for valid structure and good protocols. So I think this works.

Of course, it could also be written better, cleaner and more succinct, but I will leave that to those who know better.

@1000camels
14 months ago

Adds support for processing multiple uris in srcset attribute

#27 in reply to: ↑ 26 ; follow-up: @peterwilsoncc
14 months ago

Replying to 1000camels:

I am enthusiastic, but might not be skilled enough.

Thanks for your enthusiasm and I am sure you have the skill set required.

I like the idea of adapting the current URL validation but I think WP will need to use a special case function similar to that for the style attribute.

While commas should technically be encoded as %2C in URL attributes, most browsers parse them successfully when unencoded, as seen in this example on jsbin. There are a number of popular services that advise using commas unencoded, such as Google Fonts, that WP needs to allow for too.

Code wise, I like your approach of splitting up the URL candidates (as a special case for srcset) but you will need to split each item in $thesevals on whitespace before passing the data through to protocol validation.

#28 in reply to: ↑ 27 @1000camels
14 months ago

Replying to peterwilsoncc:

While commas should technically be encoded as %2C in URL attributes, most browsers parse them successfully when unencoded, as seen in this example on jsbin. There are a number of popular services that advise using commas unencoded, such as Google Fonts, that WP needs to allow for too.

Code wise, I like your approach of splitting up the URL candidates (as a special case for srcset) but you will need to split each item in $thesevals on whitespace before passing the data through to protocol validation.

Oh boy. Now I am not sure how to deal with commas and spaces in URLs. The width descriptor is optional, there may or may not be a space within each image candidate.

#29 @peterwilsoncc
14 months ago

I think it's fine to only worry about unencoded commas in the single URL attributes.

For srcset you can safely split on the comma and assume any of the candidates' URL components will have commas and spaces encoded.

<?php
// At the top of the function
$uris           = array( 'xmlns', 'profile' /* ... */ );
$uri_candidates = array( 'srcset' );

// Later in the function
if ( in_array( strtolower( $attrname ), $uri_candidates ) ) {
   // Split into values as in your original patch
   $thesevals = preg_split( '/\s*,\s*/', $thisval );

} elseif ( in_array( strtolower( $attrname ), $uris ) ) {
   // Maintain current validation
   $thesevals = array( $thisval );
}

// Validate $thesevals.

#30 @1000camels
14 months ago

Ok, I can work on this. But I am wondering about applying this logic to another function, so that we aren't looping on this condition three times. Plus, I think this should also be applied to wp_kses_one_attr(), which wasn't accounting for the addition of srcset and sizes. Hopefully this is a better approach.

I've also added a test for a url that has commas but no space. In order to pass this test, I have adjusted the regex to /\s*,\s+/.
This is assuming that the following space is always there, which does not quite match the the srcset spec. Is there another character (ie. :) that would definitely identify a url portion of the string?

@1000camels
14 months ago

Creates specific function to handle this case of potential uri candidates in certain attributes

#31 @desrosj
9 months ago

#46419 was marked as a duplicate.

Note: See TracTickets for help on using tickets.