WordPress.org

Make WordPress Core

Opened 11 months ago

Last modified 11 months ago

#41304 new defect (bug)

Bad protocol sanitization in KSES for URLs NOT RFC 3986 compliant

Reported by: bogdanpreda Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version: 4.8
Component: Formatting Keywords:
Focuses: Cc:

Description

For URL's that are passed through the kses sanitizer. As specified in RFC 3986, Section 3.3

The path component contains data, usually organized in hierarchical form, that, along with data in the non-hierarchical query component (Section 3.4), serves to identify a resource within the scope of the URI's scheme and naming authority (if any). The path is terminated by the first question mark ("?") or number sign ("#") character, or by the end of the URI.

If a URI contains an authority component, then the path component must either be empty or begin with a slash ("/") character. If a URI does not contain an authority component, then the path cannot begin with two slash characters (""). In addition, a URI reference (Section 4.1) may be a relative-path reference, in which case the first path segment cannot contain a colon (":") character. The ABNF requires five separate rules to disambiguate these cases, only one of which will match the path substring within a given URI reference. We use the generic term "path component" to describe the URI substring matched by the parser to one of these rules.

So colon(':') is allowed inside URL's. When trying to split the URL like this:

<?php
function wp_kses_bad_protocol_once($string, $allowed_protocols, $count = 1 ) {
        $string2 = preg_split( '/:|&#0*58;|&#x0*3a;/i', $string, 2 );
...

for URL's that do not contain a specified scheme and use colon (':') inside the URL this breaks and returns only the second part of the URL after the colon. Eg:

t0.gstatic.com/images?q=tbn:ANd9GcSxT2q6fV-59s5hq5a03fpgsFYzVtL014iARzGRG7S_3CUjYpIGNlQx0ruGtVl5KCAEOxAtb_ZQ

will return: ANd9GcSxT2q6fV-59s5hq5a03fpgsFYzVtL014iARzGRG7S_3CUjYpIGNlQx0ruGtVl5KCAEOxAtb_ZQ

Also a “network-path reference” should be implied, in the current format you assume a scheme exists beforehand.

Changing the split to:

<?php
...
$string2 = preg_split( '/(:\/\/)|&#0*58;|&#x0*3a;/i', $string, 2 );
if ( isset($string2[1]) && ! preg_match('%/\?%', $string2[0]) ) {
  ...
  $string = $protocol . '//' . $string;
}
...

fixes this issue and is more compliant without breaking sensitization.

Attachments (1)

fix_41304_patch.diff (820 bytes) - added by bogdanpreda 11 months ago.
Fix for this issue made from the SVN rev. 40885

Download all attachments as: .zip

Change History (3)

@bogdanpreda
11 months ago

Fix for this issue made from the SVN rev. 40885

#1 @SergeyBiryukov
11 months ago

  • Component changed from General to Formatting

Previously: #21974

#2 @SergeyBiryukov
11 months ago

  • Summary changed from Bad protocol sensitization in KSES for URLs NOT RFC 3986 compliant to Bad protocol sanitization in KSES for URLs NOT RFC 3986 compliant
Note: See TracTickets for help on using tickets.