Opened 8 years ago
Last modified 8 years ago
#41304 new defect (bug)
Bad protocol sanitization in KSES for URLs NOT RFC 3986 compliant
| Reported by: | | Owned by: | |
|---|---|---|---|
| Milestone: | Awaiting Review | Priority: | normal |
| Severity: | normal | Version: | 4.8 |
| Component: | Formatting | Keywords: | |
| Focuses: | | Cc: | |
Description
This affects URLs that are passed through the kses sanitizer.
RFC 3986, Section 3.3, specifies:
The path component contains data, usually organized in hierarchical form, that, along with data in the non-hierarchical query component (Section 3.4), serves to identify a resource within the scope of the URI's scheme and naming authority (if any). The path is terminated by the first question mark ("?") or number sign ("#") character, or by the end of the URI.
If a URI contains an authority component, then the path component must either be empty or begin with a slash ("/") character. If a URI does not contain an authority component, then the path cannot begin with two slash characters ("//"). In addition, a URI reference (Section 4.1) may be a relative-path reference, in which case the first path segment cannot contain a colon (":") character. The ABNF requires five separate rules to disambiguate these cases, only one of which will match the path substring within a given URI reference. We use the generic term "path component" to describe the URI substring matched by the parser to one of these rules.
So a colon (':') is allowed inside URLs, not only as the delimiter after the scheme. However, wp_kses_bad_protocol_once() splits the URL like this:
```php
<?php
function wp_kses_bad_protocol_once($string, $allowed_protocols, $count = 1 ) {
	$string2 = preg_split( '/:|&#0*58;|&#x0*3a;/i', $string, 2 );
	...
```
For URLs that do not specify a scheme but contain a colon (':') elsewhere, this breaks: everything before the first colon is treated as the protocol and dropped because it is not an allowed protocol, so only the part after the colon is returned. For example:
t0.gstatic.com/images?q=tbn:ANd9GcSxT2q6fV-59s5hq5a03fpgsFYzVtL014iARzGRG7S_3CUjYpIGNlQx0ruGtVl5KCAEOxAtb_ZQ
will return: ANd9GcSxT2q6fV-59s5hq5a03fpgsFYzVtL014iARzGRG7S_3CUjYpIGNlQx0ruGtVl5KCAEOxAtb_ZQ
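A minimal standalone reproduction of the split, outside WordPress. Only the preg_split() call from wp_kses_bad_protocol_once() is copied here; the allowed-protocol check in wp_kses_bad_protocol_once2() is not reproduced, so this only shows which piece ends up as the "protocol" candidate.

```php
<?php
// Same split pattern as wp_kses_bad_protocol_once() in WordPress 4.8:
// a literal ':' or its decimal/hex HTML entity (&#58; / &#x3a;).
$pattern = '/:|&#0*58;|&#x0*3a;/i';
$url     = 't0.gstatic.com/images?q=tbn:ANd9GcSxT2q6fV-59s5hq5a03fpgsFYzVtL014iARzGRG7S_3CUjYpIGNlQx0ruGtVl5KCAEOxAtb_ZQ';

// Split at most once, on the first colon found.
$parts = preg_split( $pattern, $url, 2 );

// $parts[0] is what kses treats as the protocol; $parts[1] is the remainder.
print_r( $parts );
```

Here $parts[0] is "t0.gstatic.com/images?q=tbn", which is not an allowed protocol, so only $parts[1] survives, matching the output above.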
Also, such a URL should be treated as a “network-path reference”; the current code assumes that a scheme always precedes the first colon.
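To illustrate what an actual scheme looks like, here is a sketch of a check based on the RFC 3986 Section 3.1 grammar, scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ). The function rfc3986_scheme() is hypothetical and not part of WordPress, and it ignores the entity-encoded colons (&#58;, &#x3a;) that kses also handles.

```php
<?php
// Hypothetical helper (not a WordPress function): returns the lowercased
// scheme if the text before the first ':' matches the RFC 3986 scheme
// grammar, or null otherwise. A '/', '?' or '#' before the first ':'
// makes the match fail, because those characters cannot appear in a scheme.
function rfc3986_scheme( string $url ): ?string {
	if ( preg_match( '/^([A-Za-z][A-Za-z0-9+.\-]*):/', $url, $m ) ) {
		return strtolower( $m[1] );
	}
	return null;
}

var_dump( rfc3986_scheme( 'http://example.com/a:b' ) );          // string(4) "http"
var_dump( rfc3986_scheme( 't0.gstatic.com/images?q=tbn:ANd9' ) ); // NULL
```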
Changing the split to:
```php
<?php
...
$string2 = preg_split( '/(:\/\/)|&#0*58;|&#x0*3a;/i', $string, 2 );
if ( isset($string2[1]) && ! preg_match('%/\?%', $string2[0]) ) {
	...
	$string = $protocol . '//' . $string;
}
...
```
fixes this issue and is more RFC 3986 compliant without breaking sanitization.
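A small standalone comparison of the original and proposed split patterns. It exercises only preg_split(); the rest of wp_kses_bad_protocol_once() (the protocol whitelist, the '%/\?%' guard and the re-joining with '//') is left out, and the sample URLs are illustrative.

```php
<?php
// Original pattern: splits on the first ':' anywhere in the string.
$original = '/:|&#0*58;|&#x0*3a;/i';
// Proposed pattern: splits on the first literal "://" (or an encoded colon).
$proposed = '/(:\/\/)|&#0*58;|&#x0*3a;/i';

$urls = [
	't0.gstatic.com/images?q=tbn:ANd9Gc',   // no scheme, colon in the query
	'http://example.com/path:with:colons',  // scheme, colons in the path
];

foreach ( $urls as $url ) {
	echo $url, "\n";
	// The capture group is not returned because PREG_SPLIT_DELIM_CAPTURE is not set.
	echo '  original: ', json_encode( preg_split( $original, $url, 2 ), JSON_UNESCAPED_SLASHES ), "\n";
	echo '  proposed: ', json_encode( preg_split( $proposed, $url, 2 ), JSON_UNESCAPED_SLASHES ), "\n";
}
```

With the proposed pattern the schemeless URL is not split at all, so isset($string2[1]) is false and it passes through unchanged; for the http URL the "//" consumed by the split is restored by $protocol . '//' . $string.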
The fix for this issue is based on SVN revision 40885.