#31080 closed enhancement (fixed)
GUID should not always be escaped for url in feeds
Reported by: | CheeseDurger | Owned by: | stevenkword |
---|---|---|---|
Milestone: | 4.2 | Priority: | normal |
Severity: | normal | Version: | 4.2 |
Component: | Feeds | Keywords: | has-patch commit |
Focuses: | template | Cc: |
Description
Hello,
On the /wp-includes/feed-rss2.php template, it validates the GUID as a url before outputting it (through the_guid() which calls esc_url()). So far so good because in Wordpress, GUID are urls by default.
However, it should not always be the case.
Example : a website migrating from another CMS to Wordpress. Some people are already subscribed to the feed in the other CMS and are reading it through some feed reader. The GUIDs in the other CMS are phrases with spaces. Then migrating the website, Wordpress will strip the spaces from the GUIDs, the feed readers will interpret those items as new items, the people will see duplicate items.
This bug might also be hidden in other feed templates.
Attachments (2)
Change History (23)
#1
in reply to:
↑ description
@
10 years ago
- Keywords 2nd-opinion needs-patch dev-feedback added
This ticket was mentioned in Slack in #core by stevenkword. View the logs.
10 years ago
#3
@
10 years ago
- Keywords has-patch added; needs-patch removed
- Milestone changed from Awaiting Review to 4.2
- Owner set to stevenkword
- Status changed from new to assigned
I've attached a patch to change the escaping method when rendering GUIDs. 'the_guid()' is the only effected function which is called 4 times among the following 4 files: wp-admin/includes/export.php
, wp-includes/feed-atom-comments.php
, wp-includes/feed-atom.php
, and wp-includes/feed-rss2.php
.
The patch replaces the existing esc_url()
method with esc_html()
. This is done because according to the RSS 2.0 specification there is no syntax specified for this element other than to escape HTML entities. Spaces, for example, are perfectly valid and could exist for content created outside of a WordPress environment. As mentioned by the reporter (thanks CheeseDurger), this could cause a problem for migrated instances and should be addressed.
This ticket was mentioned in Slack in #core by stevenkword. View the logs.
10 years ago
This ticket was mentioned in Slack in #core by drew. View the logs.
10 years ago
#6
@
10 years ago
- Keywords close added; dev-feedback removed
Core generates guids as URLs. I would say when outputting guids in the context of core, there's a case to be made for honoring core's intent. See the above-linked Slack log for discussion. Suggest wontfix.
#7
@
10 years ago
I disagree with close; core generates URLs by default, but there's no reason a plugin can't change this. It can actually be intentional to change this, as otherwise when migrating domains, you might be left with GUIDs containing the old domain.
I've seen sites change these to URNs instead (which would fail validation on esc_url
right now I think, since presumably we don't allow the urn
protocol). I think we should continue to support this.
#8
follow-up:
↓ 9
@
10 years ago
Also: we're outputting into a field that's explicitly noted as "just a string" in the specification. The escaping functions are dependent on their context in the output, not on the input values, so all we should do is XML-escape regardless.
This ticket was mentioned in Slack in #core by drew. View the logs.
10 years ago
#11
follow-up:
↓ 12
@
10 years ago
- Keywords close removed
In lieu of changing this outright for everybody, and the potential consequences that lie within, what if we switch to hooking esc_url()
in? This would allow for those who want to use the GUID for something other than a URL to unhook and hook in something more appropriate like esc_html()
.
#12
in reply to:
↑ 11
@
10 years ago
- Keywords needs-patch added; has-patch removed
Replying to helen:
In lieu of changing this outright for everybody, and the potential consequences that lie within, what if we switch to hooking
esc_url()
in? This would allow for those who want to use the GUID for something other than a URL to unhook and hook in something more appropriate likeesc_html()
.
@Helen, I like this approach. I'll submit a patch.
#14
@
10 years ago
- Keywords has-patch 2nd-opinion added; needs-patch removed
Patch 31080.2.diff adds the filter the_guid
to allow for plugins to modify the escaping method when displaying in feeds.
Usage example:
remove_filter( 'the_guid', 'esc_url' ); function html_escape_the_guid( $guid ) { return esc_html( get_the_guid () ); } add_filter( 'the_guid', 'html_escape_the_guid' );
#15
follow-up:
↓ 16
@
10 years ago
@helen: Was 31080.2.diff what you had in mind? Need to decide if this is ready or needs more iteration, and therefore to be pushed to a future release.
#16
in reply to:
↑ 15
@
10 years ago
- Keywords commit added; 2nd-opinion removed
Replying to DrewAPicture:
@helen: Was 31080.2.diff what you had in mind? Need to decide if this is ready or needs more iteration, and therefore to be pushed to a future release.
I'm not Helen, but I think this solution is much much better and I think it's ready to go in.
#20
@
9 years ago
I'll just mention that the esc_url()
here was absolutely added in an early 3.x security release, to close a vulnerability where (this is from memory) it was possible to set the guid for a post via a specially crafted request and then end up with an XSS situation.
#21
@
9 years ago
I'll also add this was done as defense in depth. As in, we now sanitize GUIDs on save, and also added escaping in case it was exploited before update. In that case, we could hypothetically remove the escaping years after the fact. But that wouldn't actually fix anything, as the GUIDs would still go through esc_url_raw()
on save, and might in certain contexts still need to go through esc_url()
on display. This is because they are often used directly as URLs for historical reasons.
This change at least allows for someone to unhook all of it, both save and display.
The RSS 2.0 spec supports non-URL GUIDs if the
isPermaLink
attribute is set as false (defaults to 'true'). The RSS 2.0 template included in core does both support this GUID field as well as specifies the isPermaLink attribute as false. So far so good.Further investigation into
the_guid
method found inwp-includes/post-template.php
shows that the method simply escapes the URL and prints the value. This seems incorrect to me as well.I would like to propose removing the URL escaping from this function, as non-URL GUIDs are both valid and a common use case for non-WordPress to WordPress migrations. However, I would like to get a 2nd opinion to discuss how this change might break other functionality / use-cases.
Related: http://www.w3schools.com/rss/rss_tag_guid.asp