Make WordPress Core

Opened 10 years ago

Closed 10 years ago

Last modified 9 years ago

#31080 closed enhancement (fixed)

GUID should not always be escaped for url in feeds

Reported by: cheesedurger's profile CheeseDurger Owned by: stevenkword's profile stevenkword
Milestone: 4.2 Priority: normal
Severity: normal Version: 4.2
Component: Feeds Keywords: has-patch commit
Focuses: template Cc:

Description

Hello,
On the /wp-includes/feed-rss2.php template, it validates the GUID as a url before outputting it (through the_guid() which calls esc_url()). So far so good because in Wordpress, GUID are urls by default.

However, it should not always be the case.

Example : a website migrating from another CMS to Wordpress. Some people are already subscribed to the feed in the other CMS and are reading it through some feed reader. The GUIDs in the other CMS are phrases with spaces. Then migrating the website, Wordpress will strip the spaces from the GUIDs, the feed readers will interpret those items as new items, the people will see duplicate items.

This bug might also be hidden in other feed templates.

Attachments (2)

31080.diff (1.0 KB) - added by stevenkword 10 years ago.
31080.2.diff (1.2 KB) - added by stevenkword 10 years ago.

Download all attachments as: .zip

Change History (23)

#1 in reply to: ↑ description @stevenkword
10 years ago

  • Keywords 2nd-opinion needs-patch dev-feedback added

The RSS 2.0 spec supports non-URL GUIDs if the isPermaLink attribute is set as false (defaults to 'true'). The RSS 2.0 template included in core does both support this GUID field as well as specifies the isPermaLink attribute as false. So far so good.

Further investigation into the_guid method found in wp-includes/post-template.php shows that the method simply escapes the URL and prints the value. This seems incorrect to me as well.

I would like to propose removing the URL escaping from this function, as non-URL GUIDs are both valid and a common use case for non-WordPress to WordPress migrations. However, I would like to get a 2nd opinion to discuss how this change might break other functionality / use-cases.

Related: http://www.w3schools.com/rss/rss_tag_guid.asp

This ticket was mentioned in Slack in #core by stevenkword. View the logs.


10 years ago

@stevenkword
10 years ago

#3 @stevenkword
10 years ago

  • Keywords has-patch added; needs-patch removed
  • Milestone changed from Awaiting Review to 4.2
  • Owner set to stevenkword
  • Status changed from new to assigned

I've attached a patch to change the escaping method when rendering GUIDs. 'the_guid()' is the only effected function which is called 4 times among the following 4 files: wp-admin/includes/export.php, wp-includes/feed-atom-comments.php, wp-includes/feed-atom.php, and wp-includes/feed-rss2.php.

The patch replaces the existing esc_url() method with esc_html(). This is done because according to the RSS 2.0 specification there is no syntax specified for this element other than to escape HTML entities. Spaces, for example, are perfectly valid and could exist for content created outside of a WordPress environment. As mentioned by the reporter (thanks CheeseDurger), this could cause a problem for migrated instances and should be addressed.

Last edited 10 years ago by stevenkword (previous) (diff)

This ticket was mentioned in Slack in #core by stevenkword. View the logs.


10 years ago

This ticket was mentioned in Slack in #core by drew. View the logs.


10 years ago

#6 @DrewAPicture
10 years ago

  • Keywords close added; dev-feedback removed

Core generates guids as URLs. I would say when outputting guids in the context of core, there's a case to be made for honoring core's intent. See the above-linked Slack log for discussion. Suggest wontfix.

#7 @rmccue
10 years ago

I disagree with close; core generates URLs by default, but there's no reason a plugin can't change this. It can actually be intentional to change this, as otherwise when migrating domains, you might be left with GUIDs containing the old domain.

I've seen sites change these to URNs instead (which would fail validation on esc_url right now I think, since presumably we don't allow the urn protocol). I think we should continue to support this.

#8 follow-up: @rmccue
10 years ago

Also: we're outputting into a field that's explicitly noted as "just a string" in the specification. The escaping functions are dependent on their context in the output, not on the input values, so all we should do is XML-escape regardless.

#9 in reply to: ↑ 8 @stevenkword
10 years ago

Replying to rmccue:

Also: we're outputting into a field that's explicitly noted as "just a string" in the specification. The escaping functions are dependent on their context in the output, not on the input values, so all we should do is XML-escape regardless.

I agree. Relates to #19998

This ticket was mentioned in Slack in #core by drew. View the logs.


10 years ago

#11 follow-up: @helen
10 years ago

  • Keywords close removed

In lieu of changing this outright for everybody, and the potential consequences that lie within, what if we switch to hooking esc_url() in? This would allow for those who want to use the GUID for something other than a URL to unhook and hook in something more appropriate like esc_html().

#12 in reply to: ↑ 11 @stevenkword
10 years ago

  • Keywords needs-patch added; has-patch removed

Replying to helen:

In lieu of changing this outright for everybody, and the potential consequences that lie within, what if we switch to hooking esc_url() in? This would allow for those who want to use the GUID for something other than a URL to unhook and hook in something more appropriate like esc_html().

@Helen, I like this approach. I'll submit a patch.

#13 @DrewAPicture
10 years ago

  • Keywords 2nd-opinion removed
  • Type changed from defect (bug) to enhancement

@stevenkword
10 years ago

#14 @stevenkword
10 years ago

  • Keywords has-patch 2nd-opinion added; needs-patch removed

Patch 31080.2.diff adds the filter the_guid to allow for plugins to modify the escaping method when displaying in feeds.

Usage example:

remove_filter( 'the_guid', 'esc_url' );

function html_escape_the_guid( $guid ) {
	return esc_html( get_the_guid () );
}
add_filter( 'the_guid', 'html_escape_the_guid' );
Version 0, edited 10 years ago by stevenkword (next)

#15 follow-up: @DrewAPicture
10 years ago

@helen: Was 31080.2.diff what you had in mind? Need to decide if this is ready or needs more iteration, and therefore to be pushed to a future release.

#16 in reply to: ↑ 15 @jorbin
10 years ago

  • Keywords commit added; 2nd-opinion removed

Replying to DrewAPicture:

@helen: Was 31080.2.diff what you had in mind? Need to decide if this is ready or needs more iteration, and therefore to be pushed to a future release.

I'm not Helen, but I think this solution is much much better and I think it's ready to go in.

#17 @helen
10 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 31726:

Enable more flexibility for non-URL GUIDs.

props stevenkword.
fixes #31080.

#18 @nacin
9 years ago

I really like [31726] and I bet @alexkingorg does too.

#20 @nacin
9 years ago

I'll just mention that the esc_url() here was absolutely added in an early 3.x security release, to close a vulnerability where (this is from memory) it was possible to set the guid for a post via a specially crafted request and then end up with an XSS situation.

#21 @nacin
9 years ago

I'll also add this was done as defense in depth. As in, we now sanitize GUIDs on save, and also added escaping in case it was exploited before update. In that case, we could hypothetically remove the escaping years after the fact. But that wouldn't actually fix anything, as the GUIDs would still go through esc_url_raw() on save, and might in certain contexts still need to go through esc_url() on display. This is because they are often used directly as URLs for historical reasons.

This change at least allows for someone to unhook all of it, both save and display.

Note: See TracTickets for help on using tickets.