WordPress.org

Make WordPress Core

#20999 closed defect (bug) (fixed)

WordPress 3.4 doesn't handle rss url with percent code correctly

Reported by: herophuong Owned by: rmccue
Milestone: 3.5 Priority: normal
Severity: normal Version: 3.4
Component: External Libraries Keywords: dev-feedback has-patch
Focuses: Cc:

Description

I have a rss widget with this feed: http://news.google.com/news?gl=vn&pz=1&cf=all&ned=vi_vn&hl=vi&q=ch%E1%BB%91ng+h%C3%A0ng+gi%E1%BA%A3&output=rss
The rss handler makes all the percent code into question symbols (represent unknown characters). These characters also have differnt percent codes: http://news.google.com/news?gl=vn&pz=1&ned=vi_vn&hl=vi&q=ch%EF%BF%BD%EF%BF%BD%EF%BF%BDng+h%EF%BF%BD%EF%BF%BDng+gi%EF%BF%BD%EF%BF%BD%EF%BF%BD
Keep the rss url in unicode format doesn't help.
The bug only appears in 3.4 version.

Attachments (1)

20999.patch (930 bytes) - added by kurtpayne 22 months ago.
IRI patch from SimplePie on github @ 90a771f219

Download all attachments as: .zip

Change History (21)

comment:1 kurtpayne22 months ago

  • Cc kpayne@… added
  • Keywords reporter-feedback added
  • Severity changed from major to normal

I created a new RSS feed widget on my site and pasted in this URL:

https://news.google.com/news/feeds?gl=vn&pz=1&cf=all&ned=vi_vn&hl=vi&q=ch%E1%BB%91ng+h%C3%A0ng+gi%E1%BA%A3&output=rss

The widget showed up on my site with content. Also, the latest changes to esc_url were in [18469] for 3.2.1.

If you delete and recreate the widget, does this solve the problem?

comment:2 herophuong22 months ago

It still shows up with content because both of the link still have result from google news. Can you recheck it again and see that the content showed up in your site is the same as the original link's XML output or the error link's XML output?

I have deleted and recreated the widget many times and even installed third-party rss widget but may be because they all use wordpress internal function for parsing rss feed, the results are the same with the wrong content.

comment:3 follow-up: kurtpayne22 months ago

  • Component changed from Feeds to External Libraries
  • Keywords dev-feedback added; reporter-feedback removed

Thanks for the extra information. I can reproduce this with the code below:

$url = 'https://news.google.com/news/feeds?gl=vn&pz=1&cf=all&ned=vi_vn&hl=vi&q=ch%E1%BB%91ng+h%C3%A0ng+gi%E1%BA%A3&output=rss';
var_dump($url);

// outputs: 
// string 'https://news.google.com/news/feeds?gl=vn&pz=1&cf=all&ned=vi_vn&hl=vi&q=ch%E1%BB%91ng+h%C3%A0ng+gi%E1%BA%A3&output=rss' (length=117)

$iri =& new SimplePie_IRI($url);
var_dump($iri->get_iri());

// outputs: 
// string 'https://news.google.com/news/feeds?gl=vn&pz=1&cf=all&ned=vi_vn&hl=vi&q=chống+hàng+giả&output=rss' (length=101)

The problem is happening in SimplePie_IRI::replace_invalid_with_pct_encoding().

Related #18309

comment:4 nacin22 months ago

  • Milestone changed from Awaiting Review to 3.4.1

comment:5 SergeyBiryukov22 months ago

  • Summary changed from Wordpress 3.4 doesn't handle rss url with percent code correctly to WordPress 3.4 doesn't handle rss url with percent code correctly

comment:6 nacin22 months ago

This appears to be broken in simplepie master as well: https://raw.github.com/simplepie/simplepie/master/SimplePie/IRI.php

comment:7 follow-up: nacin22 months ago

For 1.2.0, I get:

string(143) "https://news.google.com/news/feeds?gl%3Dvn%26pz%3D1%26cf%3Dall%26ned%3Dvi_vn%26hl%3Dvi%26q%3Dch%E1%BB%91ng+h%C3%A0ng+gi%E1%BA%A3%26output%3Drss"

Not sure if that is right or not. Seems like it worked in 3.3, so I imagine yes.

Per rmccue, relevant issue: https://github.com/simplepie/simplepie/issues/112.

comment:8 in reply to: ↑ 7 rmccue22 months ago

  • Owner set to rmccue
  • Status changed from new to assigned

Replying to nacin:

For 1.2.0, I get:

string(143) "https://news.google.com/news/feeds?gl%3Dvn%26pz%3D1%26cf%3Dall%26ned%3Dvi_vn%26hl%3Dvi%26q%3Dch%E1%BB%91ng+h%C3%A0ng+gi%E1%BA%A3%26output%3Drss"

Not sure if that is right or not. Seems like it worked in 3.3, so I imagine yes.

Nope, that's incorrect. Note how the & and = in the query string have been encoded. That was issue 112, which was fixed. This appears to be a new issue, but related.

This appears to be related to https://github.com/simplepie/simplepie/issues/162 and possibly https://github.com/simplepie/simplepie/issues/58

From the issue description:

http://news.google.com/news?gl=vn&pz=1&cf=all&ned=vi_vn&hl=vi&q=ch%E1%BB%91ng+h%C3%A0ng+gi%E1%BA%A3&output=rss
http://news.google.com/news?gl=vn&pz=1&ned=vi_vn&hl=vi&q=ch%EF%BF%BD%EF%BF%BD%EF%BF%BDng+h%EF%BF%BD%EF%BF%BDng+gi%EF%BF%BD%EF%BF%BD%EF%BF%BD

Note that &cf=all and &output=rss have disappeared. All the characters in the original URL have been turned into a � character (U+FFFD, aka 0xEF 0xBF 0xBD). That would indicate to me a possible issue with iconv or mbstring on the version of PHP on the server. I'll have a dig into this when I've got stable internet.

comment:9 in reply to: ↑ 3 rmccue22 months ago

Replying to kurtpayne:

Thanks for the extra information. I can reproduce this with the code below:

$url = 'https://news.google.com/news/feeds?gl=vn&pz=1&cf=all&ned=vi_vn&hl=vi&q=ch%E1%BB%91ng+h%C3%A0ng+gi%E1%BA%A3&output=rss';
var_dump($url);

// outputs: 
// string 'https://news.google.com/news/feeds?gl=vn&pz=1&cf=all&ned=vi_vn&hl=vi&q=ch%E1%BB%91ng+h%C3%A0ng+gi%E1%BA%A3&output=rss' (length=117)

$iri =& new SimplePie_IRI($url);
var_dump($iri->get_iri());

// outputs: 
// string 'https://news.google.com/news/feeds?gl=vn&pz=1&cf=all&ned=vi_vn&hl=vi&q=chống+hàng+giả&output=rss' (length=101)

Note that this is not the same as the original issue, although it's likely that it's linked. SimplePie_IRI decodes some percent-encoded strings to try and form a canonical IRI. It appears in this case that it is doing so incorrectly. The original issue though has the characters replaced by U+FFFD.

My guess is that the same code is affecting both instances, but that the original issue is also occurring because of iconv/mbstring errors.

comment:10 rmccue22 months ago

The commit which broke this was 4e022de12 for 1.3-dev and 76b5fd63 for 1.2.1. This commit was a backport of the more up-to-date IRI-parsing code from the ComplexPie fork (possibly future SimplePie 2).

The issue is that SimplePie thinks the characters are valid for a URL when it checks. While they are valid UTF-8/UCS characters, they don't appear to be valid for URLs based on my reading of RFC3986. At the moment, I'm lacking the mental power to work out exactly why that test is there, so I'll have to come back to doing this later.

comment:11 nacin22 months ago

  • Milestone changed from 3.4.1 to 3.4.2

comment:12 rmccue22 months ago

This should now be fixed in https://github.com/simplepie/simplepie/commit/90a771f21985e01cb0133c89631debfa951f6171

The problem was due to backporting the incomplete updated IRI library. SimplePie_IRI::get_iri() always returned an IRI, which is as designed. The problem is that all the rest of the code pretends it returns a URI.

The above commit fixes that issue by breaking what get_iri() should do, in favour of making it work. Let me know if this fixes your issue (I believe it should).

kurtpayne22 months ago

IRI patch from SimplePie on github @ 90a771f219

comment:13 follow-up: kurtpayne22 months ago

Testing the user supplied URL -- RSS widget now displays the same content as visiting the URL directly.

Sample code from above also displays identical output for get_iri() and the original URL.

Also tested with the WordPress RSS feed URL -- passed.

Patch works.

comment:14 in reply to: ↑ 13 rmccue22 months ago

  • Keywords has-patch added

Replying to kurtpayne:

Patch works.

Thanks for confirming and patchifying. Marking as has-patch; this is now ready for commit.

comment:15 rmccue22 months ago

  • Keywords has-patch removed
  • Resolution set to invalid
  • Status changed from assigned to closed

Previous patch is obsoleted by #21183.

comment:16 dd3222 months ago

  • Resolution invalid deleted
  • Status changed from closed to reopened

I'd prefer to keep this open until fixed by #21183, we'll also need to decide if there'll be any alterations for 3.4.2 (If one is released) which the before mentioned ticket can't handle.

comment:17 rmccue22 months ago

  • Keywords has-patch added

Fair enough. In that case, re-adding has-patch.

comment:18 SergeyBiryukov21 months ago

  • Version changed from 3.4 to 3.4.1

20999.patch works for me as well.

comment:19 SergeyBiryukov21 months ago

  • Version changed from 3.4.1 to 3.4

Didn't mean to change the version.

comment:20 nacin20 months ago

  • Milestone changed from 3.4.2 to 3.5
  • Resolution set to fixed
  • Status changed from reopened to closed

Fixed by #21183, [21644]. Closing for 3.5.

Note: See TracTickets for help on using tickets.