Opened 11 months ago

Closed 9 months ago

#20999 closed defect (bug) (fixed)

WordPress 3.4 doesn't handle rss url with percent code correctly

Reported by: herophuong Owned by: rmccue
Priority: normal Milestone: 3.5
Component: External Libraries Version: 3.4
Severity: normal Keywords: dev-feedback has-patch
Cc: herophuong, kpayne@…

Description

I have a rss widget with this feed: http://news.google.com/news?gl=vn&pz=1&cf=all&ned=vi_vn&hl=vi&q=ch%E1%BB%91ng+h%C3%A0ng+gi%E1%BA%A3&output=rss
The rss handler makes all the percent code into question symbols (represent unknown characters). These characters also have differnt percent codes: http://news.google.com/news?gl=vn&pz=1&ned=vi_vn&hl=vi&q=ch%EF%BF%BD%EF%BF%BD%EF%BF%BDng+h%EF%BF%BD%EF%BF%BDng+gi%EF%BF%BD%EF%BF%BD%EF%BF%BD
Keep the rss url in unicode format doesn't help.
The bug only appears in 3.4 version.

Attachments (1)

20999.patch (930 bytes) - added by kurtpayne 11 months ago.
IRI patch from SimplePie on github @ 90a771f219

Download all attachments as: .zip

Change History (21)

  • Cc kpayne@… added
  • Keywords reporter-feedback added
  • Severity changed from major to normal

I created a new RSS feed widget on my site and pasted in this URL:

https://news.google.com/news/feeds?gl=vn&pz=1&cf=all&ned=vi_vn&hl=vi&q=ch%E1%BB%91ng+h%C3%A0ng+gi%E1%BA%A3&output=rss

The widget showed up on my site with content. Also, the latest changes to esc_url were in [18469] for 3.2.1.

If you delete and recreate the widget, does this solve the problem?

It still shows up with content because both of the link still have result from google news. Can you recheck it again and see that the content showed up in your site is the same as the original link's XML output or the error link's XML output?

I have deleted and recreated the widget many times and even installed third-party rss widget but may be because they all use wordpress internal function for parsing rss feed, the results are the same with the wrong content.

comment:3 follow-up: ↓ 9   kurtpayne11 months ago

  • Component changed from Feeds to External Libraries
  • Keywords dev-feedback added; reporter-feedback removed

Thanks for the extra information. I can reproduce this with the code below:

$url = 'https://news.google.com/news/feeds?gl=vn&pz=1&cf=all&ned=vi_vn&hl=vi&q=ch%E1%BB%91ng+h%C3%A0ng+gi%E1%BA%A3&output=rss';
var_dump($url);

// outputs: 
// string 'https://news.google.com/news/feeds?gl=vn&pz=1&cf=all&ned=vi_vn&hl=vi&q=ch%E1%BB%91ng+h%C3%A0ng+gi%E1%BA%A3&output=rss' (length=117)

$iri =& new SimplePie_IRI($url);
var_dump($iri->get_iri());

// outputs: 
// string 'https://news.google.com/news/feeds?gl=vn&pz=1&cf=all&ned=vi_vn&hl=vi&q=chống+hàng+giả&output=rss' (length=101)

The problem is happening in SimplePie_IRI::replace_invalid_with_pct_encoding().

Related #18309

  • Milestone changed from Awaiting Review to 3.4.1
  • Summary changed from Wordpress 3.4 doesn't handle rss url with percent code correctly to WordPress 3.4 doesn't handle rss url with percent code correctly

This appears to be broken in simplepie master as well: https://raw.github.com/simplepie/simplepie/master/SimplePie/IRI.php

comment:7 follow-up: ↓ 8   nacin11 months ago

For 1.2.0, I get:

string(143) "https://news.google.com/news/feeds?gl%3Dvn%26pz%3D1%26cf%3Dall%26ned%3Dvi_vn%26hl%3Dvi%26q%3Dch%E1%BB%91ng+h%C3%A0ng+gi%E1%BA%A3%26output%3Drss"

Not sure if that is right or not. Seems like it worked in 3.3, so I imagine yes.

Per rmccue, relevant issue: https://github.com/simplepie/simplepie/issues/112.

comment:8 in reply to: ↑ 7   rmccue11 months ago

  • Owner set to rmccue
  • Status changed from new to assigned

Replying to nacin:

For 1.2.0, I get:

string(143) "https://news.google.com/news/feeds?gl%3Dvn%26pz%3D1%26cf%3Dall%26ned%3Dvi_vn%26hl%3Dvi%26q%3Dch%E1%BB%91ng+h%C3%A0ng+gi%E1%BA%A3%26output%3Drss"

Not sure if that is right or not. Seems like it worked in 3.3, so I imagine yes.

Nope, that's incorrect. Note how the & and = in the query string have been encoded. That was issue 112, which was fixed. This appears to be a new issue, but related.

This appears to be related to https://github.com/simplepie/simplepie/issues/162 and possibly https://github.com/simplepie/simplepie/issues/58

From the issue description:

http://news.google.com/news?gl=vn&pz=1&cf=all&ned=vi_vn&hl=vi&q=ch%E1%BB%91ng+h%C3%A0ng+gi%E1%BA%A3&output=rss
http://news.google.com/news?gl=vn&pz=1&ned=vi_vn&hl=vi&q=ch%EF%BF%BD%EF%BF%BD%EF%BF%BDng+h%EF%BF%BD%EF%BF%BDng+gi%EF%BF%BD%EF%BF%BD%EF%BF%BD

Note that &cf=all and &output=rss have disappeared. All the characters in the original URL have been turned into a � character (U+FFFD, aka 0xEF 0xBF 0xBD). That would indicate to me a possible issue with iconv or mbstring on the version of PHP on the server. I'll have a dig into this when I've got stable internet.

comment:9 in reply to: ↑ 3   rmccue11 months ago

Replying to kurtpayne:

Thanks for the extra information. I can reproduce this with the code below:

$url = 'https://news.google.com/news/feeds?gl=vn&pz=1&cf=all&ned=vi_vn&hl=vi&q=ch%E1%BB%91ng+h%C3%A0ng+gi%E1%BA%A3&output=rss';
var_dump($url);

// outputs: 
// string 'https://news.google.com/news/feeds?gl=vn&pz=1&cf=all&ned=vi_vn&hl=vi&q=ch%E1%BB%91ng+h%C3%A0ng+gi%E1%BA%A3&output=rss' (length=117)

$iri =& new SimplePie_IRI($url);
var_dump($iri->get_iri());

// outputs: 
// string 'https://news.google.com/news/feeds?gl=vn&pz=1&cf=all&ned=vi_vn&hl=vi&q=chống+hàng+giả&output=rss' (length=101)

Note that this is not the same as the original issue, although it's likely that it's linked. SimplePie_IRI decodes some percent-encoded strings to try and form a canonical IRI. It appears in this case that it is doing so incorrectly. The original issue though has the characters replaced by U+FFFD.

My guess is that the same code is affecting both instances, but that the original issue is also occurring because of iconv/mbstring errors.

The commit which broke this was 4e022de12 for 1.3-dev and 76b5fd63 for 1.2.1. This commit was a backport of the more up-to-date IRI-parsing code from the ComplexPie fork (possibly future SimplePie 2).

The issue is that SimplePie thinks the characters are valid for a URL when it checks. While they are valid UTF-8/UCS characters, they don't appear to be valid for URLs based on my reading of RFC3986. At the moment, I'm lacking the mental power to work out exactly why that test is there, so I'll have to come back to doing this later.

  • Milestone changed from 3.4.1 to 3.4.2

This should now be fixed in https://github.com/simplepie/simplepie/commit/90a771f21985e01cb0133c89631debfa951f6171

The problem was due to backporting the incomplete updated IRI library. SimplePie_IRI::get_iri() always returned an IRI, which is as designed. The problem is that all the rest of the code pretends it returns a URI.

The above commit fixes that issue by breaking what get_iri() should do, in favour of making it work. Let me know if this fixes your issue (I believe it should).

IRI patch from SimplePie on github @ 90a771f219

comment:13 follow-up: ↓ 14   kurtpayne11 months ago

Testing the user supplied URL -- RSS widget now displays the same content as visiting the URL directly.

Sample code from above also displays identical output for get_iri() and the original URL.

Also tested with the WordPress RSS feed URL -- passed.

Patch works.

comment:14 in reply to: ↑ 13   rmccue11 months ago

  • Keywords has-patch added

Replying to kurtpayne:

Patch works.

Thanks for confirming and patchifying. Marking as has-patch; this is now ready for commit.

  • Keywords has-patch removed
  • Resolution set to invalid
  • Status changed from assigned to closed

Previous patch is obsoleted by #21183.

  • Resolution invalid deleted
  • Status changed from closed to reopened

I'd prefer to keep this open until fixed by #21183, we'll also need to decide if there'll be any alterations for 3.4.2 (If one is released) which the before mentioned ticket can't handle.

  • Keywords has-patch added

Fair enough. In that case, re-adding has-patch.

  • Version changed from 3.4 to 3.4.1

20999.patch works for me as well.

  • Version changed from 3.4.1 to 3.4

Didn't mean to change the version.

  • Milestone changed from 3.4.2 to 3.5
  • Resolution set to fixed
  • Status changed from reopened to closed

Fixed by #21183, [21644]. Closing for 3.5.

Note: See TracTickets for help on using tickets.