Make WordPress Core

Opened 8 years ago

Closed 6 years ago

#3727 closed defect (bug) (fixed)

WP->parse_request() won't replace $pathinfo when $req_uri contains any %## encoding character.

Reported by: Kirin_Lin Owned by: ryan
Milestone: 2.8 Priority: normal
Severity: normal Version: 2.2
Component: I18N Keywords:
Focuses: Cc:


I write blog in Chinese. WordPress can't use index_permalinks which contains "%postname%". The problem is using slug not in English, which will be urlencoded to %## format.

In WP2.1, wp-includes/classes.php line 61 and in WP2.0, wp-includes/classes.php line 1528:

$req_uri = str_replace($pathinfo, '', $req_uri);

When I use Chinese slug, the $pathinfo will be like "2007/01/31/測試" and the $requri will be "/index.php/2007/01/31/%e6%b8%ac%e8%a9%a6/". That makes the function of above doesn't work.

It should be like this:

$req_uri = str_replace($pathinfo, '', urldecode($req_uri));

Attachments (2)

classes-2.1.diff (623 bytes) - added by Kirin_Lin 8 years ago.
diff file for 2.1
classes-2.0.diff (628 bytes) - added by Kirin_Lin 8 years ago.
diff file for 2.0

Download all attachments as: .zip

Change History (46)

@Kirin_Lin8 years ago

diff file for 2.1

comment:1 @Kirin_Lin8 years ago

  • Keywords rewrite permalink added
  • Owner changed from anonymous to ryan

@Kirin_Lin8 years ago

diff file for 2.0

comment:2 @Kirin_Lin8 years ago

  • Cc markjaquith added

comment:3 @ryan8 years ago

What http server are you using and what version of that server? We need to see if different servers and server versions behave consistently before putting in a fix.

comment:4 @Kirin_Lin8 years ago

I tested it in two servers:
Apache/2.2.3 (Win32),PHP/5.2.0
Apache/2.0.58 (Unix), PHP/4.4.2

comment:5 @Kirin_Lin8 years ago

I asked for others help. There is another server enviornment.
Result: ok - Apache/1.3.37 (Unix), PHP/4.4.4

comment:6 @Kirin_Lin8 years ago

  • Resolution set to duplicate
  • Status changed from new to closed

I think this is duplicated with ticket:1144.

comment:7 @foolswisdom8 years ago

  • Milestone 2.1.1 deleted

comment:8 @foolswisdom8 years ago

  • Keywords has-patch added
  • Milestone set to 2.2
  • Resolution duplicate deleted
  • Status changed from closed to reopened

Circular duplicates ;-)

Reopening this with the more recent patch.

comment:9 @ryan8 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

(In [4893]) decode req uri when processing pathinfo permalinks. Props Kirin_Lin. fixes #3727

comment:10 @ryan8 years ago

  • Milestone changed from 2.2 to 2.1.2
  • Resolution fixed deleted
  • Status changed from closed to reopened

Fixed for 2.2. Let's try it out there for awhile and then backport to 2.1 and 2.0. Setting the milestone to 2.1.2 for now.

comment:11 @majelbstoat8 years ago

The recent addition of urldecode() also converts '+' to ' '. This breaks plugins that have added custom permalink handling, that use '+' to separate arguments. The specific case that broke for me is language appending in Gengo, which creates permalinks like site.com/en+de/ . This gets decoded to site.com/en de/ which fails to match the regular expressions that have been set up. I'm sure there must be other plugins out there which rely on this behaviour as well, as '+' is a very natural separator for specifying 'more than one'.

I would request that rawurldecode() be used instead, which is identical to urldecode(), except that it preserves '+'.

comment:12 @ryan8 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

(In [4979]) Use rawurldecode. fixes #3727

comment:13 @ryan8 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

Changed to rawurldecode in 2.2.

comment:14 @foolswisdom8 years ago

  • Milestone changed from 2.1.4 to 2.2

comment:15 @ryan8 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

Not going into 2.1 and already in 2.1. Resolving.

comment:16 @rob1n8 years ago

Already in 2.2, I believe Ryan means.

comment:17 @hakre8 years ago

  • Resolution fixed deleted
  • Severity changed from normal to critical
  • Status changed from closed to reopened
  • Version changed from 2.1 to 2.2

I reopened this, becasue this Fix creates new Defects in the Rewrite Rule parsing engine. It transposes "%26"s into "&"s which are then passed to query string parser and will create a new query string values.

Can confirm that this is in 2.2 now.

comment:18 @hakre8 years ago

In a deeper analysis I realized that Kirin_Lin simply "changed something that affected something the way she/he wanted" and was fine was that but without even thinking about where the defect is originated in. This leads to an enourmous space for side-effects and esoteric behaviour of the rewrite part. Rawurldecode should only be applied on the query-info part of a querystring. so this has to be done on another place in the code-logic not in the part where decoded url-information has to stay inside the URL as within that line of code.

You can compare the applied fix as if the apache developers would pass a general rawurldecode() over the complete request URI.

comment:19 @hakre8 years ago

And it's even pittier to see that this kind of problem has already been seen by majelbstoat but his assumption still has no proper analysis. it just changes from urldecode to rawurldecode for one little "+". What is about all the other chars that plugins and article-names are bound to that need to be urlencoded in the url and then be passed urlencoded to the query-info-part to no being splitted?

comment:20 @hakre8 years ago

  • Severity changed from critical to blocker

comment:21 @hgurol8 years ago

  • Milestone changed from 2.2 to 2.2.1

comment:22 @hakre8 years ago

Warning: This can be used as an attack vector on Wordpress Blogs having pretty URLs enabled.

comment:23 @rob1n8 years ago

  • Milestone changed from 2.2.1 to 2.2.2

comment:24 @rob1n8 years ago

  • Keywords has-patch removed

comment:25 follow-up: @wordpressfun8 years ago

google indexed my wordpress so:

for example:

www.travel-carhire.com/online/golf/853.php/page/2/ ( its wrong url) or www.travel-carhire.com/de/2007/02/page/2/ ( its wrong url to)

you can see more url at www.google.de/search?hl=de&q=site%3Awww.travel-carhire.com+page%2F2&meta=

also seen: page/2/ is at end

also: */online/golf/853.php/* come between wwww.xxx.com/..AND../paged/2/

it is realy very carzzy....!!!! and not good for google search...

How can I delete */page/* from end of www.xxx.com/page/ like /page/2, page/3, page/4,.....

also i dont want to have */page/* at end of URL

I mean if I clik at* << Previous Entries .. Next Entries >>

It dont me help if I delete only << Previous Entries .. Next Entries >>

also I muss delete /page/ from URL. and google mussnt see/find */page/* at my Homepage....!!!

can anybody help me pls?? how can i delete */page/* from Url.

very thanks

comment:26 in reply to: ↑ 25 @Nazgul8 years ago

Replying to wordpressfun:
Could you please stop posting this all over trac?

comment:27 @Kirin_Lin8 years ago

Replying to hakre:

I reopened this, becasue this Fix creates new Defects in the Rewrite Rule parsing engine. It transposes "%26"s into "&"s which are then passed to query string parser and will create a new query string values.

Can confirm that this is in 2.2 now.

I realized what hakre said, so I rethink the problem and how to solve it. When we use urldecode(or rawueldecode) function to convert URI, it may cause security problem. In WordPress, the post title will be sanitized. I notice normal utf8 characters will be convert by utf8_uri_encode function.
Since we have risk by using decoding function, why don't we convert $pathinfo by using encoding function?
Thus, I try to use these code to test:

	if( seems_utf8($pathinfo) )
		$pathinfo = utf8_uri_encode($pathinfo);
	$req_uri = str_replace($pathinfo, '', $req_uri);


	$pathinfo = utf8_uri_encode($pathinfo);

Now, posts seem fine, but pages are broken. Because the rewrite rules are decoded utf8 from changeset:1841 when generating page index uri.
I undo the changeset and flush rules again, then pages come back.
Posts and Pages are fine now. I will do more test before i submit another patch.

comment:28 @hakre8 years ago

Hi Kirin_Lin, do you know why $pathinfo contains other characters then the allowed ones for an URI (http://gbiv.com/protocols/uri/rfc/rfc3986.html#characters) ?

comment:29 @Kirin_Lin8 years ago

Because the $PATH_INFO is implemented by following CGI/1.1 specification(http://hoohoo.ncsa.uiuc.edu/cgi/env.html), that says:

The extra path information, as given by the client.
In other words, scripts can be accessed by their virtual pathname,
followed by extra information at the end of this path.
The extra information is sent as PATH_INFO.
This information should be decoded by the server if it comes from 
a URL before it is passed to the CGI script.

That's why...

comment:30 @hakre8 years ago

  • Priority changed from normal to high

Well then the question is at this point of the script if $req_uri mimics the client (as of for Wordpress Rewrite Rule parsing engine) or if it mimics the server (as of CGI that should decode PATH_INFO).

according to the sideeffect you are reporting (changeset:1841) I would say that the core wordpress development team should make clear and document which behaviour has to be implemented at this point of sourcecode: client or server. this affects the rewrite part as well. Until this has not been made clear any made and further changes will produce side-effects and might even create crititcal attack-vectors on wordpress.

I would tend to say, that at this point req_uri should be encoded (not decoded) but I'm not a core wordpress developer so this is more a personal opinion. I raise the priority to high to gather some more awarness.

comment:31 @foolswisdom8 years ago

  • Milestone changed from 2.2.2 to 2.2.3

comment:32 @ryan7 years ago

  • Milestone changed from 2.2.3 to 2.4

comment:33 @lloydbudd7 years ago

  • Milestone changed from 2.5 to 2.6

comment:34 @hakre7 years ago

Only authoritative from a core developer is missing here. I could write a patch afterwards. Can a Developer please make a significant, written statement on this function / usage?

comment:35 @hakre7 years ago

  • Cc hakre added
  • Priority changed from high to highest omg bbq

comment:36 @hakre7 years ago

  • Keywords dev-feedback added

comment:37 @jacobsantos6 years ago

  • Component changed from General to i18n

comment:38 @hakre6 years ago

Still waiting for a Core Developer Statement here on the Usage of the Function. How can I help?

comment:39 @Denis-de-Bernardy6 years ago

  • Keywords needs-patch added; rewrite permalink removed
  • Milestone changed from 2.9 to 2.8

still current I take it?

comment:40 @Denis-de-Bernardy6 years ago

probably related: #8446

comment:41 @Denis-de-Bernardy6 years ago

  • Priority changed from highest omg bbq to normal
  • Severity changed from blocker to normal

comment:42 in reply to: ↑ description @hakre6 years ago

looks like "index_permalinks" ain't a function in wp 2.8 any longer. looks more and more bogus to me.

comment:43 @Denis-de-Bernardy6 years ago

Yeah, this has been moved in the parse_request() method in wp-includes/classes.php.

Might be bogus by now, I've essentially no means to constructively test any of it.

comment:44 @hakre6 years ago

  • Keywords needs-patch dev-feedback removed
  • Resolution set to fixed
  • Status changed from reopened to closed

i've taken a look through WP::parse_request(); and I must assume that it should handle IRI-Encoded URLs now properly. I do not fully understand why this all did not work with %postname% that time but with the current 2.7 / 2.8 version, %postname% is properly used in URLs even with UTF-8. I checked that with the UTF-8 css classnames ticket #8446.

WP->parse_request() is currently using rawurldecode() and is using it only on the path of the URI, not the query-info-part. That looks like a safe way of doing the IRI-Decoding to me.

This is fixed, %postname% can be used with utf-8 characters beyond 7bit ascii.

Note: See TracTickets for help on using tickets.