Make WordPress Core

Opened 15 years ago

Closed 15 years ago

Last modified 15 years ago

#8098 closed defect (bug) (fixed)

trailing p's removed from query by canonical redirect

Reported by: dd32's profile DD32 Owned by: markjaquith's profile markjaquith
Milestone: 2.7 Priority: normal
Severity: normal Version: 2.7
Component: Permalinks Keywords: has-patch 2nd-opinion
Focuses: Cc:

Description

It appears to just be an unescaped ?, see attached patch

Affected URL's are of the format:

  • ?tag=php redirects to ?tag=ph

Attachments (4)

8098.diff (626 bytes) - added by DD32 15 years ago.
8098.2.diff (628 bytes) - added by DD32 15 years ago.
8098.3.diff (646 bytes) - added by stevish 15 years ago.
Removes any empty p=, tag=, cat=, or page_id=, without removing any combination of letters directly to the right of an '='
8098.4.diff (618 bytes) - added by fitztrev 15 years ago.
Moving the = sign to be part of each query var name seems to do the trick, too.

Download all attachments as: .zip

Change History (26)

#1 @DD32
15 years ago

  • Keywords has-patch removed

The problem is caused by
http://trac.wordpress.org/browser/trunk/wp-includes/canonical.php#L207

$redirect['query'] = preg_replace( '#&?(p|page_id|cat|tag)=?$#', '', $redirect['query'] );

I was thinking the first ? was unescaped, but on closer inspection, It appears my logic was wrong, The start of the expression needs to be tied to either the start of the string, or a &, And i'm not quite sure how to achieve that

#2 @stevish
15 years ago

It should be '#&?(p|page_id|cat|tag)=?$#'

@DD32
15 years ago

@DD32
15 years ago

#3 @stevish
15 years ago

bummer, I should have used preview... It should be

'#&?(p|page_id|cat|tag)=?$#'

#4 @stevish
15 years ago

ARRRG

'#^&?(p|page_id|cat|tag)=?$#'

#5 @DD32
15 years ago

attachment 8098.2.diff added.

  • Do not replace it if it is followed by a word character
    • 'p=' =>
    • 'tag=php&p=' => 'tag=php'
    • 'tag=php' => 'tag=php'
    • 'p=&tag=php' => 'p=&tag=php' (This case didnt work before either)

'#^&?(p|page_id|cat|tag)=?$#'
That wouldnt work either, that requires that the param be the first on the line, whereas i believe its supposed to cleanup trailing items too

#6 @stevish
15 years ago

I've also tested the following... it's simple and to the point:

'#(^p|^page_id|^cat|^tag|&p|&page_id|&cat|&tag)=?$#' ? Would that work?

I've tried it on a regex tester, and it seems to work...

Yeah, I caught that issue on my last idea too

#7 follow-up: @DD32
15 years ago

Another option instead of using a Negitive Assertion would be to use a word boundary:

$redirect['query'] = preg_replace( '#\b&?(p|page_id|cat|tag)=?$#', '', $redirect['query'] );

which achieves the same result (Needs to start with a non-word character), I'm not sure which'd be faster, I assume the word boundary would be.

'#(^p|^page_id|^cat|^tag|&p|&page_id|&cat|&tag)=?$#'

Hm, That could work, looks a bit ugly though (but what regex doesnt?)

#8 in reply to: ↑ 7 @azaozz
15 years ago

Replying to DD32:

... I'm not sure which'd be faster, I assume the word boundary would be.

This is usually a very short string, so speed is not that important.

'#(^p|^page_id|^cat|^tag|&p|&page_id|&cat|&tag)=?$#'

Hm, That could work, looks a bit ugly though (but what regex doesnt?)

I think this can also be written as:

'#(^|&)(p|page_id|cat|tag)=?$#'

and if added at the end, it will also clean example 3 above, so 'p=&tag=php' => 'tag=php'

'#(^|&)(p|page_id|cat|tag)=?(&|$)#'

Of course this will have to be well tested first to make sure it doesn't break anything else.

#9 follow-up: @DD32
15 years ago

'#(^|&)(p|page_id|cat|tag)=?$#'

In my quick testing, Using in ()'s didnt appear to work, However, It does.

The only problem with that one, is it causes this:

In: flame=abc&p=&test=abs
Out:flame=abctest=abs

#10 in reply to: ↑ 9 @DD32
15 years ago

Replying to DD32:

Ok, That really didnt turn out very well :)
In my testing i found in ()'s didnt seem to work correctly.. But it does appear to..

Regex: '#(^|&)(p|page_id|cat|tag)=?(&|$)#'
In: flame=abc&p=&test=abs
Out:flame=abctest=abs

#11 follow-up: @azaozz
15 years ago

Yeah, it will 'eat' all ampersands around the removed part. Can probably put one back:

$redirect['query'] = preg_replace( '#(^|&)(p|page_id|cat|tag)=?(&|$)#'
, '$3', redirect['query'] );

#12 follow-up: @filosofo
15 years ago

Make it a little longer, and you can trim away beginning or ending "&"s that would otherwise be left over by something like 'tag=php&p=&' or 'p=&tag=php':

$redirect['query'] = preg_replace( '#^&|(&(?!(p|page_id|cat|tag)=?&[^$]))?\b(p|page_id|cat|tag)=?(&|$)|&$#', '', $redirect['query'] );

#13 @stevish
15 years ago

This was kinda obvious, but I tested to make sure: Tags ending in tag, cat, and page_id (I always end my tags in page_id, don't you?) are also affected.

#14 in reply to: ↑ 12 @azaozz
15 years ago

Replying to filosofo:

Make it a little longer, and you can trim away beginning or ending "&"s that would otherwise be left over by something like 'tag=php&p=&' or 'p=&tag=php':

$redirect['query'] = 
preg_replace( '#^&|(&(?!(p|page_id|cat|tag)=?&[^$]))?\b(p|page_id|cat|tag)=?(&|$)|&$#',
 '', $redirect['query'] );

That's better, but will replace things like: tag=page_id or tag=tag. It becomes tag=.

Ideally it shouldn't touch anything on the right side of = and before & or $.

#15 in reply to: ↑ 11 @stevish
15 years ago

Here's one that will grab an empty variable at the beginning, end, or middle. I did a lot of testing, and couldn't fool it. Maybe someone else can see a problem with it?

'#(^|&)(p|page_id|cat|tag)=?$|(p|page_id|cat|tag)=&#'

Notice that when looking for an empty declaration in the beginning or middle (without requiring an "end of string"), it requires a '=&' at the end, without checking for one at the beginning. This will make sure we only remove one '&' per declaration, and conceivably, will never touch any set of letters directly following an '='.

Any objections?

@stevish
15 years ago

Removes any empty p=, tag=, cat=, or page_id=, without removing any combination of letters directly to the right of an '='

#16 follow-up: @stevish
15 years ago

  • Keywords has-patch 2nd-opinion added

with 8098.3:

p=&page_id=10&cat=&tag=php => page_id=10&tag=php
tag=p => tag=p
tag= =>
tag=php&p= => tag=php
tag=php&p => tag=php
p=&tag=php => tag=php

Am I missing anything? I'm assuming that this is the desired result, but I'm not 100% sure... there could be other implications I'm not aware of.

@fitztrev
15 years ago

Moving the = sign to be part of each query var name seems to do the trick, too.

#17 in reply to: ↑ 16 @filosofo
15 years ago

Replying to stevish:

Am I missing anything?

In theory it's possible the query string could be like '&tag=php&p=' or 'tag=php&p=&', in which case there are leftover "&"s.

How about this? It seems to handle all of the as-yet-mentioned examples:

$redirect['query'] = preg_replace( '#^&|(&(?!(?P<q>\b(p|page_id|cat|tag)\b)=?&.))?(?<!=)(?P>q)=?(&|$)|&$#', '', $redirect['query'] );

#18 @ryan
15 years ago

  • Owner changed from anonymous to markjaquith

#19 @ryan
15 years ago

  • Component changed from General to Permalinks

#21 @markjaquith
15 years ago

  • Resolution set to fixed
  • Status changed from new to closed

(In [9642]) Be more picky about the trailing blank query string things we strip. props filosofo, fitztrev, stevish, azaozz, DD32 (go team!). fixes #8098. fixes #8180

#22 @markjaquith
15 years ago

(In [9645]) Catch more blank query string thing cases, without resorting to crazy regex. props filosofo. fixes #8098

Note: See TracTickets for help on using tickets.