WordPress.org

Make WordPress Core

Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#8098 closed defect (bug) (fixed)

trailing p's removed from query by canonical redirect

Reported by: DD32 Owned by: markjaquith
Milestone: 2.7 Priority: normal
Severity: normal Version: 2.7
Component: Permalinks Keywords: has-patch 2nd-opinion
Focuses: Cc:

Description

It appears to just be an unescaped ?, see attached patch

Affected URL's are of the format:

  • ?tag=php redirects to ?tag=ph

Attachments (4)

8098.diff (626 bytes) - added by DD32 5 years ago.
8098.2.diff (628 bytes) - added by DD32 5 years ago.
8098.3.diff (646 bytes) - added by stevish 5 years ago.
Removes any empty p=, tag=, cat=, or page_id=, without removing any combination of letters directly to the right of an '='
8098.4.diff (618 bytes) - added by fitztrev 5 years ago.
Moving the = sign to be part of each query var name seems to do the trick, too.

Download all attachments as: .zip

Change History (26)

comment:1 DD325 years ago

  • Keywords has-patch removed

The problem is caused by
http://trac.wordpress.org/browser/trunk/wp-includes/canonical.php#L207

$redirect['query'] = preg_replace( '#&?(p|page_id|cat|tag)=?$#', '', $redirect['query'] );

I was thinking the first ? was unescaped, but on closer inspection, It appears my logic was wrong, The start of the expression needs to be tied to either the start of the string, or a &, And i'm not quite sure how to achieve that

comment:2 stevish5 years ago

It should be '#&?(p|page_id|cat|tag)=?$#'

DD325 years ago

DD325 years ago

comment:3 stevish5 years ago

bummer, I should have used preview... It should be

'#&?(p|page_id|cat|tag)=?$#'

comment:4 stevish5 years ago

ARRRG

'#^&?(p|page_id|cat|tag)=?$#'

comment:5 DD325 years ago

attachment 8098.2.diff added.

  • Do not replace it if it is followed by a word character
    • 'p=' =>
    • 'tag=php&p=' => 'tag=php'
    • 'tag=php' => 'tag=php'
    • 'p=&tag=php' => 'p=&tag=php' (This case didnt work before either)

'#^&?(p|page_id|cat|tag)=?$#'
That wouldnt work either, that requires that the param be the first on the line, whereas i believe its supposed to cleanup trailing items too

comment:6 stevish5 years ago

I've also tested the following... it's simple and to the point:

'#(^p|^page_id|^cat|^tag|&p|&page_id|&cat|&tag)=?$#' ? Would that work?

I've tried it on a regex tester, and it seems to work...

Yeah, I caught that issue on my last idea too

comment:7 follow-up: DD325 years ago

Another option instead of using a Negitive Assertion would be to use a word boundary:

$redirect['query'] = preg_replace( '#\b&?(p|page_id|cat|tag)=?$#', '', $redirect['query'] );

which achieves the same result (Needs to start with a non-word character), I'm not sure which'd be faster, I assume the word boundary would be.

'#(^p|^page_id|^cat|^tag|&p|&page_id|&cat|&tag)=?$#'

Hm, That could work, looks a bit ugly though (but what regex doesnt?)

comment:8 in reply to: ↑ 7 azaozz5 years ago

Replying to DD32:

... I'm not sure which'd be faster, I assume the word boundary would be.

This is usually a very short string, so speed is not that important.

'#(^p|^page_id|^cat|^tag|&p|&page_id|&cat|&tag)=?$#'

Hm, That could work, looks a bit ugly though (but what regex doesnt?)

I think this can also be written as:

'#(^|&)(p|page_id|cat|tag)=?$#'

and if added at the end, it will also clean example 3 above, so 'p=&tag=php' => 'tag=php'

'#(^|&)(p|page_id|cat|tag)=?(&|$)#'

Of course this will have to be well tested first to make sure it doesn't break anything else.

comment:9 follow-up: DD325 years ago

'#(^|&)(p|page_id|cat|tag)=?$#'

In my quick testing, Using in ()'s didnt appear to work, However, It does.

The only problem with that one, is it causes this:

In: flame=abc&p=&test=abs
Out:flame=abctest=abs

comment:10 in reply to: ↑ 9 DD325 years ago

Replying to DD32:

Ok, That really didnt turn out very well :)
In my testing i found in ()'s didnt seem to work correctly.. But it does appear to..

Regex: '#(^|&)(p|page_id|cat|tag)=?(&|$)#'
In: flame=abc&p=&test=abs
Out:flame=abctest=abs

comment:11 follow-up: azaozz5 years ago

Yeah, it will 'eat' all ampersands around the removed part. Can probably put one back:

$redirect['query'] = preg_replace( '#(^|&)(p|page_id|cat|tag)=?(&|$)#'
, '$3', redirect['query'] );

comment:12 follow-up: filosofo5 years ago

Make it a little longer, and you can trim away beginning or ending "&"s that would otherwise be left over by something like 'tag=php&p=&' or 'p=&tag=php':

$redirect['query'] = preg_replace( '#^&|(&(?!(p|page_id|cat|tag)=?&[^$]))?\b(p|page_id|cat|tag)=?(&|$)|&$#', '', $redirect['query'] );

comment:13 stevish5 years ago

This was kinda obvious, but I tested to make sure: Tags ending in tag, cat, and page_id (I always end my tags in page_id, don't you?) are also affected.

comment:14 in reply to: ↑ 12 azaozz5 years ago

Replying to filosofo:

Make it a little longer, and you can trim away beginning or ending "&"s that would otherwise be left over by something like 'tag=php&p=&' or 'p=&tag=php':

$redirect['query'] = 
preg_replace( '#^&|(&(?!(p|page_id|cat|tag)=?&[^$]))?\b(p|page_id|cat|tag)=?(&|$)|&$#',
 '', $redirect['query'] );

That's better, but will replace things like: tag=page_id or tag=tag. It becomes tag=.

Ideally it shouldn't touch anything on the right side of = and before & or $.

comment:15 in reply to: ↑ 11 stevish5 years ago

Here's one that will grab an empty variable at the beginning, end, or middle. I did a lot of testing, and couldn't fool it. Maybe someone else can see a problem with it?

'#(^|&)(p|page_id|cat|tag)=?$|(p|page_id|cat|tag)=&#'

Notice that when looking for an empty declaration in the beginning or middle (without requiring an "end of string"), it requires a '=&' at the end, without checking for one at the beginning. This will make sure we only remove one '&' per declaration, and conceivably, will never touch any set of letters directly following an '='.

Any objections?

stevish5 years ago

Removes any empty p=, tag=, cat=, or page_id=, without removing any combination of letters directly to the right of an '='

comment:16 follow-up: stevish5 years ago

  • Keywords has-patch 2nd-opinion added

with 8098.3:

p=&page_id=10&cat=&tag=php => page_id=10&tag=php
tag=p => tag=p
tag= =>
tag=php&p= => tag=php
tag=php&p => tag=php
p=&tag=php => tag=php

Am I missing anything? I'm assuming that this is the desired result, but I'm not 100% sure... there could be other implications I'm not aware of.

fitztrev5 years ago

Moving the = sign to be part of each query var name seems to do the trick, too.

comment:17 in reply to: ↑ 16 filosofo5 years ago

Replying to stevish:

Am I missing anything?

In theory it's possible the query string could be like '&tag=php&p=' or 'tag=php&p=&', in which case there are leftover "&"s.

How about this? It seems to handle all of the as-yet-mentioned examples:

$redirect['query'] = preg_replace( '#^&|(&(?!(?P<q>\b(p|page_id|cat|tag)\b)=?&.))?(?<!=)(?P>q)=?(&|$)|&$#', '', $redirect['query'] );

comment:18 ryan5 years ago

  • Owner changed from anonymous to markjaquith

comment:19 ryan5 years ago

  • Component changed from General to Permalinks

comment:21 markjaquith5 years ago

  • Resolution set to fixed
  • Status changed from new to closed

(In [9642]) Be more picky about the trailing blank query string things we strip. props filosofo, fitztrev, stevish, azaozz, DD32 (go team!). fixes #8098. fixes #8180

comment:22 markjaquith5 years ago

(In [9645]) Catch more blank query string thing cases, without resorting to crazy regex. props filosofo. fixes #8098

Note: See TracTickets for help on using tickets.