#8098 closed defect (bug) (fixed)
trailing p's removed from query by canonical redirect
Reported by: | DD32 | Owned by: | markjaquith |
---|---|---|---|
Milestone: | 2.7 | Priority: | normal |
Severity: | normal | Version: | 2.7 |
Component: | Permalinks | Keywords: | has-patch 2nd-opinion |
Focuses: | Cc: |
Description
It appears to just be an unescaped ?, see attached patch
Affected URL's are of the format:
- ?tag=php redirects to ?tag=ph
Attachments (4)
Change History (26)
#5
@
15 years ago
attachment 8098.2.diff added.
- Do not replace it if it is followed by a word character
- 'p=' =>
- 'tag=php&p=' => 'tag=php'
- 'tag=php' => 'tag=php'
- 'p=&tag=php' => 'p=&tag=php' (This case didnt work before either)
'#^&?(p|page_id|cat|tag)=?$#'
That wouldnt work either, that requires that the param be the first on the line, whereas i believe its supposed to cleanup trailing items too
#6
@
15 years ago
I've also tested the following... it's simple and to the point:
'#(^p|^page_id|^cat|^tag|&p|&page_id|&cat|&tag)=?$#'
? Would that work?
I've tried it on a regex tester, and it seems to work...
Yeah, I caught that issue on my last idea too
#7
follow-up:
↓ 8
@
15 years ago
Another option instead of using a Negitive Assertion would be to use a word boundary:
$redirect['query'] = preg_replace( '#\b&?(p|page_id|cat|tag)=?$#', '', $redirect['query'] );
which achieves the same result (Needs to start with a non-word character), I'm not sure which'd be faster, I assume the word boundary would be.
'#(^p|^page_id|^cat|^tag|&p|&page_id|&cat|&tag)=?$#'
Hm, That could work, looks a bit ugly though (but what regex doesnt?)
#8
in reply to:
↑ 7
@
15 years ago
Replying to DD32:
... I'm not sure which'd be faster, I assume the word boundary would be.
This is usually a very short string, so speed is not that important.
'#(^p|^page_id|^cat|^tag|&p|&page_id|&cat|&tag)=?$#'
Hm, That could work, looks a bit ugly though (but what regex doesnt?)
I think this can also be written as:
'#(^|&)(p|page_id|cat|tag)=?$#'
and if added at the end, it will also clean example 3 above, so 'p=&tag=php' => 'tag=php'
'#(^|&)(p|page_id|cat|tag)=?(&|$)#'
Of course this will have to be well tested first to make sure it doesn't break anything else.
#9
follow-up:
↓ 10
@
15 years ago
'#(^|&)(p|page_id|cat|tag)=?$#'
In my quick testing, Using in ()'s didnt appear to work, However, It does.
The only problem with that one, is it causes this:
In: flame=abc&p=&test=abs Out:flame=abctest=abs
#10
in reply to:
↑ 9
@
15 years ago
Replying to DD32:
Ok, That really didnt turn out very well :)
In my testing i found in ()'s didnt seem to work correctly.. But it does appear to..
Regex: '#(^|&)(p|page_id|cat|tag)=?(&|$)#' In: flame=abc&p=&test=abs Out:flame=abctest=abs
#11
follow-up:
↓ 15
@
15 years ago
Yeah, it will 'eat' all ampersands around the removed part. Can probably put one back:
$redirect['query'] = preg_replace( '#(^|&)(p|page_id|cat|tag)=?(&|$)#' , '$3', redirect['query'] );
#12
follow-up:
↓ 14
@
15 years ago
Make it a little longer, and you can trim away beginning or ending "&"s that would otherwise be left over by something like 'tag=php&p=&'
or 'p=&tag=php'
:
$redirect['query'] = preg_replace( '#^&|(&(?!(p|page_id|cat|tag)=?&[^$]))?\b(p|page_id|cat|tag)=?(&|$)|&$#', '', $redirect['query'] );
#13
@
15 years ago
This was kinda obvious, but I tested to make sure: Tags ending in tag, cat, and page_id (I always end my tags in page_id, don't you?) are also affected.
#14
in reply to:
↑ 12
@
15 years ago
Replying to filosofo:
Make it a little longer, and you can trim away beginning or ending "&"s that would otherwise be left over by something like
'tag=php&p=&'
or'p=&tag=php'
:
$redirect['query'] = preg_replace( '#^&|(&(?!(p|page_id|cat|tag)=?&[^$]))?\b(p|page_id|cat|tag)=?(&|$)|&$#', '', $redirect['query'] );
That's better, but will replace things like: tag=page_id
or tag=tag
. It becomes tag=
.
Ideally it shouldn't touch anything on the right side of =
and before &
or $
.
#15
in reply to:
↑ 11
@
15 years ago
Here's one that will grab an empty variable at the beginning, end, or middle. I did a lot of testing, and couldn't fool it. Maybe someone else can see a problem with it?
'#(^|&)(p|page_id|cat|tag)=?$|(p|page_id|cat|tag)=&#'
Notice that when looking for an empty declaration in the beginning or middle (without requiring an "end of string"), it requires a '=&' at the end, without checking for one at the beginning. This will make sure we only remove one '&' per declaration, and conceivably, will never touch any set of letters directly following an '='.
Any objections?
@
15 years ago
Removes any empty p=, tag=, cat=, or page_id=, without removing any combination of letters directly to the right of an '='
#16
follow-up:
↓ 17
@
15 years ago
- Keywords has-patch 2nd-opinion added
with 8098.3:
p=&page_id=10&cat=&tag=php => page_id=10&tag=php
tag=p => tag=p
tag= =>
tag=php&p= => tag=php
tag=php&p => tag=php
p=&tag=php => tag=php
Am I missing anything? I'm assuming that this is the desired result, but I'm not 100% sure... there could be other implications I'm not aware of.
#17
in reply to:
↑ 16
@
15 years ago
Replying to stevish:
Am I missing anything?
In theory it's possible the query string could be like '&tag=php&p='
or 'tag=php&p=&'
, in which case there are leftover "&"s.
How about this? It seems to handle all of the as-yet-mentioned examples:
$redirect['query'] = preg_replace( '#^&|(&(?!(?P<q>\b(p|page_id|cat|tag)\b)=?&.))?(?<!=)(?P>q)=?(&|$)|&$#', '', $redirect['query'] );
The problem is caused by
http://trac.wordpress.org/browser/trunk/wp-includes/canonical.php#L207
I was thinking the first ? was unescaped, but on closer inspection, It appears my logic was wrong, The start of the expression needs to be tied to either the start of the string, or a &, And i'm not quite sure how to achieve that