Make WordPress Core

Opened 4 years ago

Last modified 4 years ago

#51348 new defect (bug)

Trailing slash redirect removes pipes from query strings

Reported by: simjost's profile simjost Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version: 5.5.1
Component: Canonical Keywords:
Focuses: Cc:

Description

We are using some query strings for tracking and campaign purposes and some of them contain pipes.
An example:
https://example.com/page?key=value1|value2

When using one of the template permalinks a trailing slash is added at the end and WordPress will redirect from non trailing slash url to a trailing slash one. Same goes the other way around when a custom permalink is set without a trailing slash and an url with a trailing slash is accessed.

We noticed, that during this redirect all pipes in the query string are lost/removed/stripped.
The example above will result in:
https://example.com/page/?key=value1value2

Change History (9)

#1 @TimothyBlynJacobs
4 years ago

  • Keywords reporter-feedback added

Hi @simjost welcome to trac!

Have you tried url encoding your query parameter? As far as I know, a pipe is not an allowed character, so it must be url encoded.

When I try http://trunk.test/sample-page?key=v1%7Cv2 I get successfully redirected to http://trunk.test/sample-page/?key=v1%7Cv2.

#2 @SergeyBiryukov
4 years ago

  • Component changed from Permalinks to Canonical
  • Keywords close added

#3 follow-up: @simjost
4 years ago

Hi @TimothyBlynJacobs,

yes, we tried encoding it and it is fine. Unfortunately the plain pipe is needed. I have informed the people making the requirements on this, that they should reconsider.

However, to my knowledge the pipe is not a unallowed character in urls anymore and WordPress is handeling it very differently in different circumstances.
On the root page with or without the trailing slash, WordPress is encoding it by default.
On subpages without a trailing slash, the pipe gehts removed (and our WordPress redirects to the url with a trailing slash).
On subpages with a trailing slash, the pipe stays.

The behaviour should be uniform across all scenarios.

#4 in reply to: ↑ 3 @TimothyBlynJacobs
4 years ago

Replying to simjost:

However, to my knowledge the pipe is not a unallowed character in urls anymore and WordPress is handeling it very differently in different circumstances.

What are you basing that off of?

On the root page with or without the trailing slash, WordPress is encoding it by default.
On subpages without a trailing slash, the pipe gehts removed (and our WordPress redirects to the url with a trailing slash).
On subpages with a trailing slash, the pipe stays.

That's because a canonical redirect isn't being triggered for those URLs.

#5 @simjost
4 years ago

Hi @TimothyBlynJacobs,

I have no definitive proof, that pipes are allowed. When checking and searching, I read that they were forbidden some time ago and it's still not suggested to use them. But I didn't check any further.
If possible I avoid them of course :)

Alright, so the issue happens during a canonical redirect. I still think the pipe should not just disappear. Either it stays or it gets encoded, like on the base url.

#6 @TimothyBlynJacobs
4 years ago

I'm not aware of any changes to the specifications that would mean they shouldn't be percent encoded.

Trying to automatically percent encode characters is a possibility, but feels like it could get messy very fast.

I wonder if @peterwilsoncc has any thoughts on this?

#7 @hellofromTonya
4 years ago

  • Keywords 2nd-opinion added; reporter-feedback removed

Removing reporter-feedback as feedback has been given. As Timothy notes, currently seeking 2nd-opinion from @peterwilsoncc.

#8 @peterwilsoncc
4 years ago

  • Keywords close 2nd-opinion removed

I've been able to reproduce and it looks like the query string is messed up in wp_sanitize_redirect() which includes some quite complex regex, the line causing the problem is:

$location = preg_replace( '|[^a-z0-9-~+_.?#=&;,/:%!*\[\]()@]|i', '', $location );

As it appears browsers support the character unencoded, my inclination is to encode it to %7C on redirects as the RFC considers it unwise.

My reasoning for this is that wp_sanitize_redirect() already does auto-encoding for other characters that ought to be encoded but browsers support in an unencoded state. @TimothyBlynJacobs is correct, it's pretty messy ;)

#9 @simjost
4 years ago

I am not familiar with the WordPress workflow to resolve these kind of bugs and don't want to bump spam. Still I wanted to check in if there has been some progress on this in the background, where I might have missed the updates.

Note: See TracTickets for help on using tickets.