Opened 5 years ago
Last modified 5 months ago
#52865 new enhancement
Strip 'enclosed' trailing spaces in URLs
| Reported by: | | Owned by: | |
|---|---|---|---|
| Milestone: | Awaiting Review | Priority: | low |
| Severity: | normal | Version: | |
| Component: | Canonical | Keywords: | |
| Focuses: | performance | Cc: | |
Description
#20383 made improvements that strip trailing punctuation from URLs. E.g., https://ma.tt/2012/03/productivity-per-square-inch%20 redirects correctly to the canonical URL.
However, URLs like https://ma.tt/2012/03/productivity-per-square-inch%20/ (which 'enclose' the trailing space with a trailing slash) are not redirected. This URL, and others like it, typically return a 404 error.
This kind of 'broken link' pattern is extremely common on the web, particularly as a trailing slash is often appended to a malformed URL before WP runs (e.g., via a server/htaccess/nginx configuration).
We should refine the canonical redirect logic (in redirect_canonical) to also consider and redirect these types of requests.
Considerations
- The "Remove trailing spaces and end punctuation from the path" section of redirect_canonical doesn't consider the presence of trailing slashes in the URL. This could/should be adapted to catch those.
- There might be cases where a user 'legitimately' has a permalink structure (or slug) that ends in %20 or %20/. That might(?) make a fix more complicated than just sniffing for whether the permalink structure ends with /.
- WP seems inconsistent about where %20 (and/or %20/) can be added to slugs or structures. It's stripped in some places, but not in others.
- Should a permalink or slug be 'allowed' to contain, or end in, a space character? If this is being stripped in some parts of WP, maybe that's a good argument to prevent it elsewhere/everywhere. In which case, fixing this becomes a lot simpler.
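To make the first consideration concrete, here is a minimal sketch of the string handling involved. It is written in Python purely for illustration and is not WordPress's PHP implementation; the function name and the pattern are hypothetical. It assumes the path has already been separated from the query string, and it deliberately ignores the 'legitimate trailing %20' question raised above.

```python
import re

# Hypothetical pattern (an assumption, not taken from redirect_canonical):
# one or more encoded (%20) or literal spaces at the very end of the path,
# optionally followed by a single 'enclosing' trailing slash.
_ENCLOSED_TRAILING_SPACES = re.compile(r"(?:%20| )+(/?)$")


def strip_enclosed_trailing_spaces(path: str) -> str:
    """Drop trailing spaces from a URL path, keeping any trailing slash."""
    # The captured group is either "/" or the empty string, so the slash
    # (if present) is preserved while the spaces before it are removed.
    return _ENCLOSED_TRAILING_SPACES.sub(r"\1", path)


if __name__ == "__main__":
    for candidate in (
        "/2012/03/productivity-per-square-inch%20",
        "/2012/03/productivity-per-square-inch%20/",
        "/2012/03/productivity%20per-square-inch/",
    ):
        print(candidate, "->", strip_enclosed_trailing_spaces(candidate))
```

Spaces in the middle of a path are left alone, since the pattern is anchored to the end. If slugs ending in a space turn out to be valid, a check like this would presumably need to run only after the request has failed to match a known resource.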
Change History (4)
#3, in reply to: ↑ 2, 5 months ago
Replying to flixos90:
@jonoaldersonwp On the one hand, changing redirect_canonical() to cater for URLs with trailing slashes seems very reasonable to me. On the other hand, wouldn't that then break existing sites with such URLs? I'm wondering if there would be backward compatibility issues.
My assumption is that redirect_canonical only fires once we know that the request is a 404 (i.e., it doesn't match a known resource), so we should be safe? Maybe that's not the case!