Opened 3 years ago
Closed 3 years ago
#54471 closed defect (bug) (duplicate)
WordPress accepts non-alphabetical characters in the URL
Reported by: | lubosr | Owned by: | |
---|---|---|---|
Milestone: | | Priority: | normal |
Severity: | critical | Version: | |
Component: | Canonical | Keywords: | |
Focuses: | | Cc: | |
Description
Hello,
I originally created a forum post about this issue because I thought the problem affected only my website. Here is the original post: https://wordpress.org/support/topic/issue-with-urls-and-extra-characters/#new-topic-0
WordPress accepts any of the following URLs (among others) for the same post and does not return a 404 error. Examples:
https://exmaple.com/my-awesome-article- ( note the trailing - )
https://exmaple.com/-my-awesome-article ( note the leading - )
https://exmaple.com/my-awesome.article ( note the . instead of - )
https://exmaple.com/my-awesome-article, ( note the trailing , )
This issue can lead to duplicate content, because each of these distinct URLs serves the same page.
At first I spent some time trying to fix this by changing and re-saving the permalink settings, switching themes, and checking the Apache redirects and the site config file, but to no avail.
I then discovered that many WordPress websites have the same issue. Here are some examples:
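One plausible explanation, which is not stated in this ticket, is that the requested slug is normalized before the post is looked up, so that several spellings collapse to the same stored slug. The sketch below is a rough Python approximation of that kind of normalization, loosely modeled on what WordPress's `sanitize_title_with_dashes()` is understood to do. The name `normalize_slug` is hypothetical and the exact rules are an assumption, not WordPress code.

```python
import re


def normalize_slug(raw: str) -> str:
    """Approximate slug clean-up (hypothetical helper, not WordPress code)."""
    slug = raw.lower()
    slug = slug.replace(".", "-")             # treat dots as separators
    slug = re.sub(r"[^a-z0-9 _-]", "", slug)  # drop other punctuation such as ',' and '='
    slug = re.sub(r"\s+", "-", slug)          # whitespace becomes a dash
    slug = re.sub(r"-+", "-", slug)           # collapse runs of dashes
    return slug.strip("-")                    # trim leading and trailing dashes


# The URL variants listed above, reduced to their last path segment.
variants = [
    "my-awesome-article-",
    "-my-awesome-article",
    "my-awesome.article",
    "my-awesome-article,",
]

for variant in variants:
    print(f"{variant!r} -> {normalize_slug(variant)!r}")
```

Under these assumed rules, every variant reduces to `my-awesome-article`, which would explain why each request resolves to the same post instead of returning a 404.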
Correct URL:
https://techcrunch.com/2021/11/18/webcams-and-microphones-for-better-video-calls/
Broken URL:
https://techcrunch.com/2021/11/18/webcams-and-microphones.------for-better-video-calls/
The issue is not limited to "." in the URL; "=" and "," are accepted as well. Here is another example, from a Microsoft website:
Correct URL:
https://news.microsoft.com/transform/novartis-empowers-scientists-ai-speed-discovery-development-breakthrough-medicines/
Broken URL:
https://news.microsoft.com/transform/novartis-empowers-scientists===.=ai-speed-discovery.-------development-breakthrough-medicines,,,/
Here is the wget output for the above URL. As you can see, the server responds with 200 OK rather than 404:
wget https://news.microsoft.com/transform/novartis-empowers-scientists===.=ai-speed-discovery.-------development-breakthrough-medicines,,,/
--2021-11-19 12:39:02-- https://news.microsoft.com/transform/novartis-empowers-scientists===.=ai-speed-discovery.-------development-breakthrough-medicines,,,/
Resolving news.microsoft.com (news.microsoft.com)... 141.193.213.21, 141.193.213.20
Connecting to news.microsoft.com (news.microsoft.com)|141.193.213.21|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘index.html
The list of affected websites goes on; I have tested multiple portals.
Hi there, welcome to WordPress Trac!
Thanks for the report, we're already tracking this issue in #14773. Also related: #17653, #35437.
Just noting that this should not cause any SEO issues, as long as the site has a rel="canonical" link pointing to the correct URL. WordPress core outputs these links by default on singular posts or pages. The examples above both have these links: