Make WordPress Core

Opened 3 years ago

Closed 3 years ago

#54471 closed defect (bug) (duplicate)

WordPress accepts non-alphabetical characters in the URL

Reported by: lubosr Owned by:
Milestone: Priority: normal
Severity: critical Version:
Component: Canonical Keywords:
Focuses: Cc:

Description

Hello,

I originally created a forum post about this issue because I thought the problem affected only my website. Here is the original post: https://wordpress.org/support/topic/issue-with-urls-and-extra-characters/#new-topic-0

WordPress accepts any of the following URLs (among others) for the same post and does not return a 404 error. Examples:

https://exmaple.com/my-awesome-article- ( note the trailing - )
https://exmaple.com/-my-awesome-article ( note the leading - )
https://exmaple.com/my-awesome.article ( note the . instead of - )
https://exmaple.com/my-awesome-article, ( note the trailing , )

This issue has a potential for double content as the URL is distinct for all pages.

I first spent some time trying to fix this by changing and re-saving the permalink settings, switching themes, and checking the Apache redirects and the site config file, but to no avail.

However, I then discovered that many WordPress websites have the same issue. Here are some examples:
Correct URL:
https://techcrunch.com/2021/11/18/webcams-and-microphones-for-better-video-calls/
Broken URL:
https://techcrunch.com/2021/11/18/webcams-and-microphones.------for-better-video-calls/
The issue is not limited to "." in the URL; "=" and "," are accepted as well. Here is another example from the Microsoft website:
Correct URL:
https://news.microsoft.com/transform/novartis-empowers-scientists-ai-speed-discovery-development-breakthrough-medicines/
Broken URL:
https://news.microsoft.com/transform/novartis-empowers-scientists===.=ai-speed-discovery.-------development-breakthrough-medicines,,,/

Here is the wget output for the above URL. As you can see, the server returns 200 OK rather than 404:

wget https://news.microsoft.com/transform/novartis-empowers-scientists===.=ai-speed-discovery.-------development-breakthrough-medicines,,,/
--2021-11-19 12:39:02--  https://news.microsoft.com/transform/novartis-empowers-scientists===.=ai-speed-discovery.-------development-breakthrough-medicines,,,/
Resolving news.microsoft.com (news.microsoft.com)... 141.193.213.21, 141.193.213.20
Connecting to news.microsoft.com (news.microsoft.com)|141.193.213.21|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘index.html’

The list of affected websites goes on; I have tested multiple portals.
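
For anyone who wants to repeat the check on another site, here is a minimal Python sketch of the same test. It builds the four kinds of malformed URLs listed at the top of this report and asks for the final HTTP status of each one. The helper names `slug_variants` and `http_status` and the User-Agent string are made up for this illustration; they are not part of WordPress or of wget.

```python
import urllib.error
import urllib.request


def slug_variants(base_url: str, slug: str) -> list[str]:
    """Build the four kinds of malformed URLs listed in this report."""
    base = base_url.rstrip("/")
    dotted = ".".join(slug.rsplit("-", 1))  # last "-" replaced by "."
    return [
        f"{base}/{slug}-",   # trailing "-"
        f"{base}/-{slug}",   # leading "-"
        f"{base}/{dotted}",  # "." instead of "-"
        f"{base}/{slug},",   # trailing ","
    ]


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status code for url (redirects are followed)."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "slug-check/0.1"}  # hypothetical UA string
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises for 4xx/5xx responses; a 404 would be reported here.
        return error.code
```

Calling `http_status(url)` for each URL returned by `slug_variants("https://exmaple.com", "my-awesome-article")` should show whether a given site answers 200 or 404 for the variants. Connection-level failures (DNS errors, timeouts) are not handled in this sketch and will raise `urllib.error.URLError`.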

Change History (1)

#1 @SergeyBiryukov
3 years ago

  • Component changed from General to Canonical
  • Milestone Awaiting Review deleted
  • Resolution set to duplicate
  • Status changed from new to closed

Hi there, welcome to WordPress Trac!

Thanks for the report, we're already tracking this issue in #14773. Also related: #17653, #35437.

> This issue has a potential for double content as the URL is distinct for all pages.

Just noting that this should not cause any SEO issues, as long as the site has a rel="canonical" link pointing to the correct URL. WordPress core outputs these links by default on singular posts or pages. The examples above both have these links:

<link rel="canonical" href="https://techcrunch.com/2021/11/18/webcams-and-microphones-for-better-video-calls/" />
<link rel="canonical" href="https://news.microsoft.com/transform/novartis-empowers-scientists-ai-speed-discovery-development-breakthrough-medicines/" />
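
To confirm that a page served under one of the malformed URLs still points at the correct address, the canonical link can be read out of the returned HTML. The following is a rough Python sketch using only the standard library; the names `CanonicalLinkParser` and `find_canonical` are hypothetical helpers for this illustration, not WordPress functions.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalLinkParser(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> are also routed here.
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in html, or None if there is none."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    return parser.canonical
```

If the value returned by `find_canonical` for the malformed URL's HTML equals the correct permalink, the page declares the expected canonical URL.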