Make WordPress Core

Opened 8 years ago

Last modified 4 years ago

#35248 reopened enhancement

WordPress should remove domain trailing dot (as/like it removes "www.")

Reported by: qdinar Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version: 4.4
Component: Canonical Keywords:
Focuses: Cc:

Description (last modified by SergeyBiryukov)

I think WordPress should redirect from an address whose domain has a trailing dot to the version without the dot, just as it redirects from an address with "www.".

I have read about which version is correct. The version without the dot appears to be allowed by the RFCs, it is very widely used, and almost nobody even knows that a trailing dot can be used.

(
https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol "The first definition of HTTP/1.1, the version of HTTP in common use, occurred in RFC 2068 in 1997, although this was obsoleted by RFC 2616 in 1999"
->
https://tools.ietf.org/html/rfc2616#section-14.23 "The Host field value MUST represent the naming authority of the origin server or gateway given by the original URL."
->
https://tools.ietf.org/html/rfc2616#section-3.2.1 "For definitive information on URL syntax and semantics, see "Uniform Resource Identifiers (URI): Generic Syntax and Semantics," RFC 2396 ..."
->
http://tools.ietf.org/html/rfc2396#section-3.2.2 "The rightmost domain label of a fully qualified domain name will never start with a digit, thus syntactically distinguishing domain names from IPv4 addresses, and may be followed by a single "." if it is necessary to distinguish between the complete domain name and any local domain."
)

Change History (12)

#1 @SergeyBiryukov
8 years ago

  • Component changed from General to Canonical
  • Description modified (diff)
  • Summary changed from Wordpress should remove domain trailing dot (as/like it removes "www.") to WordPress should remove domain trailing dot (as/like it removes "www.")

#2 @qdinar
8 years ago

i edit my words: "and few people know that trailing dot can be used".

Last edited 8 years ago by qdinar

#4 @qdinar
8 years ago

  • Status changed from new to closed

i have changed my mind. at first i thought this feature of relative domains is probably not used: for example, probably nobody uses the domain team1011, they just use team1011.ourcompany.com. but maybe it was, or is, used for a local internet cache where the internet is slow.
i think it could also be used in the future if people live on other planets: assume there is a domain "mars." and martians have martianlocalsite.mars. , and earth sites have caches under earth-cache.mars. , for example somesite.net.earth-cache.mars. , but people on mars can access it by typing "somesite.net" in their address bar. some sites could also be hosted on mars itself, for example "facebook.com.mars." , and they could also work with "facebook.com" in a martian browser's address bar. if wordpress gets a feature allowing one site to be installed on several hostings, it could work this way too: my-wp.name.mars./sync-cron-job.php could request my-wp.name./sync.php to sync comments, posts, etc. in that case the redirect should not occur, because the request would take a long time just for a redirect, and such a redirect would not be correct, because it would lead the sync script to its own wordpress installation.
i knew about this possible future use, but i thought it is not needed now. but maybe somebody can use this feature of dns in countries or companies with slow outbound internet to make a local cache of the internet, or somebody can test it for use with a space internet.
as far as i understand, it works this way (example): somedomain.com is searched by a martian dns server first in mars. , then, if not found, in earth-cache.mars. , and if it is somehow denied there and, for example, also denied in "another-earth-cache.mars.", then in ".".
also, as far as i know, browsers do not even query dns for names like team1011, but query the default search engine instead. several years ago, as i remember, firefox tried adding ".com" to such addresses. i think dns "search lists" are probably not used much, also because people can easily change their dns servers, but they can be used in some company buildings or countries, and special browser addons may be used to fix that.
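
The lookup order described in this comment can be sketched in a few lines of Python. This is only an illustration of the commenter's example, not the behaviour of any particular resolver: the function name `lookup_candidates` and the search list are hypothetical, and real resolvers may order the attempts differently (for example, some try a name that already contains dots as-is before applying the search list).

```python
def lookup_candidates(name, search_list):
    """Return the absolute names to try, in the order the comment describes.

    Hypothetical helper: search-list suffixes first, the root (".") last.
    """
    if name.endswith("."):
        # An absolute name (trailing dot) bypasses the search list entirely.
        return [name]
    # Relative name: append each search suffix, then fall back to the root.
    candidates = [f"{name}.{suffix.rstrip('.')}." for suffix in search_list]
    candidates.append(name + ".")
    return candidates


# Assumed search list for a resolver on "mars", taken from the example above.
MARS_SEARCH_LIST = ["mars.", "earth-cache.mars.", "another-earth-cache.mars."]

for candidate in lookup_candidates("somedomain.com", MARS_SEARCH_LIST):
    print(candidate)
```

With a trailing dot, `lookup_candidates("somedomain.com.", MARS_SEARCH_LIST)` yields only the name itself, which is the distinction the trailing dot exists to express.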

Last edited 8 years ago by qdinar

#5 @qdinar
8 years ago

  • Status changed from closed to reopened

i see i closed this by mistake (an inaccurate keypress), because i am struggling to use my laptop without a mouse or touchpad. but the closing may also be correct. (so i reopen this.)

Last edited 8 years ago by qdinar

#6 @qdinar
8 years ago

but i think this thing is not designed very well, and maybe it is not too late to change it, because, as i found out from wikipedia several days ago, rfcs are not always very strict laws: there are rfcs with different statuses, though i do not know what type the ones about the trailing dot are. the trailing dot is not widely used, but it is used in "bind" dns server configs etc. i think:
1) the trailing dot should not differ in its behaviour from other dots. the first dot cannot create an additional subdomain, so the last dot also should not behave as if it were itself something like a domain.
2) also, the ambiguity of a domain without the last dot is a bad thing.
3) and, trailing dot, especially just before slash, is not beautiful.
-- i think there are alternatives to the trailing dot:
1) example.com which is really example.com.mars. or example.com.something.mars. could be written "example.com...", and the real example.com could be written "example.com".
2) another alternative: always use the full domain like "example.com.something.mars", but let the browser show it in the address bar as "example.com..." with something like a tear or folding-door icon or 3 dots at the end, and clicking that shows the full domain.

Last edited 8 years ago by qdinar

#7 @qdinar
8 years ago

"3) and, trailing dot, especially just before slash, is not beautiful" - i think , this seems to me so because the extra dot looks like having redundant information, because top level domain already means the top, but, if relative domains with "fake" top level domains will become much used somewhere in future, it will not be so redundant.

#9 @qdinar
7 years ago

my reply to the text by the first link ( https://web.archive.org/web/20160604095348/http://homepage.ntlworld.com/jonathan.deboynepollard/FGA/web-fully-qualified-domain-name.html ):

Originally, as defined in RFC 1738 (§ 3.1), the "host" portion of a (Common Internet Scheme) URL was always and unequivocally a fully qualified domain name and the conventional mechanism for distinguishing fully-qualified domain names from non-fully-qualified domain names did not apply. Whether it was example.com. or example.com, the host was intended to be the same.

-- i think he is not right. i think "example.com" was not allowed in urls at all according to rfc 1738; the rule is cited in the second text, and i cite it:

3.1. Common Internet Scheme Syntax
        //<user>:<password>@<host>:<port>/<url-path>
    host
        The fully qualified domain name of a network host

and "example.com" could not be used in http headers at that time, because rfc 1738 is from 1994 and the host field appeared only with http 1.1 in 1997 (you can check on wikipedia).

so, indeed, only fqdns were left allowed in urls. i think this was an error in rfc 1738, because in this way it made (tried to make) the "relative domains" feature useless. if it had not disallowed them, they theoretically could have been used in "a" tag hrefs on local scripted sites or in static html documentation inside big companies that used relative domains, if browsers and servers supported it. but even though rfc 1738 disallowed them, people did not obey it: they continued to use top-level domains in relative form, i.e. without a trailing dot, so this disallowing by rfc 1738 was not a big practical problem anyway. people also had and used an alternative to relative domains: they just made local top-level domains like "localhost" (and used, and still use, them without a trailing dot too).

then he says:

Unfortunately, in practice web browsers have always violated that specification and passed the "host" portion through the name qualification procedures of their DNS Client libraries when mapping the host name to a set of IP addresses. (For example, those that used the BIND DNS Client library would leave the RES_DNSRCH option set and would not append the final trailing dot if it was missing.)

-- i think he meant that hosts without a trailing dot should just be rejected as an error, and only absolute domains (fqdns) should be passed to dns. i think browsers probably passed all domains to dns because people used their custom local top-level domains like "localhost". and anyway, later, in rfc 2396 published in 1998, the use of top-level domains in urls without trailing dots was allowed.

then the author (Jonathan de Boyne Pollard) cites rfc 2396 and regrets that it was changed to follow established human behaviour, i.e. de facto standards. he says it would have been better if browsers had obeyed rfc 1738, and recommends that everyone use only fqdns, everywhere, as rfc 1738 commanded.

-- but what would have happened if people had obeyed rfc 1738? urls like "http://example.com/test.html" and "http://localhost/test.html" would all have had to be rewritten as "http://example.com./test.html" and "http://localhost./test.html". browsers would have had to either mark hosts without a trailing dot as errors, or redirect to their full/absolute form on click. everyone who configured local top-level domains like "localhost" would have had to configure their servers to accept only requests for domains like "localhost." , or to accept and redirect [all urls inside] "localhost" to [the corresponding urls in] "localhost.". text like "localhost" would have stayed useful only when typed in the browser address bar, but that would be a nearly useless use, and the relative domain feature is not needed for it, because browsers search for domains as you type. using such names in html source would have become useless, because such links either would not work, or clicking any link with "localhost" would move the user to "localhost." , which would be just an extra redirect on every click (on such links). so rfc 1738 would have made the planned "relative domain" feature entirely useless.

suppose some company used that feature and used relative domains on its local sites, and its urls with relative domains were not redirected to absolute form by browsers, so its sites worked normally. if the company also obeyed rfc 1738, it would configure its servers to accept only fqdns, and it would have to either rewrite all such urls with fqdns, or work with an extra redirect on every click on such urls. if such companies liked having a short domain like "team101" instead of "team101.microsoft.com." in their address bars and html sources, they would have to start using custom internal top-level domains like "team101." , i.e. like "localhost." , instead of subdomains like "team101.microsoft.com." (which could be used as just "team101" before they decided to obey rfc 1738).

---

and i have found out that the trailing dot, which was so strongly supported by rfc 1738, really appeared only after the standard without trailing dots! it appeared with rfc 1034 in 1987; it is cited at the second link, and i cite it:

Since a complete domain name ends with the root label, this leads to a
printed form which ends in a dot.  We use this property to distinguish between:
- a character string which represents a complete domain name
 (often called "absolute").  For example, "poneria.ISI.EDU."
- a character string that represents the starting labels of a
 domain name which is incomplete, and should be completed by
 local software using knowledge of the local domain (often
 called "relative").  For example, "poneria" used in the
 ISI.EDU domain.

rfc 1034 (of 1987) simply declared all the domains then in use, which it seems were all written without trailing dots, to be relative domains from then on! but they still worked as before, so probably few people found out about that, and they continued to think that they were unambiguously requesting the unique real "example.com" site when they used "example.com" without a trailing dot. so in some cases this became an additional security hole: the famous real example.com could be spoofed by a subdomain administrator even if he was not given rights to make any local domain like "localhost.". so rfc 1034 also was not designed very well: it seems its authors did not expect that it might {not become widely known, and so create a security hole}!

probably rfc 1738 (1994) finally tried to bring the idea of the distinction between absolute and relative domains to a wide audience, and also to fix that security hole after 7 years, {but by fixing the security hole through disallowing relative domains in urls, it made relative domains useless, {though i think they probably were not widely used, probably only in some big companies}}. so what would have been [left] as a result of rfc 1738 if it had been obeyed? - 1) relative domains, declared in 1987, would finally have become useless, so the trailing dot, designed to mark an absolute domain, would also have become useless and redundant "legally", i.e. as defined by the rfcs! (but maybe they planned to re-allow relative domains in urls many years later, once the general public started to know about the possibility of relative domains.) 2) and rfc 1738, if it had been obeyed, would also have fixed the security hole. - but even rfc 1034 would not have created the security hole if it had reached the masses and it had been widely understood that using a relative domain is not safe! - so the main recipe for fixing it was reaching a wide audience, and publishing one more rfc was just one of many ways to do that.

i think now that the relative domain feature probably did not become widely known after rfc 1034 (of 1987) because it was of too limited use: only in some big companies' or providers' local networks. it was a feature with no practical value, because local networks could already create any local domain, so the feature existed just for its own sake: it was in fact just a useless text in an rfc that everybody was supposed to know and use without any additional benefit! but people created the little security hole by widely ignoring the rfc, while browsers started to obey it.

i checked the relative domains feature yesterday, and it works. (that is ok, because rfc 2396 (of 1998) re-allowed it after rfc 1738 (of 1994) had denied it, and the later rfc 3986 (of 2005) still allows it.) i added a dns suffix in windows 10 - control panel - ... - network device properties - ipv4 properties - additional - dns tab. when i added "google.com" and then opened "http://mail/" in firefox, it reached google's server, but the server was not configured to work with just "mail" in the http "host" header, so i got something like a "404" page.

---

my reply to the text by the second link ( http://www.dns-sd.org/trailingdotsindomainnames.html ):

he also cites the rule in rfc 1738 and says:

Unfortunately, the people implementing web browser clients appeared not to understand what this meant. When you access a web site, the value most web browsers put in the "Host:" field is what the user typed, not what the computer actually ended up using, after applying the DNS user's searchlist to constuct a fully-qualified name from the partial name. For example, here are three different ways the user may refer to the host "www.example.com." ... When sending the "Host:" parameter to the web server, the web browser client puts in what the user typed ("www.example.com.", "www.example.com", or "www") instead of what the client ended up actually looking up in DNS ("www.example.com." in all three cases). ...

-- this is not quite correct, because rfc 1738 was very strict in this regard: it disallowed relative domains in all urls, even in the browser's address bar. a url is itself the [recommended] way of making any reference to a site, even when people write it on paper, so under rfc 1738 users were not allowed to refer to that site in those 3 ways if they considered what they wrote to be a url!

and it seems the author of this text (Stuart Cheshire) did not know about rfc 2396, so this text is outdated.

---

and what is the situation nowadays? rfc 3986 ( https://tools.ietf.org/html/rfc3986#page-21 ) allows referring to an absolute domain without a trailing dot: it says " The rightmost domain label of a fully qualified domain name in DNS may be followed by a single "." " and that it should be if it is "necessary to distinguish between the complete domain name and some local domain". i think that, due to de facto standards, it is almost never necessary, so wordpress can accept the de facto standard and redirect from the address with a trailing dot to the address without it.
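
The redirect proposed here can be illustrated with a minimal Python sketch. It is not WordPress code and not how its canonical redirect is implemented: the function name `strip_trailing_dot_redirect` is hypothetical, and the sketch only shows the idea of computing a redirect target when the host ends in a single dot.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_trailing_dot_redirect(url):
    """Return the URL to redirect to, or None if no redirect is needed.

    Hypothetical helper: removes one trailing dot from the host name.
    """
    parts = urlsplit(url)
    host = parts.hostname  # lower-cased host, without port or userinfo
    if not host or not host.endswith("."):
        return None
    canonical = host[:-1]  # drop the single trailing dot
    if not canonical:
        return None  # the host was just ".", nothing sensible to redirect to
    # Rebuild the network location, keeping an explicit port if one was given.
    # Userinfo (user:password@) is deliberately not carried over in this sketch.
    netloc = canonical if parts.port is None else f"{canonical}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(strip_trailing_dot_redirect("http://example.com./test.html"))
# http://example.com/test.html
print(strip_trailing_dot_redirect("http://example.com/test.html"))
# None
```

A server using something like this would answer requests for the dotted host with a permanent redirect to the returned URL, in the same way the ticket describes for "www.".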

Last edited 7 years ago by qdinar

#10 @qdinar
7 years ago

i said "the relative domain feature ... was a feature with no practical value"
-- i have found some usefulness of it:
1) an easy way to create short local domains (domains accessible only locally). using the hosts file, for example, requires knowing the ip, and that ip can later change, so the hosts file will need to be edited. this way (relative domains) requires already having dns working and subdomains configured, while the hosts file can be used without global dns, but it may still be useful in some circumstances, like in big companies. but, as far as i know, short local domains that do not need editing every time the ip changes can also easily be created with cname records if a local dns server is used.
2) if there is a local fake domain configured in the hosts file or in a local dns server, for example for webmasters to test a new site, they can add a dot to that domain and get another version of the same domain, to try to access it in the usual way, provided the version with the dot has not also been added as a local fake domain.

-- so maybe a domain with a trailing dot should not be redirected to the version without it, but left for testing purposes. maybe a message should be shown to the user so that he knows why he is logged out.

-- but there are some practical problems with the existence of versions of domains with trailing dots:

http://saynt2day.blogspot.sg/2013/03/danger-of-trailing-dot-in-domain-name.html :

If you do not consider the fact that the user can accidentally enter the domain name with a dot at the end, or follow a link received from some "well-wisher" and get on your domain name with the dot at the end, as the result it may lead to unexpected consequences:
1) If the website uses HTTPS, when navigating to the domain name with the dot at the end, the browser will display the warning on untrusted connection.
2) Authentication may be broken, as cookies are usually set for the domain name without a dot at the end. User in this case will be quite surprised why he can’t log in. It is noteworthy, that if you set a cookie for a domain name with a dot at the end, this cookie will not be passed to the domain name without the dot at the end and vice versa.
3) JavaScript on the page may be broken.
4) There may be problems with the caching of website pages (for example, https://www.cloudflare.com/ does not clear the pages cache if domain name has a dot at the end considering it an invalid domain name).
5) If in conditions in the web server configuration you rely on the particular domain name ($http_host in Nginx, %{HTTP_HOST} in Apache) without the dot at the end, you may face a variety of unexpected situations: unexpected redirects, basic-authorization problems, etc.
6) If the web server is not configured to accept requests on the domain name with the trailing dot, any user who accidentally typed a domain name with the trailing dot will see something like Bad Request - Invalid Hostname.
7) It is possible that search engines may find that your resource has a duplicate content, if someone accidentally or intentionally post links to your web pages with a dot at the end of the domain name.
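
Point 2 of the quoted list (cookies set for one form of the name are not sent to the other) follows from the fact that cookie domain matching is essentially a string comparison. The Python sketch below is a simplified version of that kind of matching, loosely modelled on the domain-match rule in RFC 6265; the function name `domain_matches` is hypothetical, and details such as IP-address handling and public-suffix checks are left out.

```python
def domain_matches(request_host, cookie_domain):
    """Simplified cookie domain matching (hypothetical helper).

    True if the host equals the cookie domain or is a subdomain of it.
    """
    host = request_host.lower()
    domain = cookie_domain.lower()
    if host == domain:
        return True
    # A subdomain must end with "." + domain, e.g. "www.example.com".
    return host.endswith("." + domain)


print(domain_matches("www.example.com", "example.com"))  # True
print(domain_matches("example.com.", "example.com"))     # False
print(domain_matches("example.com", "example.com."))     # False
```

Because "example.com." neither equals "example.com" nor ends with ".example.com", a login cookie set on one form is invisible on the other, which matches the behaviour the quoted article reports.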

-- and i think it is generally a trick, because the general public does not know about it, and as a trick it may do harm, so maybe trailing dots would better be rejected and turned off in software. on the one hand, it may help someone to find out that he is using a fake domain (if the version with the trailing dot is not also faked), but on the other hand it is a complication that may bring problems to programmers. it seems the downsides are greater.

Last edited 7 years ago by qdinar

#11 @adnan.limdi
6 years ago

I face this issue on my site.

#12 @qdinar
4 years ago

i said:

i have found out that the trailing dot, ..., really appeared only after the standard without trailing dots! it appeared with rfc 1034 in 1987

i have now searched for it, to provide an easier link and quotes. i googled "rfc 1034", went to https://tools.ietf.org/html/rfc1034 , and there it is written:

Obsoletes: RFCs 882, 883, 973

and i opened the 3 pages, https://tools.ietf.org/html/rfc882 , https://tools.ietf.org/html/rfc883 , https://tools.ietf.org/html/rfc973 . i see that their dates are November 1983, November 1983, and January 1986. the first 2 of them do not have any trailing dots, as far as i have seen, but the third has trailing dots.

so i was wrong in saying "it appeared with rfc 1034 in 1987". trailing dots appeared almost 2 years earlier, in rfc 973.

example quotes from rfc 882:

For example, A.B.C.D is a subdomain of B.C.D, C.D, D, and " ".
This domain tree has the names " "(the root), COLORS, RED.COLORS, BLUE.COLORS, GREEN.COLORS, FLAVORS, NATURAL.FLAVORS, CHOCOLATE.NATURAL.FLAVORS, VANILLA.NATURAL.FLAVORS, STRAWBERRY.NATURAL.FLAVORS, and TRUTH.
F.ISI.ARPA A IN 10.2.0.52
The F.ISI.ARPA name server has authority over the ARPA domain, but delegates authority over the MIT.ARPA domain to the name server on AI.MIT.ARPA.

example quotes from rfc 883:

A name server for F.ISI.ARPA , serving as an authority for the ARPA and ISI.ARPA domains, might use a boot file and two master files.
B.ISI.ARPA 9999999 IN A 10.3.0.52
10.IN-ADDR PTR IN MILNET-GW.ISI.ARPA

example quotes from rfc 973, there are both versions, without and with trailing dot:

A second difficulty is the restriction that * match a single label. Thus if a name server is looking for RRs for the name A.B.C.D.E.F, it must check for *.B.C.D.E.F, *.*.C.D.E.F, *.*.*.D.E.F, etc.
(e.g. if you can't find an answer at F.ISI.ARPA, look for a RR at *.ISI.ARPA)
ISI.EDU. 10000 NS X.ISI.EDU.
99.128.IN-ADDR.ARPA. 2000 NS Q.ISI.EDU.
Q.ISI.EDU. 2000 A <address of Q.ISI.EDU.>
