Ticket #11939 (closed feature request: wontfix)

Opened 2 years ago

Last modified 2 years ago

Make Script Elements HTML4 and XHTML compliant

Reported by: hakre Owned by:
Priority: low Milestone:
Component: Validation Version: 3.0
Severity: normal Keywords:
Cc:

Description

Following best-practice standards, script elements should be compatible with both HTML4 and XHTML, because XHTML documents are often parsed as HTML but might be parsed as XML as well. Both standards should therefore be taken into account when delivering XHTML documents, as WordPress does.

The current output is not XML compliant because in XHTML the contents of script elements are #PCDATA, not CDATA.

Attached is a patch that properly escapes the CDATA / #PCDATA contents of script elements.
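A minimal sketch of the problem, assuming Python's standard xml.etree.ElementTree as a stand-in for a browser's XML parser. The sample script content and the helper name is_well_formed are made up for illustration; they are not taken from WordPress or from the patch:

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Hypothetical inline script; "<" and "&&" are ordinary JavaScript operators.
JS = "if (a < b && c) { run(); }"

# In XML the element content is #PCDATA, so "<" and "&" are read as markup.
bare = f'<script type="text/javascript">{JS}</script>'

# Inside a CDATA section the same characters are plain text.
cdata = f'<script type="text/javascript"><![CDATA[{JS}]]></script>'

print(is_well_formed(bare))   # False
print(is_well_formed(cdata))  # True
```

An HTML parser has no problem with the first variant, which is why the issue only shows up once the same document is handed to an XML parser.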

Attachments

11939.patch Download (70 bytes) - added by hakre 2 years ago.
Update patch (whole core codebase excluding two Externals: SimplePie and Snoopy)
11939-akismet.patch Download (70 bytes) - added by hakre 2 years ago.
Done while taking care. Might be of use for the Plugin then.
11939.2.2.patch Download (196 bytes) - added by hakre 2 years ago.
STYLE elements added
11939.2.patch Download (196 bytes) - added by hakre 2 years ago.
STYLE elements added

Change History

Some elements are still missing. I'll update this soon; I'm currently working on it.

Does this also affect STYLE elements?

You're right, it affects STYLE elements as well. But this one took me somewhat longer than expected, so I limited it to scripts first.

Style elements added. I also posted about the XHTML code smells on my blog, including some regexes which might come in handy when searching for such issues.

  • Milestone changed from 3.0 to Future Release
  • Milestone changed from Future Release to 3.0

It's the same for HTML 5, which should be supported for Twenty Ten as well. Since this affects both the frontend and the backend, and it has a patch, why punt?

comment:7 follow-up: ↓ 8   azaozz 2 years ago

  • Priority changed from normal to low
  • Type changed from defect (bug) to feature request
  • Milestone changed from 3.0 to Future Release

This is mostly a theoretical situation. I don't see what is fixed here.

WordPress admin is served with XHTML 1.0 Transitional doctype. Don't think there are any browsers currently in use that don't understand this doctype and would parse it as HTML 4 for example (and if there are any I think they would have much bigger problems than inline JS escaping).

Don't see why we should be adding additional escaping for doctypes we don't support in the first place. That would use a little bit of memory and would increase the overall size a little and we already have problems there.

comment:8 in reply to: ↑ 7   hakre 2 years ago

  • Priority changed from low to normal
  • Type changed from feature request to defect (bug)
  • Milestone changed from Future Release to 3.0

Replying to azaozz:

This is mostly a theoretical situation. I don't see what is fixed here.

What is fixed: compatibility of the web application's output with the client software (compatibility of WordPress and the web browser).

[...] Don't think there are any browsers currently in use that don't understand this doctype and would parse it as HTML 4 for example

Okay, I see you have not understood

a) why this fix has been provided and
b) what this fix is addressing.

I'll revert your changes to this ticket's properties right away so your misleading conclusions do not hurt it. I hope you can live with it.

Not all browsers in use with WordPress (as per the docs) fully support displaying XHTML as XML; they display XHTML documents as HTML instead.

The following problem is addressed:

 * <script> and <style> elements in XHTML sent as text/html have to be
   escaped using ridiculously complicated strings.

As you can see, that is an issue for XHTML documents. And yes, today's browsers in use with WordPress might parse XHTML as HTML and some might parse it as XML, but only under certain conditions. I do not really know whether any browser in use with WordPress will actually display the backend as XML.

This partially covers a more general problem that is related to the XHTML doctype as well. I am adding it here so you can better understand which problems you have to deal with when sending out XHTML documents as text/html:

 * Current browsers are, for text/html content, HTML4 user agents (at
   best) and certainly not XHTML user agents. Therefore if you send
   them XHTML you are sending them content in a language which is not
   native to them, and instead relying on their error handling. Since
   this is not defined in any specification, it may vary from one user
   agent to the other.

Reference: Sending XHTML as text/html Considered Harmful by Ian Hickson (currently editor of HTML 5).

Background information about Firefox Parsers Status:

Firefox Nightly builds ship with the HTML5 parser since May 13th.

Prior to that, Firefox parsed XHTML documents served as text/html (as done by WordPress, if my memory serves me well) with the previous HTML parser. The actual HTML version is 4.

  • Priority changed from normal to low
  • Type changed from defect (bug) to feature request
  • Milestone changed from 3.0 to Future Release

hakre 2 years ago

Update patch (whole core codebase excluding two Externals: SimplePie and Snoopy)

hakre 2 years ago

Done while taking care. Might be of use for the Plugin then.

hakre 2 years ago

STYLE elements added

comment:10 follow-up: ↓ 12   hakre 2 years ago

  • Keywords has-patch removed
  • Status changed from new to closed
  • Resolution set to wontfix

hakre 2 years ago

STYLE elements added

Related: #13383

comment:12 in reply to: ↑ 10 ; follow-up: ↓ 13   azaozz 2 years ago

Replying to hakre: I understand your concern, that's why I didn't close this ticket. But unless there is a current browser that is having a problem parsing XHTML 1.0 transitional, don't think we need to change this. That might happen as browsers start to support HTML 5.0 although I think the chances are very slim that they will stop supporting XHTML 1.0.

As far as I remember the need to escape the content of <script> blocks came many years ago in HTML 2.0 (or was it HTML 3.0). It was needed to hide these blocks from browsers that didn't support JS at the time and would display it. Since then all browsers understand the <script> tag and would run the JS instead of trying to parse it as HTML.

comment:13 in reply to: ↑ 12   hakre 2 years ago

Replying to azaozz:

But unless there is a current browser that is having a problem parsing XHTML 1.0 transitional, don't think we need to change this.

Looks like I couldn't make my point here again. Please take a little time and read the following paragraphs. At the end I'll ask a question that aims to point to the domain this should be about, because there are problems with today's browsers. HTML5 is in fact there to improve the current situation, but that is only a side note.

So let's take it down to the bare metal, with a quiz at the end:


Let's open the dashboard in the admin on current trunk. So what type of document is this actually? To find out, let's ask the browser. I use Firefox 3.latest over here (stable). I open the page properties and get this information:

Name: Content-Type
Content: text/html; charset=UTF-8

So by the MIME type (meta) we have an HTML document here. The same goes for the corresponding server response headers:

Name: Content-Type
Content: text/html; charset=UTF-8

To summarize the information so far: The WordPress Admin Interface is served as a text/html document. The charset is UTF-8.

What does that mean? The meaning of the Internet media type text/html is defined in HTML4 and RFC2854 respectively. So this can be read in those documents.

So much for the document's Internet media type, which the browser takes into account as an HTTP user agent. But there is more.

Now for the doctype of the dashboard. These are the first two lines of the HTTP response body from the dashboard:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"  dir="ltr" lang="en-US">

Like you already wrote, it is XHTML 1.0 Transitional. XHTML is "A Reformulation of HTML 4 in XML 1.0".


We've come to the end and found out which specifications the WordPress admin is served with. So far, so clear.

And now for the Quiz:

Taking Internet media type and doctype into account, how do you think your HTTP client will parse the dashboard document? As HTML, going by the MIME type, or as XML, going by the doctype?

[ ] A - XML  (~ 1998)
[ ] B - HTML (~ 1990)
[ ] C - SGML (~ 1986)
[ ] D - GML  (~ 1975)
[ ] E - GC   (~ 1969)

(You can, but need not, select multiple answers)
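The quiz can also be put as code. This is a rough sketch only, assuming that a browser picks its parser from the media type in the Content-Type header and does not consult the doctype for that decision. The function names and the list of XML media types are illustrative assumptions, not a complete or normative table:

```python
from email.message import Message
from typing import Optional, Tuple

# Media types assumed to select the XML parser (non-exhaustive).
XML_MEDIA_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}


def parse_content_type(header_value: str) -> Tuple[str, Optional[str]]:
    """Split a Content-Type header value into media type and charset."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type(), msg.get_content_charset()


def parser_for(header_value: str) -> str:
    """Name the parser family implied by the media type alone."""
    media_type, _charset = parse_content_type(header_value)
    if media_type in XML_MEDIA_TYPES:
        return "XML"
    if media_type == "text/html":
        return "HTML"
    return "other"


# The header value shown above for the dashboard.
dashboard = "text/html; charset=UTF-8"
print(parse_content_type(dashboard))  # ('text/html', 'utf-8')
print(parser_for(dashboard))          # HTML
```

Under that assumption the XHTML 1.0 Transitional doctype plays no part in the choice; the same bytes sent as application/xhtml+xml would go to the XML parser instead.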

@hakre thanks for the lengthy explanations however it seems we are still talking about different things. I still don't see what bug/problem is fixed in the patches (or rather was fixed since you removed them).

HTML 4 does not require script blocks to be hidden in HTML comments. It recommends a hack to hide scripts from old browsers (old at the time of publishing the spec, completely extinct now). More thoughts on that subject: 1, 2 (last paragraph).

XHTML 1 requires scripts to be surrounded by <![CDATA[...]]>. To avoid problems with older browsers/JS parsers it is recommended that the CDATA tags are commented out in the JS. This is what WordPress is using at the moment.
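To illustrate why the CDATA markers get commented out, here is a small sketch that looks at the same script element through an HTML parser and through an XML parser. It uses Python's html.parser and xml.etree.ElementTree as stand-ins for browser parsers, and the /* <![CDATA[ */ comment style and the sample script are assumptions for the example, not a quote of WordPress output:

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class ScriptText(HTMLParser):
    """Collect the raw text found inside <script> elements."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.chunks.append(data)


# Hypothetical inline script with the CDATA markers inside JS comments.
JS = "if (a < b) { run(); }"
markup = (
    '<script type="text/javascript">'
    f"/* <![CDATA[ */ {JS} /* ]]> */"
    "</script>"
)

# HTML view: script content is raw text, so the markers reach the JS engine.
html_view = ScriptText()
html_view.feed(markup)
html_view.close()
script_as_html = "".join(html_view.chunks)

# XML view: the CDATA section is markup and only its content remains.
script_as_xml = ET.fromstring(markup).text

print(repr(script_as_html))
print(repr(script_as_xml))
```

In the HTML view the markers are part of the script text, which is harmless only because they sit inside JavaScript comments. In the XML view they are consumed by the parser and the script text contains just the empty comments around the code.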

Yes, and that's exactly what this is about: CDATA and the comments.

That's why I originally made that patch. When it comes to compatibility, usability and validity, there is pretty much only that awkward variant if you use the HTML MIME type, which WordPress is using at the moment.

I consider it best practice, but naturally everyone is free to adopt it or not.

<style type="text/css"><!--/*--><![CDATA[/*><!--*/
...
/*]]>*/--></style>

<script type="text/javascript"><!--//--><![CDATA[//><!--
...
//--><!]]></script>
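For anyone who wants to generate these wrappers instead of typing them, a small sketch follows. The opening and closing strings are copied from the two snippets above; the function names and the guard conditions are my own assumptions and are not taken from the patch:

```python
import xml.etree.ElementTree as ET

# Wrapper strings copied from the two snippets above.
SCRIPT_OPEN = '<script type="text/javascript"><!--//--><![CDATA[//><!--\n'
SCRIPT_CLOSE = "\n//--><!]]></script>"
STYLE_OPEN = '<style type="text/css"><!--/*--><![CDATA[/*><!--*/\n'
STYLE_CLOSE = "\n/*]]>*/--></style>"


def _check(content: str, element: str) -> None:
    """Reject content that would end the wrapper early."""
    if "]]>" in content:
        raise ValueError('content must not contain "]]>"')
    if f"</{element}" in content.lower():
        raise ValueError(f'content must not contain "</{element}"')


def wrap_script(js: str) -> str:
    """Wrap JavaScript source in the comment/CDATA combination."""
    _check(js, "script")
    return SCRIPT_OPEN + js + SCRIPT_CLOSE


def wrap_style(css: str) -> str:
    """Wrap CSS source in the comment/CDATA combination."""
    _check(css, "style")
    return STYLE_OPEN + css + STYLE_CLOSE


# Both results should at least be well-formed when read as XML.
script_el = ET.fromstring(wrap_script("if (a < b && c) { run(); }"))
style_el = ET.fromstring(wrap_style("ul > li { color: red; }"))
print(script_el.tag, style_el.tag)  # script style
```

The check only covers XML well-formedness with a non-validating parser; it says nothing about how a given HTML parser or script engine treats the leading and trailing comment markers.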

HTML 2.0 rules btw.

  • Milestone Future Release deleted