Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#11939 closed feature request (wontfix)

Make Script Elements HTML4 and XHTML compliant

Reported by: hakre Owned by:
Milestone: Priority: low
Severity: normal Version: 3.0
Component: Validation Keywords:
Focuses: Cc:

Description

Following best-practice standards, script elements should be compatible with both HTML4 and XHTML, because XHTML documents are often parsed as HTML but might be parsed as XML as well. Both standards should therefore be taken into account when delivering XHTML documents, as WordPress does.

The current output is not XML compliant because, in XHTML, script elements are #PCDATA blocks, not CDATA blocks as in HTML4.

Attached is a patch that properly escapes the CDATA / #PCDATA contents of script elements.

Attachments (4)

11939.patch (70 bytes) - added by hakre 4 years ago.
Update patch (whole core codebase excluding two Externals: SimplePie and Snoopy)
11939-akismet.patch (70 bytes) - added by hakre 4 years ago.
Done while taking care. Might be of use for the Plugin then.
11939.2.2.patch (196 bytes) - added by hakre 4 years ago.
STYLE elements added
11939.2.patch (196 bytes) - added by hakre 4 years ago.
STYLE elements added


Change History (20)

comment:1 hakre 4 years ago

Some elements are missing. I'll update this soon; currently taking care of it.

comment:2 miqrogroove 4 years ago

Does this also affect STYLE elements?

comment:3 hakre 4 years ago

You're totally right, it does affect STYLE elements as well, that's true. But this one took me somewhat longer than expected, so I limited it to scripts first.

comment:4 hakre 4 years ago

Style elements added. I also posted about the XHTML code smells on my blog, including some regexes which might come in handy when searching for such issues.

comment:5 nacin 4 years ago

  • Milestone changed from 3.0 to Future Release

comment:6 hakre 4 years ago

  • Milestone changed from Future Release to 3.0

It's the same for HTML 5, which should be supported for Twenty Ten as well. Since this affects both the frontend and the backend, and this has a patch, why punt?

comment:7 follow-up: azaozz 4 years ago

  • Milestone changed from 3.0 to Future Release
  • Priority changed from normal to low
  • Type changed from defect (bug) to feature request

This is mostly a theoretical situation. I don't see what is fixed here.

WordPress admin is served with XHTML 1.0 Transitional doctype. Don't think there are any browsers currently in use that don't understand this doctype and would parse it as HTML 4 for example (and if there are any I think they would have much bigger problems than inline JS escaping).

Don't see why we should be adding additional escaping for doctypes we don't support in the first place. That would use a little bit of memory and would increase the overall size a little and we already have problems there.

comment:8 in reply to: ↑ 7 hakre 4 years ago

  • Milestone changed from Future Release to 3.0
  • Priority changed from low to normal
  • Type changed from feature request to defect (bug)

Replying to azaozz:

This is mostly a theoretical situation. I don't see what is fixed here.

What is fixed: compatibility of the web application's output with the client software (compatibility of WordPress and the web browser).

[...] Don't think there are any browsers currently in use that don't understand this doctype and would parse it as HTML 4 for example

Okay, I see you have not understood

a) why this fix has been provided and
b) what this fix is addressing.

I'll revert your changes to this ticket's properties right away so your misleading conclusions do not hurt it. I hope you can live with it.

Not all browsers in use with WordPress (according to the docs) fully support displaying XHTML as XML; they display XHTML documents as HTML instead.

The following problem is addressed:

 * <script> and <style> elements in XHTML sent as text/html have to be
   escaped using ridiculously complicated strings.

As you can see, that is an issue for XHTML documents. And yes, today's browsers in use with WordPress might parse XHTML as HTML, and some might parse it as XML, but only under certain conditions. I do not really know whether any browser in use with WordPress will ever display the backend as XML.
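To make the dilemma concrete, here is a minimal Python sketch (not part of the patch) that feeds the same inline script to an HTML parser and to an XML parser. It assumes Python's standard-library parsers are reasonable stand-ins for a browser's text/html and XML code paths; real browsers differ in detail. The helper names `script_as_html` and `script_as_xml` are made up for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class ScriptText(HTMLParser):
    """Collects the raw text an HTML parser finds inside <script> elements."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.chunks.append(data)


def script_as_html(markup):
    """Script text as seen when the markup is treated as text/html."""
    parser = ScriptText()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


def script_as_xml(markup):
    """Script text as seen when the markup is treated as XML.

    Raises xml.etree.ElementTree.ParseError if the markup is not well-formed.
    """
    return ET.fromstring(markup).text


# Hypothetical inline scripts: one unescaped, one entity-escaped.
RAW = '<script type="text/javascript">if (a < b && b > 0) { run(); }</script>'
ESCAPED = ('<script type="text/javascript">'
           'if (a &lt; b &amp;&amp; b &gt; 0) { run(); }</script>')

if __name__ == "__main__":
    print("HTML, raw:    ", script_as_html(RAW))
    print("HTML, escaped:", script_as_html(ESCAPED))
    print("XML, escaped: ", script_as_xml(ESCAPED))
    try:
        script_as_xml(RAW)
    except ET.ParseError as err:
        print("XML, raw:      not well-formed:", err)
```

The unescaped script is fine as HTML but is rejected as XML, while the entity-escaped script is fine as XML but reaches the script engine with literal `&lt;` and `&amp;` when parsed as HTML. Neither plain form works under both parsers, which is why the combined comment and CDATA wrappers exist.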

This partially covers a more general problem that is related to the XHTML doctype as well. I add it here, so you can better understand which problems you have to deal with when sending out XHTML documents as text/html:

 * Current browsers are, for text/html content, HTML4 user agents (at
   best) and certainly not XHTML user agents. Therefore if you send
   them XHTML you are sending them content in a language which is not
   native to them, and instead relying on their error handling. Since
   this is not defined in any specification, it may vary from one user
   agent to the other.

Reference: Sending XHTML as text/html Considered Harmful by Ian Hickson (Currently editor of HTML 5).

Background information about Firefox Parsers Status:

Firefox Nightly builds ship with the HTML5 parser since May 13th.

Prior to that, Firefox parsed XHTML documents served as text/html (as done by WordPress, if my memory serves me well) with the previous HTML parser. The current HTML version is 4.

comment:9 nacin 4 years ago

  • Milestone changed from 3.0 to Future Release
  • Priority changed from normal to low
  • Type changed from defect (bug) to feature request

hakre 4 years ago

Update patch (whole core codebase excluding two Externals: SimplePie and Snoopy)

hakre 4 years ago

Done while taking care. Might be of use for the Plugin then.

hakre 4 years ago

STYLE elements added

comment:10 follow-up: hakre 4 years ago

  • Keywords has-patch removed
  • Resolution set to wontfix
  • Status changed from new to closed

hakre 4 years ago

STYLE elements added

comment:11 hakre 4 years ago

Related: #13383

comment:12 in reply to: ↑ 10 ; follow-up: azaozz 4 years ago

Replying to hakre:
I understand your concern, that's why I didn't close this ticket. But unless there is a current browser that is having a problem parsing XHTML 1.0 transitional, don't think we need to change this. That might happen as browsers start to support HTML 5.0 although I think the chances are very slim that they will stop supporting XHTML 1.0.

As far as I remember the need to escape the content of <script> blocks came many years ago in HTML 2.0 (or was it HTML 3.0). It was needed to hide these blocks from browsers that didn't support JS at the time and would display it. Since then all browsers understand the <script> tag and would run the JS instead of trying to parse it as HTML.

comment:13 in reply to: ↑ 12 hakre 4 years ago

Replying to azaozz:

But unless there is a current browser that is having a problem parsing XHTML 1.0 transitional, don't think we need to change this.

Looks like I couldn't make my point here again. Please take a little time and read the following paragraphs. At the end I'll ask a question with the aim of pointing to the domain this should be about, because there are problems with today's browsers. HTML5 is in fact there to make the current situation better, but that is only a side note.

So let's take it down to the bare metal, with a quiz at the end:


Let's open the dashboard in current trunk, which is in the admin. So what type of document is this, actually? To find out, let's ask the browser. I use Firefox 3.latest over here (stable). I open the page properties and get this information:

Name: Content-Type
Content: text/html; charset=UTF-8

So by the MIME type (meta) we have an HTML document here. The same goes for the corresponding server response headers:

Name: Content-Type
Content: text/html; charset=UTF-8

To summarize the information so far: The WordPress Admin Interface is served as a text/html document. The charset is UTF-8.

What does that mean? The meaning of the Internet Media Type text/html is defined in HTML4 and RFC2854 respectively. So this can be read in those documents.

So much for the document's Internet Media Type, which the browser takes into account as an HTTP user agent. But there is more.

Now for the doctype of the Dashboard. These are the first two lines of the HTTP response body from the Dashboard:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"  dir="ltr" lang="en-US">

Like you already wrote, it is XHTML 1.0 Transitional. XHTML is "A Reformulation of HTML 4 in XML 1.0".


We've come to the end and found out which specifications the WordPress Admin is served with. So far, so clear.

And now for the Quiz:

Taking Internet Media Type and doctype into account, how do you think your HTTP client will parse the Dashboard document? As HTML, going by the MIME type, or as XML, going by the doctype?

[ ] A - XML  (~ 1998)
[ ] B - HTML (~ 1990)
[ ] C - SGML (~ 1986)
[ ] D - GML  (~ 1975)
[ ] E - GC   (~ 1969)

(You can, but need not, select multiple answers)
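As a hint for the quiz, here is a rough Python sketch of how a user agent picks a parser: by the Content-Type of the response, not by the doctype in the body. It is a simplified model under that assumption, not any browser's actual code, and the name `parser_mode` is hypothetical.

```python
# Media types that a user agent would hand to its XML parser.
XML_MEDIA_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}


def parser_mode(content_type):
    """Return 'html', 'xml' or 'other' for a Content-Type header value.

    Only the media type is inspected; parameters such as charset, and
    the doctype inside the document, play no part in the decision.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return "html"
    if media_type in XML_MEDIA_TYPES or media_type.endswith("+xml"):
        return "xml"
    return "other"


if __name__ == "__main__":
    # The header the Dashboard is served with, as quoted above.
    print(parser_mode("text/html; charset=UTF-8"))
    # What an XHTML document would need in order to be parsed as XML.
    print(parser_mode("application/xhtml+xml"))
```

Under this model, a document with an XHTML 1.0 Transitional doctype that arrives as text/html goes to the HTML parser.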

comment:14 azaozz 4 years ago

@hakre thanks for the lengthy explanations; however, it seems we are still talking about different things. I still don't see what bug/problem is fixed in the patches (or rather was fixed, since you removed them).

HTML 4 does not require script blocks to be hidden in HTML comments. It recommends a hack to hide scripts from old browsers (old at the time of publishing the spec, completely extinct now). More thoughts on that subject: 1, 2 (last paragraph).

XHTML 1 requires scripts to be surrounded by <![CDATA[...]]>. To avoid problems with older browsers/JS parsers it is recommended that the CDATA tags are commented out in the JS. This is what WordPress is using at the moment.
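A short Python sketch of the commented-out CDATA style described here. The helper name `commented_cdata_script` and the sample script are made up for illustration, and Python's XML parser stands in for a browser's XML code path, which is an assumption.

```python
import xml.etree.ElementTree as ET


def commented_cdata_script(js):
    """Wrap inline JS in a CDATA section whose markers sit in JS comments."""
    if "]]>" in js:
        # "]]>" would end the CDATA section early.
        raise ValueError("script must not contain ']]>'")
    return ('<script type="text/javascript">\n'
            "/* <![CDATA[ */\n"
            + js +
            "\n/* ]]> */\n"
            "</script>")


if __name__ == "__main__":
    markup = commented_cdata_script("var ok = (1 < 2) && true;")
    print(markup)
    # As XML, the CDATA markers disappear and two empty JS comments remain.
    print(repr(ET.fromstring(markup).text))
```

An XML parser accepts the block and hands over the script with two empty comments around it. An HTML parser does not interpret the CDATA markers inside a script element, so the script engine sees them as the contents of ordinary JS comments.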

comment:15 hakre 4 years ago

Yes, and that is exactly what this is about: CDATA and the comments.

That's why I originally made that patch. When it comes to compatibility, usability and validity, there is pretty much only that awkward variant if you use the HTML MIME type, which WordPress is using at the moment.

I consider it best practice, but naturally everyone is completely free to adopt it or not.

<style type="text/css"><!--/*--><![CDATA[/*><!--*/
...
/*]]>*/--></style>

<script type="text/javascript"><!--//--><![CDATA[//><!--
...
//--><!]]></script>

HTML 2.0 rules btw.
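The two wrappers above can be generated mechanically. The following Python sketch is an illustration only, not the code from the removed patches. The names `wrap_script` and `wrap_style` are hypothetical, and Python's XML parser is assumed to stand in for a browser's XML code path.

```python
import xml.etree.ElementTree as ET


def _check(body, end_tag):
    """Reject content that would break out of either wrapper."""
    if "]]>" in body:
        raise ValueError("content must not contain ']]>'")
    if end_tag in body.lower():
        raise ValueError("content must not contain " + end_tag)


def wrap_script(js):
    """Inline JS wrapped so it survives both HTML and XML parsing."""
    _check(js, "</script")
    return ('<script type="text/javascript"><!--//--><![CDATA[//><!--\n'
            + js +
            "\n//--><!]]></script>")


def wrap_style(css):
    """Inline CSS wrapped so it survives both HTML and XML parsing."""
    _check(css, "</style")
    return ('<style type="text/css"><!--/*--><![CDATA[/*><!--*/\n'
            + css +
            "\n/*]]>*/--></style>")


if __name__ == "__main__":
    # Hypothetical content containing characters that are special in XML.
    script = wrap_script("if (a < b && b > 0) { run(); }")
    style = wrap_style("ul > li { color: red; }")
    # Both parse as well-formed XML; the leftovers of the wrapper end up
    # inside JS line comments and CSS comments respectively.
    print(repr(ET.fromstring(script).text))
    print(repr(ET.fromstring(style).text))
```

As XML, the leading comment is dropped and the CDATA section delivers the content untouched. As HTML, the whole element body is raw text, and the markers are placed so that the script or style engine skips them as comments.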

comment:16 nacin 4 years ago

  • Milestone Future Release deleted