Make WordPress Core

Opened 13 years ago

Closed 10 years ago

#17632 closed defect (bug) (invalid)

HTML 5 Validation issues (theme independent)

Reported by: amirhabibi
Owned by:
Milestone:
Priority: normal
Severity: normal
Version: 3.1.3
Component: Template
Keywords: has-patch HTML5 close
Focuses:
Cc:

Description

WordPress often adds the rel attribute to links.

For example rel="category" or rel="attachment" etc...

Apparently these keywords are not allowed in HTML5:

http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#linkTypes

So the validation fails :-/

These values are hardcoded. You can use regex in filters, or JavaScript, to remove them, but this is not a good solution.
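As a rough illustration of the filter-based workaround mentioned above (a stopgap, not a fix), the sketch below removes the category token from every rel attribute in an HTML fragment. It assumes double-quoted rel attributes, and strip_rel_category() is a hypothetical helper name. get_the_category_list() passes its output through the the_category filter, so the helper can be hooked there when add_filter() is available.

```php
<?php
/**
 * Hypothetical helper: drop the "category" token from rel attributes,
 * e.g. rel="category tag" -> rel="tag" and rel="category" -> (removed).
 * Assumes double-quoted attributes; regex over markup is fragile.
 */
function strip_rel_category(string $html): string
{
    $result = preg_replace_callback(
        '/\srel="([^"]*)"/',
        function (array $m): string {
            // rel is a space-separated token list; remove "category".
            $tokens = preg_split('/\s+/', trim($m[1]), -1, PREG_SPLIT_NO_EMPTY);
            $kept = array_values(array_diff($tokens, ['category']));
            // Drop the whole attribute if no tokens remain.
            return $kept ? ' rel="' . implode(' ', $kept) . '"' : '';
        },
        $html
    );
    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $html;
}

// Only hook in when running inside WordPress.
if (function_exists('add_filter')) {
    add_filter('the_category', 'strip_rel_category');
}
```

This only changes output at render time; as the description says, it papers over hardcoded values rather than fixing them in core.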

Given the importance of code validation, I think this problem should now be treated as a bug.

Attachments (1)

17632.patch (618 bytes) - added by SergeyBiryukov 13 years ago.


Change History (43)

#1 @johnbillion
13 years ago

  • Keywords close added

Does the value of the rel attribute actually have anything to do with validation? The list of existing rel values on the Microformats site does not appear to form part of the HTML 5 spec but acts as a de-facto list, from what I can see. Please correct me if I am wrong.

On a different note, the HTML 5 spec is a long way from being final. Definitely not a good idea to go pre-emptively changing things in core yet.

#2 @amirhabibi
13 years ago

I understand that HTML5 is not final, but having control over the hardcoded keywords would be nice and would make WordPress validate as HTML5 without custom functions...

#3 @Elpie
13 years ago

  • Keywords needs-patch added; close removed

This issue probably shouldn't be titled "HTML5 Validation" because it relates to ALL validation.

Neither category nor attachment is a valid microformat, whether you use HTML or XHTML. They never have been. Where WordPress outputs rel="category" it should output rel="tag". While there is currently no microformat for attachments, there is a case to be made for changing this to rel="enclosure".

rel="enclosure" is still just a draft, but it's already in use and likely to be approved. It's already recognised in the Atom Syndication Format (http://www.ietf.org/rfc/rfc4287.txt).

#4 @Elpie
13 years ago

  • Cc lynne.pope@… added

#5 @TNLNYC
13 years ago

  • Version changed from 3.1.3 to 3.2.1

In HTML5, tag is an allowed rel value (see http://www.whatwg.org/specs/web-apps/current-work/multipage/links.html#linkTypes ), so it should be rel="tag" and the word category should be removed. This should fix this validation issue.

#6 @SergeyBiryukov
13 years ago

  • Keywords has-patch added; needs-patch removed
  • Version changed from 3.2.1 to 3.1.3

Related: [2164], [4790]

17632.patch removes rel="category", leaving rel="tag".

#7 follow-up: @nacin
12 years ago

Shouldn't categories be rel="tag" then? Categories are a valid taxonomy classification.

#8 in reply to: ↑ 7 @SergeyBiryukov
12 years ago

Replying to nacin:

Shouldn't categories be rel="tag" then?

That's what the patch does.

#9 @WraithKenny
12 years ago

As "tag" represents taxonomy basically, dropping "category" sounds great. If anyone objects, perhaps at least a filter?

#10 follow-up: @pbiron
12 years ago

I opened a bug against the W3C Validator on this (see [1]).

Per the response from the W3C Validator team, all that needs to happen is:

1) someone needs to write a Specification on what @rel='category' means. That spec can be VERY simple, doesn't have to be formal. See [2] for an example of how simple such a spec can be

2) post that spec in a publicly available place...could probably be done as just an extra section in the docs for get_the_category_list() [3].

3) update the Microformats wiki entry for "category" [4] to include a link to that spec in the Specification field

4) remove the Microformats wiki entry for "category tag" (since it really shouldn't be there in the first place)

5) notify the W3C Validator team of the update, by either adding a comment to the bug report I filed [1] or sending email to www-validator @ w3.org; they will then modify validator.w3.org to accept @rel='category'

I would volunteer to do the above, but I don't know what @rel='category' is supposed to mean, so I couldn't begin to write the spec. If someone else wants to write the text for the spec (i.e., step 1) and send it to me, I'd be happy to do steps 2-5.

[1] https://www.w3.org/Bugs/Public/show_bug.cgi?id=16510

[2] http://microformats.org/wiki/rel-external

[3] http://codex.wordpress.org/Function_Reference/get_the_category_list

[4] http://microformats.org/wiki/existing-rel-values#HTML5_link_type_extensions

Last edited 12 years ago by pbiron

#11 follow-up: @WraithKenny
12 years ago

I think most would rather apply Sergey's patch than have to maintain a spec...

#12 in reply to: ↑ 11 @bamajr
12 years ago

Replying to WraithKenny:

I think most would rather apply Sergey's patch than have to maintain a spec...

@WraithKenny - I completely agree. Maintaining a spec, for use in HTML5, especially since HTML5 is not anywhere near final, could be a pretty big endeavor.

#13 in reply to: ↑ 10 @bamajr
12 years ago

Replying to pbiron:

You can't just write it... you have to maintain it and then defend its position when other people want to change it.

For this reason, it really isn't a bug in the W3C Validator (which actually relies on Validator.nu and the WHATWG spec). The validator is correct: without a specification, the use is invalid.

I've been researching this issue for quite a while, as I despise when developers start throwing invalid code in something which is supposed to be a core framework. WordPress is essentially violating their own policy in creating valid code (http://codex.wordpress.org/WordPress_Coding_Standards#HTML). The WordPress Core doesn't validate, so how is the resulting theme going to validate?

I've posted a few follow-up comments, to other support topics at:
http://wordpress.org/support/topic/get_the_category_list-produces-rel-attribute-with-invalid-values?replies=7#post-2720221

and...

http://wordpress.org/support/topic/wordpress-abuses-rel-tag?replies=23#post-2720264

If the WordPress core is going to use rel="category" and rel="category tag" then it should take the steps necessary to ensure they are properly defined in a specification.

#14 @pbiron
12 years ago

[crap, I spent 20 mins composing a reply and Trac just lost it...now I have to redo it :-( Oh, well...this draft is clearer and more complete than my first one anyway]

@bamajr

If the WordPress core is going to use rel="category" and rel="category tag" then it should take the steps necessary to ensure they are properly defined in a specification.

I couldn't agree more and that's what my initial comment [1] was all about.

re:

You can't just write it... you have to maintain it and then defend its position when other people want to change it.

As I mentioned, the spec for @rel='category' does not have to be a formal spec approved by a standards body (nor need to be defended "when other people want to change it"). It can be as simple as a new section (or even a single sentence :-) in the WordPress docs for get_the_category_list() [2], which have to be maintained along with the code anyway.

As to what @rel='category' means, again, I haven't a clue, but apparently whoever wrote line 163 of wp-includes/category-template.php [3] thought it means something different than @rel='tag':

$rel = ( is_object( $wp_rewrite ) && $wp_rewrite->using_permalinks() ) ? 'rel="category tag"' : 'rel="category"';

and that the difference has "something" to do with whether $wp_rewrite->using_permalinks() returns true or false.

Since I don't know what revision-control system is used for WordPress core, is there any way to find out who wrote that line, ask them what it means, and have that meaning added to [2]? (And again, I'll do all the rest to get conforming validators to stop flagging @rel='category' as a validation error.)

re:

I despise when developers start throwing invalid code in something which is supposed to be a core framework.

Again, I couldn't agree more! After all, I am a co-editor of the W3C XML Schema spec [4] and several HL7 specs [5], and I hate it when people deploy illegal XML Schemas in a production system just so that XML Spy's buggy schema implementation will accept them :-(.

In fact, I think a convincing argument can be made that get_the_category_list()'s generation of @rel='tag' is actually what is incorrect (in at least some cases, see below), and hence that applying the patch would be the wrong thing to do...similar to people deploying incorrect XML Schemas just to get XML Spy's buggy schema implementation to say they are OK.

How do people use get_the_category_list()? I can't speak for others, but I use it like Twenty Eleven does:

<?php $categories_list = get_the_category_list( __( ', ', 'twentyeleven' ) ); ?>
<span class="cat-links">
	<?php printf( __( '<span class="%1$s">Posted in</span> %2$s', 'twentyeleven' ),
		'entry-utility-prep entry-utility-prep-cat-links', $categories_list ); ?>
</span>

e.g., in a theme's content.php.

Consider the definition of @rel='tag' in the HTML5 spec (I know, it's just a draft and can change at any time, so let's not go there again) [6]:

The tag keyword indicates that the tag that the referenced document represents applies to the current document.

@rel='tag' is appropriate on a link within article//span[@class='entry-meta'] when used in {single,category}.php (since the category applies to all posts those templates generate).

However, it is not appropriate when used in content.php (e.g., on a site's home page that contains posts from many different categories), since then it is saying the category applies to the home page, which is certainly not what I intend in that case!

So, irrespective of whether @rel='category' is generated (and/or what it means), I think that what should be changed in the code of get_the_category_list() is that it should allow a parameter to control whether @rel='tag' is generated and that [2] should be clear that that param should only be true when it is known that all posts in the loop have the same category applied to them!
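To make the proposed parameter concrete, here is a minimal sketch of what such a flag could look like. Everything in it is hypothetical: category_list_with_rel() is not a WordPress function, $categories is assumed to map category name to archive URL, and htmlspecialchars() stands in for WordPress's own escaping helpers.

```php
<?php
/**
 * Hypothetical variant of get_the_category_list() with a $rel_tag flag.
 * $categories maps category name => archive URL (an assumption).
 * $rel_tag should only be true when every post in the loop shares
 * the linked category, per the definition of rel="tag" quoted above.
 */
function category_list_with_rel(array $categories, string $separator = ', ', bool $rel_tag = false): string
{
    $links = [];
    foreach ($categories as $name => $url) {
        $rel = $rel_tag ? ' rel="tag"' : '';
        $links[] = sprintf(
            '<a href="%s"%s>%s</a>',
            htmlspecialchars($url, ENT_QUOTES),
            $rel,
            htmlspecialchars((string) $name, ENT_QUOTES)
        );
    }
    return implode($separator, $links);
}
```

Defaulting the flag to false means callers such as a home-page content.php emit plain links unless they explicitly opt in.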

[1] http://core.trac.wordpress.org/ticket/17632#comment:10
[2] http://codex.wordpress.org/Function_Reference/get_the_category_list
[3] http://core.trac.wordpress.org/browser/tags/3.3.1/wp-includes/category-template.php
[4] http://www.w3.org/TR/xmlschema-2
[5] http://www.hl7.org (sorry, the specs themselves are not publicly accessible)
[6] http://www.w3.org/TR/html5/links.html#link-type-tag

Last edited 12 years ago by pbiron

#15 @WraithKenny
12 years ago

The code from note [3] exists because the rel-tag spec disallows query-string URLs such as ?tag=foo and allows only path-based ones such as /tag/foo/; hence the using_permalinks check.
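The reasoning above follows from how the rel-tag microformat reads a link: the tag is the last segment of the URL's path. A minimal sketch of that rule (rel_tag_from_url() is a hypothetical name, and percent-decoding is deliberately left out) shows why a ?tag=foo link carries no tag at all:

```php
<?php
/**
 * Hypothetical: extract the tag a rel-tag consumer would see, i.e. the
 * last non-empty path segment of the href. Returns null when the path
 * has no segments (as with query-string links like /?tag=foo).
 */
function rel_tag_from_url(string $url): ?string
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return null; // no path component, or a malformed URL
    }
    // Split on "/" and discard empty pieces from leading/trailing slashes.
    $segments = array_values(array_filter(explode('/', $path), 'strlen'));
    if ($segments === []) {
        return null; // the tag lives only in the query string
    }
    return $segments[count($segments) - 1];
}
```

With permalinks on, /tag/foo/ yields "foo"; without them, the link points at /?tag=foo and yields nothing, which is why core drops "tag" in that case.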

7 years ago, when rel-category and rel-tag were added, it was likely only to match WordPress' Tags and Categories. The rel-tag spec and the HTML5 spec (which are much newer, I think) seem to intend rel-tag to mean "taxonomy", which applies equally to Tags and Categories and makes rel=category obsolete.

I understand your concern about how rel-tag is used on homepages: the linked tag should not apply to the homepage "document" (in the html document sense). This is how it currently exists "in the wild" though and I think this patch wouldn't really affect that one way or the other.

The issue in this ticket is a somewhat different, and narrower, thing: rel-category being obsolete and invalid in HTML5.

Whether rel-tag is appropriate on the homepage (in HTML5) likely deserves its own ticket and, even better, a discussion at the W3C/WHATWG/Microformats.org. For example, perhaps their definition can be adjusted to reflect that multiple HTML5 article elements (which are distinct by the spec's definition, I think) can be present in a single valid HTML5 document.

#16 @Marventus
12 years ago

@pbiron, I just joined the conversation on the forums about this (http://wordpress.org/support/topic/wordpress-abuses-rel-tag?). You can see my reply there.
Since 'rel="category"' seems to cause potential problems only in HTML5, why not take your parameter idea, which is great, and give it an automatic default value ("auto") that relies on a theme doctype check and outputs the category value only if the doctype is not HTML5?
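A minimal sketch of how an "auto" default could resolve, under stated assumptions: resolve_category_rel() and its mode names are hypothetical, and detecting the theme's doctype is left to the caller as the $is_html5 flag, since no such check exists in the code discussed here.

```php
<?php
/**
 * Hypothetical resolution of a rel mode for category links.
 * 'auto' keeps the legacy "category tag" value unless the theme's
 * doctype is HTML5 (detection is the caller's job: $is_html5).
 */
function resolve_category_rel(string $mode, bool $is_html5): string
{
    switch ($mode) {
        case 'category':
            return 'category tag';
        case 'tag':
            return 'tag';
        case 'auto':
            return $is_html5 ? 'tag' : 'category tag';
        default:
            throw new InvalidArgumentException("Unknown rel mode: $mode");
    }
}
```

The hard part in practice would be the doctype check itself, since themes print their own doctype.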

@WraithKenny, I agree with you that the use of the 'rel="tag"' in the home or archive pages should be assigned its dedicated ticket.

Last edited 12 years ago by Marventus

#17 @bamajr
12 years ago

@pbiron - I hate it when effort goes into writing out detailed info and then something happens that loses the work. I've learned, though, to "Select All" then "Copy" throughout the writing process.

FYI: Great reconnaissance mission :-)

I did not know rel="category" would not need to go through a formal approval process, so you clued me in on a piece of info I didn't have... Thanks! However, can't anyone update the Microformats wiki, or at least anyone with authority on the website? If so, why hasn't this been done already? Do we even know if it was WordPress that added it?

On my end, I don't really care, so long as it validates. I know several values on that list are perfectly valid already, though not rel="category" or rel="category tag".

#18 @pbiron
12 years ago

@Marventus: I replied [1] to your questions in that support thread before I saw your note here. You've probably already seen it.

@WraithKenny: thanx for the explanation about the relevance of the check for $wp_rewrite->using_permalinks()! That's very helpful background.

However, the rel-tag spec [2] is no longer relevant, at least in the context of HTML5. The fact that @rel='tag' is listed in the table in the formats section [3] of the microformats wiki (with a link to [2]) doesn't make it normative for HTML5, which gives its own definition for @rel='tag' that supersedes anything else and explicitly references the table in the HTML5 link type extensions section [4] as being normative for values not defined within itself.

re: my concern about @rel='tag' being used for links within a document containing posts from many different categories, I see that the rel-tag spec contains the following language:

Note that a tag may just refer to a major portion of the
current page (i.e. a blog post)

So, with that interpretation (and based on what you said about ?tag=foo was in the mind of whoever wrote the code that generates @rel='tag'), I don't see (much of) a problem with @rel='tag' being used in a document containing posts from many different categories.

But, as I said above, HTML5 does not rely on the rel-tag spec [2] for the semantics of @rel='tag'; it defines its own. HTML5 also contains the following non-normative note after its definition of @rel='tag':

Note: Since it indicates that the tag applies to the current document,
it would be inappropriate to use this keyword in the markup of a tag
cloud, which lists the popular tags across a set of pages.

which is where my trepidation about its use on a home page came from (and that trepidation extends to by-date or by-author archives and anywhere else posts from multiple categories could appear).

Your point about its use within the context of article is interesting, since the HTML5 spec says [5]:

The article element represents a self-contained composition in a
document, page, application, or site and that is, in principle,
independently distributable or reusable, e.g. in syndication.
This could be a forum post, a magazine or newspaper article,
a blog entry, a user-submitted comment, an interactive widget
or gadget, or any other independent item of content.

Which, while non-normative, could be used to justify a request to the HTML5 WG that they loosen the definition of @rel='tag' to include not only documents but also such "self-contained compositions...". I'll talk to folks on the W3C HTML5 WG about whether they think that would be appropriate.

But even if they do make such a change (and hence article//a[@rel*='tag'] has the correct semantics when other articles in the same document are assigned to different categories), my suggestion to parameterize get_the_category_list() still applies, because there is nothing to prevent it being called outside of that context.

But your point about opening another ticket about that is well taken and I didn't mean to co-opt this one on that point. But, I'm tired and will open that other ticket tomorrow.

My purpose in what I have said in this ticket (and the support thread [1]) has only been to point out that @rel='category' is not, in your words, "obsolete and invalid in html5", if only its use were documented, and that the patch should not be applied, at least until the semantics of the use of @rel='tag' in a "multi-category" context are clarified: replacing one incorrect usage with another incorrect usage just to get validators to shut up is not the right thing to do.

[1] http://wordpress.org/support/topic/wordpress-abuses-rel-tag?replies=29#post-2721386
[2] http://microformats.org/wiki/rel-tag#Tag_Spaces
[3] http://microformats.org/wiki/existing-rel-values#formats
[4] http://microformats.org/wiki/existing-rel-values#HTML5_link_type_extensions
[5] http://dev.w3.org/html5/spec/single-page.html#link-type-tag

#19 @pbiron
12 years ago

@bamajr: yes, anyone can edit the microformats wiki page, and there is revision tracking that shows who made an edit, but I haven't checked who made the @rel='category' and @rel='category tag' entries (or when they were made).

re: "why hasn't this been done already?". That's what I thought the second I learned from the W3C Validator team that the only reason @rel='category' was being flagged as a validation error was because the spec was missing.

p.s. in case I haven't said it before, the @rel='category tag' entry in the wiki is a red herring and should be removed, since links that use @rel='category tag' actually get interpreted as @rel='category' && @rel='tag'.
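The point about 'category tag' follows from rel being a set of space-separated, case-insensitive tokens rather than a single value. A small sketch (rel_tokens() is a hypothetical helper) shows how a consumer would split it:

```php
<?php
/**
 * Hypothetical helper: split a rel attribute into its link types.
 * Tokens are separated by whitespace and compared case-insensitively,
 * so "category tag" means two link types: "category" and "tag".
 */
function rel_tokens(string $rel): array
{
    $tokens = preg_split('/\s+/', strtolower(trim($rel)), -1, PREG_SPLIT_NO_EMPTY);
    // Duplicates add nothing; keep each link type once, in order.
    return array_values(array_unique($tokens));
}
```

A validator checks each token on its own, so a wiki entry for the combined string "category tag" is never consulted.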

#20 @Marventus
12 years ago

@pbiron and @bamajr: Thanks for the great info on your posts and for your insight. I think we all agree now that @rel='category' needs to go.
As for the @rel='tag' issue, after testing the get_the_category_list function output in home, page, post, category archive, date archive, and author archive, it would seem Paul (pbiron) is right about his 'trepidations' concerning incorrect usage of the @rel='tag' based on the specs:
1. Home - Categories of the last post being retrieved (in my case, the oldest);
2. Page - empty;
3. Post - Post Categories;
4. Category Archive - Archive Category;
5. Date Archive - Idem 1. Home;
6. Author Archive - Idem 1. Home;
So, it seems the output is correct for cases 2-4 and incorrect for cases 1, 5, and 6. This should be easily fixed with Conditional Tags.
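A minimal sketch of the conditional-tag check, based on the test results above. To keep it runnable outside WordPress, the page type is passed in as a hypothetical $context string; inside a theme the same decision would come from conditional tags such as is_single() and is_category().

```php
<?php
/**
 * Hypothetical: should category links on this page carry rel="tag"?
 * True only where the linked category applies to the whole document
 * (single posts, category archives), per the results listed above.
 * $context stands in for WordPress conditional tags, e.g.
 * is_single() -> 'single', is_category() -> 'category'.
 */
function rel_tag_is_appropriate(string $context): bool
{
    return in_array($context, ['single', 'category'], true);
}
```

Home, date, and author archives (cases 1, 5, and 6) fall through to false.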

Last edited 12 years ago by Marventus

#21 @Marventus
12 years ago

  • Cc Marventus added

#22 follow-up: @WraithKenny
12 years ago

@pbiron I've learned a lot from this conversation, and indeed, I was hoping to "get validators to shut up" but in a "hit the moving target" of a "living document" kind of a way :) which I still think isn't a terrible thing to do.

The patch doesn't replace an incorrect usage with another, as the rel='tag' on the homepage is currently existing (not added by this patch).

This patch literally just strips out category and doesn't add anything: http://core.trac.wordpress.org/attachment/ticket/17632/17632.patch

I've created #20333 for the rel='tag' issue.

@Marventus I don't think we are all in agreement about @rel='category', though I think a worthy compromise at this point is its removal for now. It shouldn't be hard to put it back in if and when a spec for rel='category' is adopted.

For those of you who want to know when and by whom the code was written, SergeyBiryukov found these changesets (comment 6 above): Related: [2164], [4790]

As for this ticket, the rel='attachment' issue hasn't been addressed yet...

#23 @WraithKenny
12 years ago

An alternative path for addressing these issues is to make it all easily filterable and let themes/plugins (i.e. users) decide the issue.

#24 in reply to: ↑ 22 ; follow-up: @Marventus
12 years ago

@WraithKenny:

I've created #20333 for the rel='tag' issue.

Cool! I'll repost my findings in there in case they might be useful to someone else.


@Marventus I don't think we are all in agreement about @rel='category', though I think a worthy compromise at this point is its removal for now. It shouldn't be hard to put it back in if and when a spec for rel='category' is adopted.

Sorry about that! I thought we were, since the only comment I saw against this was from johnbillion, who posted 10 months ago. Am I missing someone else?

As for this ticket, the rel='attachment' issue hasn't been addressed yet...

Indeed. However, I don't know how that could be fixed since @rel='attachment' seems to work in combination with 'wp-att-#' in certain contexts, and I believe those values are invalid in HTML5 as well.
Core files related to this issue seem to be wp-includes/post.php (_fix_attachment_links function), wp-includes/media.php (get_image_send_to_editor and media_upload_form_handler functions), and wp-includes/template.php (the_attachment_links function).

Last edited 12 years ago by Marventus

#25 in reply to: ↑ 24 @WraithKenny
12 years ago

Also worth noting: the new hawtness is to use data- attributes everywhere, which were designed specifically to replace (supersede) overused (abused) rel and class attributes. Perhaps, going forward, WordPress can switch to using data- attributes?
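For illustration only, here is what an attachment link could look like with a data- attribute carrying the attachment ID in place of rel="attachment wp-att-ID". The function name and the data-wp-attachment-id attribute are assumptions, not anything WordPress defines. HTML5 accepts any data-* name, so validators have nothing to flag.

```php
<?php
/**
 * Hypothetical: build an attachment link that stores the attachment ID
 * in a data- attribute instead of rel="attachment wp-att-ID".
 * The attribute name data-wp-attachment-id is an illustrative assumption.
 */
function attachment_link(int $id, string $url, string $text): string
{
    return sprintf(
        '<a href="%s" data-wp-attachment-id="%d">%s</a>',
        htmlspecialchars($url, ENT_QUOTES),
        $id,
        htmlspecialchars($text, ENT_QUOTES)
    );
}
```

Any core code that currently looks for the wp-att-# token in rel (the _fix_attachment_links() function mentioned above, for example) would have to read the new attribute instead.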

#26 @Marventus
12 years ago

BTW, should the Version number be changed to 3.4?

#27 follow-up: @pbiron
12 years ago

@Marventus: As WraithKenny said, we are not all in agreement that @rel='category' should be removed: in almost every comment I've made on this ticket I have said that the patch should not be applied, at least not until the semantics question is resolved.

@WraithKenny: thanx, I somehow missed that earlier comment on this ticket that showed who/when the changes were made. A quick scan of those changes shows [2164] added @rel='category tag' and [4790] removed tag when $wp_rewrite->using_permalinks() returned false.

And thanx for opening the other ticket, I got sidetracked yesterday and didn't get to it myself. I've got some thoughts that I will add to that ticket.

In a couple of my comments on this ticket I have said I don't know what @rel='category' is supposed to mean...but that is only partly true. Here's a guess on how it differs from @rel='tag': category means the term is taken from a controlled vocabulary, while tag means it comes from an uncontrolled vocabulary (however, both the rel-tag spec [1] and HTML5 [2] are almost silent on controlled vs. uncontrolled vocab).

Uncontrolled vs. controlled vocab is a big difference, as evidenced by the fact that WordPress's infrastructure goes to great lengths to separate Categories (controlled vocab) from Tags (uncontrolled vocab). However, this difference only "makes a difference" if search engines (or other tools that process @rel) do something different with the link depending on the value of @rel.

Opera used to do something with link[@rel] (it had a "navigation" panel that would appear on docs that used link[@rel='{next,prev,up}'], but no longer does), and I'm not aware of any browsers that currently do anything different depending on the value of a[@rel]. But perhaps with HTML5's "formalization" of values for @rel they might start to. Google and other search engines are generally pretty closed-mouthed about exactly how they use various markup to affect searches ("trade secrets" and all)...except perhaps for @rel='nofollow'.

My harping on "getting the semantics right" (as opposed to "getting the validators to shutup") really only matters if search engines and browsers treat @rel='category' differently from @rel='tag'.

From a "get the validators to shut up" perspective, a big advantage to documenting what @rel='category' means (and updating the microformats wiki to point to that "spec" so that at least validator.nu, and hopefully other validators, will stop complaining) is that sites that do not update WordPress to whatever version contains the patch removing @rel='category' would also pass HTML5 validation. And we all know that there are innumerable reasons why people do not update to the latest version (incompatibility with "vital" plugins/themes being the big one).

So, whether the patch gets applied or not, I still think a "spec" for @rel='category' should be written.

The above reasoning also applies to @rel='attachment' and any other link relationship types the WordPress core generates.

As I'm relatively new to WordPress, I have one other question: what is the official process for approving changes to core code and/or documentation? Does anyone who has participated in this discussion actually have the authority to make a decision, or is everyone commenting here, like me, just someone who is developing themes/plugins/sites for clients?

[1] http://microformats.org/wiki/rel-tag
[2] http://www.w3.org/TR/html5/links.html#link-type-tag

#28 @pbiron
12 years ago

  • Cc pbiron added

#29 in reply to: ↑ 27 @SergeyBiryukov
12 years ago

Replying to Marventus:

BTW, should the Version number be changed to 3.4?

No. Version number indicates when the bug was initially introduced/reported. I've kept it at 3.1.3, though it could be set to 1.5, since [2164] was committed before that release.

rel='attachment' was added in [3303].

Replying to pbiron:

As I'm relatively new to WordPress, I have one other question: what is the official process for approving changes to core code and/or documentation?

For the core code, the decision is made by the core team leads/developers. The documentation in Codex is open for editing by anyone with a WordPress.org account though (and is overseen by the docs team).

#30 @pbiron
12 years ago

I posed a question on the HTML5 comments mailing list asking about 1) whether @rel='tag' is appropriate for controlled vocabularies and 2) what is the scope of @rel='tag'. We'll see what the response is.

#31 @WraithKenny
12 years ago

  • Keywords HTML5 added

#32 @SergeyBiryukov
12 years ago

Closed #21392 as a duplicate.

#33 @sbrajesh
11 years ago

  • Cc sbrajesh added

#34 @WraithKenny
11 years ago

I think the FAQ over at microformats.org was updated to clarify the tags-on-the-homepage issue. Essentially it says that if some portion of the page is about a tag (a post), it's valid to apply the tag to the whole thing (the archive). Effectively the archive is, in some part, about the posts it lists. So that's one issue down.

#35 follow-up: @mcepl
11 years ago

  • Cc mcepl added

Is there an end to this story somewhere?

Last edited 11 years ago by mcepl

#36 @dEM0nsTAr
11 years ago

  • Cc dEM0nsTAr added

#37 in reply to: ↑ 35 @alexvorn2
11 years ago

Replying to mcepl:

Is there an end to this story somewhere?

I think the existing patch is good, and this is an easy fix...
Hope to see this fixed soon.

#38 @MikeHansenMe
11 years ago

The existing patch still works well for me.

Last edited 11 years ago by MikeHansenMe

#39 @alexvorn2
11 years ago

please move this to 3.6

#40 @GaryJ
11 years ago

  • Cc gary@… added

#41 @nacin
10 years ago

  • Component changed from General to Validation
  • Keywords close added

Tantek later created a rel-category page: http://microformats.org/wiki/rel-category.

If this validates this can be closed as invalid.

If it does not validate, then the validator should be contacted. Then this can be closed as invalid.

#42 @nacin
10 years ago

  • Component changed from Validation to Template
  • Milestone Awaiting Review deleted
  • Resolution set to invalid
  • Status changed from new to closed

This seems to validate for me now.
