Make WordPress Core

Opened 15 years ago

Closed 13 years ago

Last modified 9 years ago

#10550 closed defect (bug) (fixed)

nofollow attribute added to comment_reply_link function

Reported by: seo-dave
Owned by:
Milestone: 3.1
Priority: normal
Severity: minor
Version:
Component: Comments
Keywords: needs-patch
Focuses:
Cc:

Description

I wasn't sure whether to list this as a defect or an enhancement; it's a defect IMHO.

Within wp-includes/comment-template.php there are nofollow attributes added to the "reply to comment" links.

For search engine ranking reasons this is not a good idea. Since I've had trouble explaining this to non-SEOs (I'm a search engine optimization consultant), I'll explain the problem in detail. Also see http://wordpress.org/support/topic/287704?replies=1 and http://codex.wordpress.org/Talk:Template_Tags/comment_reply_link.

The original use of nofollow was to stop PR/link benefit passing through a link, in effect saving the PR/link benefit. For WordPress that meant you could by default add nofollow to commenters' links and to some degree protect WordPress blog owners from comment link spammers.

Google has recently reported (http://www.mattcutts.com/blog/pagerank-sculpting/) that they now treat nofollow links differently: rather than protecting the PR/link benefit of a nofollow link, they delete the PR/link benefit!

What this means to the average WordPress blog owner is a LOT of PR/link benefit is lost through nofollow links and a lot of this could be through the relatively new reply to comment links.

If you have a highly commented site and have WordPress set to show 50 comments per page, that's 50 reply to comment links with nofollow added. Since Google now deletes the PR/link benefit that would normally go through those links (if they lacked nofollow), a page/site could easily lose over half its PR/link benefit just because of implementing the reply to comment links!

For those who don't understand PR/link benefit: PR/link benefit is shared equally through all links (internal and external) from a page, so if there are in total 100 links on a page and 50 are nofollow, 50% of the PR/link benefit from that page will now be deleted by Google. It used to go to the non-nofollow links!
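
The arithmetic in the paragraph above can be sketched in a few lines of PHP. This only illustrates the simplified model described in this ticket (equal split across all links, with the nofollow share discarded); it is not Google's actual algorithm, and the function name is hypothetical.

```php
<?php
// Simplified model from this ticket (an assumption, not Google's real algorithm):
// a page's link benefit is split equally across all of its links, and the
// share assigned to nofollow links is discarded instead of redistributed.

/**
 * Fraction of a page's link benefit that is discarded under this model.
 * Hypothetical helper, for illustration only.
 */
function lost_link_benefit_fraction(int $total_links, int $nofollow_links): float
{
    if ($total_links <= 0 || $nofollow_links < 0 || $nofollow_links > $total_links) {
        throw new InvalidArgumentException('Link counts are out of range.');
    }
    return $nofollow_links / $total_links;
}

// The example from the ticket: 100 links on the page, 50 of them nofollow.
printf("%.0f%% of the link benefit is lost\n", 100 * lost_link_benefit_fraction(100, 50));
```

Under the same model, a page with 50 nofollow reply links and only 30 other links would discard 50/80 of its link benefit, which is the "over half" case mentioned earlier.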

In a perfect world we'd no longer use nofollow anywhere. Unfortunately I don't have an easy-to-implement solution for commenters' links, but there is one for the nofollow attribute on the reply to comment links: remove the nofollow attribute, or at least make it easier to remove at theme level (I understand there's a way, but I couldn't find it/figure it out).

David Law

Change History (20)

#1 @dd32
15 years ago

perhaps noindex would suit better.

#2 follow-up: @seo-dave
15 years ago

noindex wouldn't be suitable, since you do want the main content that the reply links point to indexed. Yes, the content is theoretically duplicate (same page with a different URL), but the major search engines normally combine the results of duplicate URLs, meaning it's not a problem to have these links without a nofollow attribute. I've had it running this way on one of my sites for well over a month; some of the pages have 100s of comments and the site is doing very well in Google (2,000+ unique visitors a day, most from Google).

The site in question has about 2,000 posts with about 650 pages indexed in Google. The site in total has 2,600+ approved comments, so that's a LOT of those reply links. A search of the 600+ indexed pages turns up three pages that are indexed with the reply links (their URLs contain the replytocom variable), though all three pages are also indexed normally, and checking the main search phrase (the title of the page) finds the original pages listed in Google, not the extra pages. For one of the pages the URL corresponded to a paginated comments page as well, like this:

domain.com/post-name.html/comment-page-2?replytocom=12345

So on my site Google is combining 2,600+ links into the original URLs and only partially failing in three instances.

From an SEO perspective there is no harm in principle in having multiple links back to the page the links are on, so having 50+ reply to comment links isn't a problem in itself.

Although this isn't really the place to discuss this: I've done a lot of SEO research over the last 10 years or so and the anchor text of links is considered much more important than standard body text, so having 50+ links with the anchor text "reply to this comment" is potentially damaging from an SEO perspective since it "waters down" the SEO benefits of the other anchor text on the page (which hopefully will have some keywords in them). This is true whether a nofollow attribute is used or not since the anchor text is indexed either way. In an ideal situation all anchor text would support the SERPs of the page the links are on, this is why I would like to edit the Reply to comment text and change it to something like

"Reply to this comment on post name"

By doing this we've added relevant keywords to that anchor text, that is assuming the title of the post contains relevant keywords. I've done this with other WordPress functions and it works quite well.
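
As a rough sketch of the idea above: later WordPress versions (4.1 and up, which postdates this comment) expose a comment_reply_link_args filter whose reply_text argument sets the anchor text. The function names below are hypothetical, and the filter is only registered when the code runs inside WordPress.

```php
<?php
// Sketch only: builds "Reply to this comment on <post title>" anchor text.
// Function names are hypothetical; the comment_reply_link_args filter is
// assumed to be available (WordPress 4.1+).

/** Build the reply anchor text, escaping the title for HTML output. */
function example_reply_text_with_title(string $post_title): string
{
    return 'Reply to this comment on ' . htmlspecialchars($post_title, ENT_QUOTES, 'UTF-8');
}

/** Filter callback: swap in the keyword-bearing reply text when a title exists. */
function example_filter_reply_link_args(array $args, $comment, $post): array
{
    $title = (is_object($post) && isset($post->post_title)) ? (string) $post->post_title : '';
    if ($title !== '') {
        $args['reply_text'] = example_reply_text_with_title($title);
    }
    return $args;
}

// Only hook in when WordPress is loaded, so the file also runs standalone.
if (function_exists('add_filter')) {
    add_filter('comment_reply_link_args', 'example_filter_reply_link_args', 10, 3);
}
```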

David Law

#3 @filosofo
15 years ago

One way to address this would be to make the reply "links" into separate POST-submitting forms. For those using JavaScript, the experience would be the same: instead of an inline click event listener, an inline submit listener would move the form.

For the case in which JavaScript doesn't stop the event, the logic would just have to check $_POST instead of $_GET.

In addition to side-stepping the PageRank issues, this approach would have the positive side effect of making things more in line with the HTTP protocol, for which GET requests are supposed to be idempotent. (I realize WP admin violates this all over the place, but WP does usually try to do better on the public-facing side of things.)

I'll write a patch to that effect if no one brings any serious objections.
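
A minimal sketch of what such a form-based reply control might look like, assuming the target comment ID is sent in a replytocom field as it is in the existing links. This is not the proposed patch; the function names, the CSS class and the markup are all hypothetical.

```php
<?php
// Sketch only: a reply "link" rendered as a small POST form, plus the
// server-side lookup that reads the target from POST data instead of GET.
// All names and markup here are hypothetical.

/** Render a reply control as a POST form pointing back at the post. */
function example_reply_form(string $post_url, int $comment_id, string $label = 'Reply'): string
{
    return sprintf(
        '<form method="post" action="%s#respond" class="comment-reply-form">'
        . '<input type="hidden" name="replytocom" value="%d" />'
        . '<button type="submit">%s</button></form>',
        htmlspecialchars($post_url, ENT_QUOTES, 'UTF-8'),
        $comment_id,
        htmlspecialchars($label, ENT_QUOTES, 'UTF-8')
    );
}

/** Read the comment being replied to from submitted data; 0 means none. */
function example_requested_reply_target(array $post_data): int
{
    return isset($post_data['replytocom']) ? max(0, (int) $post_data['replytocom']) : 0;
}
```

In a real request the second function would be called with $_POST, and a JavaScript submit listener could intercept the form to move the comment form inline, as described above.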

#4 @seo-dave
15 years ago

Any news on this?

If I understand filosofo's suggestion, it sounds spot on. The current text link with rel="nofollow" (still exists in 2.8.4) would be replaced with a post form.

In this way search engines no longer see the Reply links as links since search engines don't index post submitting forms.

I've used post forms to solve this nofollow problem with author links on comments at theme level. Also used post forms for the login links as there's no good reason for a search engine to follow them. Can be seen in action at http://www.google-adsense-templates.co.uk/

David Law

#5 @seo-dave
14 years ago

This is still an issue in WordPress 2.9.2.

I guess the WordPress developers are not seriously concerned with the potential SEO damage of having hundreds of link-benefit-destroying nofollow links on a WordPress blog!!

Remember, when it comes to Google every nofollow link counts as a link: it 'uses' link benefit (to be precise, the link benefit of a nofollow link is DELETED) that could be used on internal links etc...

WordPress users are bending over backwards gaining links to their sites for SEO reasons and a simple to fix nofollow issue is left as is damaging their hard work.

Not impressed at all!

David Law

#6 @nacin
14 years ago

  • Keywords needs-patch added
  • Milestone changed from Unassigned to Future Release

#7 @mrmist
14 years ago

Related #11359

#8 @scribu
13 years ago

  • Milestone changed from Future Release to 3.1

It's really easy to remove the nofollow attribute, without modifying Core. Just put the following code in your theme's functions.php or in a plugin:

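// Strip the rel='nofollow' attribute from the HTML of each comment reply link.
function remove_reply_link_nofollow($reply_link) {
  return str_replace(" rel='nofollow'", '', $reply_link);
}
// Run the function on every reply link via the comment_reply_link filter.
add_filter('comment_reply_link', 'remove_reply_link_nofollow');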

That said, I agree that a nofollow attribute doesn't make sense here, since it's an internal link, especially since we have canonical URLs now.
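
The snippet above matches one exact string, so it does nothing if the attribute is written with double quotes or different spacing. A slightly more tolerant variant might look like the sketch below; the function name is hypothetical, and it still only handles a rel attribute whose sole value is nofollow.

```php
<?php
// Sketch only: removes rel="nofollow" or rel='nofollow' (any case) from the
// reply link HTML. A rel attribute with several values is left untouched.
// The function name is hypothetical.

function example_strip_reply_nofollow(string $reply_link): string
{
    $stripped = preg_replace('/\s+rel=(["\'])nofollow\1/i', '', $reply_link);
    // preg_replace() returns null on error; fall back to the original HTML.
    return $stripped === null ? $reply_link : $stripped;
}

// Only hook in when WordPress is loaded, so the file also runs standalone.
if (function_exists('add_filter')) {
    add_filter('comment_reply_link', 'example_strip_reply_nofollow');
}
```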

#9 @scribu
13 years ago

  • Resolution set to fixed
  • Status changed from new to closed

(In [16230]) Remove nofollow on comment reply links. Fixes #10550

#10 @scribu
13 years ago

One way to address this would be to make the reply "links" into separate POST-submitting forms. [...]
In addition to side-stepping the PageRank issues, this approach would have the positive side effect of making things more in line with the HTTP protocol, for which GET requests are supposed to be idempotent.

I disagree. You're just asking for the form for replying to a certain comment. You haven't POSTed anything yet.

That said, a form sounds like a good idea.

#11 @scribu
13 years ago

Although I do remember a certain post on Matt Cutts's blog saying something along the lines of Google starting to look through forms too (not sure if just to find more URLs to crawl or also affecting the ranking).

Last edited 13 years ago by scribu

#12 follow-up: @joliss
13 years ago

  • Cc joliss42@… added

Since the patch was applied, Googlebot and friends have started crawling pages that end like "/?replytocom=72" on my blog.

I see the reasoning for removing nofollow, but I'm not convinced that having lots of duplicate pages is so great, either. (For example I'm worried it might reduce the crawling frequency for my posts, because now they get drowned out by the replytocom pages.)

So I'd like to politely suggest that there might be some better alternative (the POST form that filosofo suggested, perhaps?).

#13 in reply to: ↑ 12 @joelhardi
13 years ago

I also noticed this a while ago. Although Google says there's no search penalty for having duplicate links like this, personally I'd rather Google not hit my server crawling these pages. I'd recommend adding this line to your robots.txt:

Disallow: /*/?replytocom=

And, while you're at it:

Disallow: /*/trackback/
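
For sites without a physical robots.txt file, WordPress serves a generated one and passes its text through the robots_txt filter, so the two rules above could be appended from a plugin. The sketch below assumes the generated output ends inside the `User-agent: *` group, and the function name is hypothetical. The `*` wildcard in paths is honoured by the major crawlers but may be ignored by others.

```php
<?php
// Sketch only: append the two Disallow rules from this comment to the
// robots.txt text that WordPress generates. Assumes the existing output
// ends inside the "User-agent: *" group. The function name is hypothetical.

function example_robots_txt_rules(string $output): string
{
    $rules = "Disallow: /*/?replytocom=\nDisallow: /*/trackback/\n";
    // Normalise trailing whitespace so the rules start on their own line.
    return rtrim($output) . "\n" . $rules;
}

// Only hook in when WordPress is loaded, so the file also runs standalone.
if (function_exists('add_filter')) {
    add_filter('robots_txt', 'example_robots_txt_rules');
}
```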

#14 follow-up: @solarissmoke
13 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

See #16709, and also #16881. Removing the nofollow attributes causes search engine bots to access a whole lot of ?replytocom= pages. Note that duplicate indexing is not the issue - this is addressed by rel=canonical links. The problem is the significant overhead created by bots visiting all these additional pages - one for each post on which comments are open.

Reopening to facilitate discussion.

Last edited 13 years ago by solarissmoke

#15 in reply to: ↑ 14 ; follow-up: @joelhardi
13 years ago

See my comment 9 on #16881 ... I don't think you've got the distinctions between the robots exclusion standard (robots.txt and meta tag), canonical and nofollow quite right.

  • In the nofollow spec, it's explicitly stated that it's not meant to stop search engines from indexing, so site owners shouldn't use it for that. Search engines can crawl these URLs; the nofollow is just meant to be taken as advice to search engines on link value. Matt Cutts and others have also said not to misuse it for internal links, which brings us back to why seo-dave opened this ticket to start with.
  • canonical affects if and how search engines combine the rankings of specific URLs, but it is up to their discretion how they use it. (Because if you think for 5 seconds like an "evil SEO" you can imagine all sorts of nefarious ways to misuse it if search engines took it literally.)

I feel that this bug was correctly closed, because WordPress was misusing the nofollow attribute for internal links, which had bad consequences, and r16230 corrected it. I filed #16881 with a trivial patch to hit a few places that r16230 missed.

So, now the problem is that we have the bad side effects that people's ?replytocom pages are (a) being crawled more and (b) being indexed more.

Is (a) a problem? Well, crawlers like Googlebot generally are smart enough not to overload a site, so maybe not a big one. But, it definitely is something where you would want to advise spiders not to crawl these URLs, wasting everyone's cycles.

Is (b) a problem? Yes, because these are non-canonical pages and we don't want them indexed or linked to externally (and becoming de-facto canonical). The canonical link tag may mostly reduce this problem but not totally solve it.

The answer to both (a) and (b) is to apply the robots exclusion standard. By stopping crawling, you stop indexing.

OK, so robots.txt is the obvious way to solve this, add the one line I posted earlier and you've fixed it for your site.

But, what about WordPress, for people who don't know about robots.txt? Some possible options:

  • manipulate robots.txt in the filesystem (like .htaccess). Or, when there is no robots.txt and URLs rewrite to index.php, WordPress could return a robots.txt.
  • add the robots meta tag with "noindex,nofollow" to the ?replytocom pages. See my 2nd patch to #16881. This is an easy win, but these URLs will still be hit (but not indexed) since the page has to be at least partly downloaded for the tag to be read. Possibly not hit as often though, once Googlebot takes into account that they have "noindex" set.
  • I can't think of any third option I like: JavaScript-only links, user-agent detection to not show bots bad things, using POST to spoof links (which makes me shudder), a 307 redirect, a cookie, #!. All are incomplete and complicated. Maybe someone else can come up with something?
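
The robots meta tag option from the list above could be sketched roughly as follows. This is not the patch attached to #16881; the function names are hypothetical, and the tag is only printed from wp_head when the code runs inside WordPress.

```php
<?php
// Sketch only: print a "noindex,nofollow" robots meta tag on requests that
// carry a replytocom query variable. Function names are hypothetical and
// this is not the patch attached to #16881.

/** True when the request asks for the reply form of a specific comment. */
function example_is_replytocom_request(array $query): bool
{
    return isset($query['replytocom']) && $query['replytocom'] !== '';
}

/** Return the meta tag for ?replytocom= requests, or an empty string. */
function example_replytocom_robots_meta(array $query): string
{
    return example_is_replytocom_request($query)
        ? "<meta name=\"robots\" content=\"noindex,nofollow\" />\n"
        : '';
}

// Only hook in when WordPress is loaded, so the file also runs standalone.
if (function_exists('add_action')) {
    add_action('wp_head', function () {
        echo example_replytocom_robots_meta($_GET);
    });
}
```

As noted in the list, a crawler still has to fetch the page to see this tag, so it addresses indexing rather than crawl load.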

OK, end of brain dump. I think the robots meta tag (see #16881) is such an easy win that it makes sense. It solves the indexing problem, and once these URLs don't show up anymore when people log into Google Webmaster Tools or whatever, I think the visibility of this issue goes way down. People assume that Googlebot hitting these URLs is bad, but if it's happening infrequently and at 3 a.m., does it really matter?

#16 in reply to: ↑ 15 ; follow-up: @solarissmoke
13 years ago

Replying to joelhardi:

I don't think you've got the distinctions between the robots exclusion standard (robots.txt and meta tag), canonical and nofollow quite right.

Yup, I was wrong about that. As things stand currently, duplicate indexing is a problem, which will be easily solved by adding a robots meta tag as you have proposed.

People assume that Googlebot hitting these URLs is bad, but if it's happening infrequently and at 3 a.m., does it really matter?

IMHO we cannot assume that it will be infrequent and at 3 a.m. Not all bots are as efficient and well behaved as Googlebot. However, I do accept that there is no easy solution to this, and maybe we just have to live with it.

At any rate the robots meta tag needs to be added. Possibly to 3.1.1 as well as 3.2, because it will be causing duplicate indexing.

#17 in reply to: ↑ 2 @hakre
13 years ago

Replying to seo-dave:

[...] Although this isn't really the place to discuss this: [...]
In an ideal situation all anchor text would support the SERPs of the page the links are on, this is why I would like to edit the Reply to comment text and change it to something like

"Reply to this comment on post name"

By doing this we've added relevant keywords to that anchor text, that is assuming the title of the post contains relevant keywords. I've done this with other WordPress functions and it works quite well.

Thanks for sharing the idea, it's out of scope of this ticket as you wrote but I like the idea.

#18 in reply to: ↑ 16 @hakre
13 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

Replying to solarissmoke:

Replying to joelhardi:

I don't think you've got the distinctions between the robots exclusion standard (robots.txt and meta tag), canonical and nofollow quite right.

Yup, I was wrong about that. As things stand currently, duplicate indexing is a problem, which will be easily solved by adding a robots meta tag as you have proposed.

Well, actually not. Duplicate indexing is handled pretty well by search engines; that's part of their job. I think you're relying on some SEO myth and don't know what duplicate content is really about.

So please step back from the SEO discussion, whose details are short-term and subject to change, and concentrate on providing correct links.

This ticket was about removing the nofollow attribute as it was inappropriate. This issue has been addressed and already fixed.

If you have got an additional issue, please open a new ticket and leave the new ticket id here.

I will close this ticket again, as the issue it was originally reported for has been fixed. You can copy your recent comments over to the new ticket; it's not a problem to have some duplicate content in tickets over here ;).

#19 @joelhardi
13 years ago

FYI, I've filed #16893 on the "bots are crawling my ?replytocom pages. stop them!" issue.

#20 @chriscct7
9 years ago

#11360 was marked as a duplicate.
