Make WordPress Core

Opened 3 years ago

Closed 18 months ago

Last modified 17 months ago

#53290 closed defect (bug) (fixed)

Prevent tagging certain links in comments with rel="nofollow ugc"

Reported by: galbaras's profile galbaras Owned by: jorbin's profile jorbin
Milestone: 6.2 Priority: normal
Severity: normal Version: 5.7.2
Component: Comments Keywords:
Focuses: Cc:

Description

Assuming that user generated content links should not be followed is generally reasonable. However, there are some exceptions, some of which can be overcome by filters, and some, for which there are none.

  1. The function get_comment_author_link() shouldn't modify the author link indiscriminately. Although its output can be filtered, it would be better to prevent link tagging for specific authors. This can be achieved by adding the following code on line 223 of wp-includes/comment-template.php:
if ( empty( $url ) || 'http://' === $url || in_array( get_comment_author( $comment_ID ), apply_filters( 'comment_authors_link_allowed', [] ) ) {
    return = $author;
}
  1. The function wp_rel_ugc() shouldn't modify the author link indiscriminately either. It should get the comment ID and use the comment_link_allowed_authors filter as well to prevent some authors' links from being modified. The top of the function then becomes:
function wp_rel_ugc( $text, $comment_ID ) {
        if ( in_array( get_comment_author( $comment_ID ), apply_filters( 'comment_authors_link_allowed', [] ) ) {
                return $text;
        }
  1. The function wp_rel_callback() should accept a list of allowed domains, which should include the local domain by default. Line 3075 of wp-includes/formatting.php should become:
if ( ! empty( $atts['href'] ) && ! in_array( $atts['href'], apply_filters( 'comment_link_allowed_domains', [ get_option( 'home' ) ) ) ) {

These changes will provide flexibility that's currently missing in the comment link tagging functionality.

Change History (10)

#1 @dankop
3 years ago

Just leaving a comment to support these changes. Internal links shouldn't be nofollow, even if left by a user.

Think of it as me as the author replying to a user in the comments, and pointing them towards a different page on the website. This is an internal link, posted by the author - shouldn't be "nofollow ugs"

Second scenario - a user replies to another user with a link that leads to a different page on the website. Even though this is user generated content, it's still an internal link and shouldn't be nofollow.

#2 @thomasplevy
19 months ago

While working on #56444 I opened a PR that I think addresses issues raised here as well. See https://core.trac.wordpress.org/ticket/56444#comment:8

#3 @thomasplevy
18 months ago

I believe I've tackled points 2 & 3 in conjuction with #56444.

I'm rereading this and trying to wrap my head around point 1:

The function get_comment_author_link() shouldn't modify the author link indiscriminately. Although its output can be filtered, it would be better to prevent link tagging for specific authors. This can be achieved by adding the following code on line 223 of wp-includes/comment-template.php

The suggested added code wouldn't actually do much to allow filtering of the rel attribute.

As the code stands today, comment author links are always allowed (if they're supplied) and they're always rendered with rel="external nofollow ugc". I believe the spirit of this issue would be to create a filterable allowlist of comment authors. Comment authors present in that last would NOT have ugc added to their comment author links.

Additionally, the "external" and "ugc" attributes should not be added by default, instead they should only be added when applicable.

I've added some new code to address this in the way I understand it, see https://github.com/WordPress/wordpress-develop/pull/3825

After rewriting I started to get to work on some tests but I don't see any existing unit tests in place for this function. I assume it's "safe" to add a new file at tests/phpunit/tests/comment/getCommentAuthorLink.php where new tests for this function may be written. I've started baseline tests but ran into a semi-related bug. I'm not sure if I should open a new issue for this or not or simply address the issue:

If you call get_comment_author_link() and pass no $comment_id (and there's no $GLOBALS['comment'] set) the filter on the function's return errors since $comment isn't an object, $comment->comment_ID can't be accessed.

I think it would make sense to have an early return immediately after the first line of the function (if get_comment() returns null). But I'm not sure this is the "standard" convention for a function like this to return empty with no filter in this kind of a scenario. Or maybe this is "known" and "expected" and should be ignored.

In any event, if my solution to point 1 seems sound I'll get some tests written in the next few days.

Thanks

#4 @galbaras
18 months ago

I think your approach should work well.

One point of efficiency may be to use a static variable for the internal hosts list, rather than applying a filter to a function within a function every time wp_internal_hosts() is called.

Optimising even further, the code from wp_internal_hosts() can be used directly in wp_is_internal_link() to set a static $internal_hosts, instead of applying the array_map() on every call.

So something like this:

<?php
function wp_is_internal_link( $link ) {
        static $internal_hosts;

        $link = strtolower( $link );
        if ( in_array( wp_parse_url( $link, PHP_URL_SCHEME ), wp_allowed_protocols(), true ) ) {
                if ( empty( $internal_hosts ) {
                        $internal_hosts = array_map( 'strtolower', apply_filters( 'wp_internal_hosts', array( wp_parse_url( home_url(), PHP_URL_HOST ) ) ) );
                }
                return in_array( wp_parse_url( $link, PHP_URL_HOST ), $internal_hosts, true );
        }

        return false;

}

Looking at wp_allowed_protocols(), that's exactly how protocols are stored, so this should be consistent, as well as efficient.

#5 @thomasplevy
18 months ago

Thanks @galbaras this makes sense to me. I'll update per your feedback. I noticed a few issues while working on this so I paused to open #57615 and I'll return to this shortly.

#6 @jorbin
18 months ago

  • Milestone changed from Awaiting Review to 6.3
  • Owner set to jorbin
  • Status changed from new to assigned

This will be coming in the same commit as #56444 so moving to 6.3

This ticket was mentioned in Slack in #core by jorbin. View the logs.


18 months ago

#8 @jorbin
18 months ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 55289:

Comments: Improve rel attribute usage in comments.

Internal links should be followed and it should be easier to modify other rel attributes on comments. This adds a helper function for determining if a URL is internal and also adds some new filters to make it easy to modify rel attributes in comments.

Props thomasplevy, desrosj, sabernhardt, benish74, samiamnot, galbaras, jorbin.

Fixes #53290, #56444.

#9 @jorbin
18 months ago

  • Milestone changed from 6.3 to 6.2

#10 @SergeyBiryukov
17 months ago

In 55495:

Formatting: Restore consistent quotes in _make_web_ftp_clickable_cb().

After the introduction of _make_clickable_rel_attr() in an earlier commit, the function ended up returning link markup with a mix of single and double quotes.

This commit ensures that _make_web_ftp_clickable_cb() always returns double quotes, restoring consistency with other similar callback functions used by make_clickable():

  • _make_url_clickable_cb()
  • _make_email_clickable_cb()

Follow-up to [55289].

See #53290, #56444.

Note: See TracTickets for help on using tickets.