WordPress.org

Make WordPress Core

Opened 6 months ago

Last modified 6 months ago

#46886 new defect (bug)

wp_targeted_link_rel adds the rel attribute when the link has data-target=""

Reported by: jakeparis Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version: 5.1
Component: Formatting Keywords: has-unit-tests
Focuses: Cc:
PR Number:

Description

To reproduce:
Create a link element (easiest to see bug in classic editor):

<a href="https://example.com" data-target="thisandthat">click here</a>

Save the post, and the link will have the rel="noopener noreferrer" attribute.

Attachments (2)

46886.diff (52.8 KB) - added by birgire 6 months ago.
46886-2.diff (1.3 KB) - added by birgire 6 months ago.

Download all attachments as: .zip

Change History (8)

#1 @SergeyBiryukov
6 months ago

  • Component changed from General to Formatting

@birgire
6 months ago

#2 @birgire
6 months ago

  • Keywords has-unit-tests added

Thanks for the ticket @jakeparis

46886.diff adds a failing unit test for this case.

Skimming through the regex '|<a\s([^>]*target\s*=[^>]*)>|i', within wp_targeted_link_rel():

https://core.trac.wordpress.org/browser/tags/5.1.1/src/wp-includes/formatting.php#L3041

it seems to allow the target attribute to be part of another string, thus picking up data-target as a false positive.

But at the moment it seems tempting to just replace:

stripos( $text, 'target' ) !== false

with:

stripos( $text, ' target' ) !== false

within wp_targeted_link_rel() here:

https://core.trac.wordpress.org/browser/tags/5.1.1/src/wp-includes/formatting.php#L3040

At least the unit tests in Tests_Targeted_Link_Rel seems to like it :-)

Last edited 6 months ago by birgire (previous) (diff)

@birgire
6 months ago

#3 follow-up: @birgire
6 months ago

46886-2.diff contains the unit-test in 46886.diff and the modification mentioned above, that makes sure the space prefixed target attribute: ' target', is contained in the string before passing it into the regex callback.

#4 in reply to: ↑ 3 @birgire
6 months ago

Replying to birgire:

46886-2.diff contains the unit-test in 46886.diff and the modification mentioned above, that makes sure the space prefixed target attribute: ' target', is contained in the string before passing it into the regex callback.

This can still pass through false positives so we could look at the regex closer.

#5 @birgire
6 months ago

For the background story, this was added in #43187.

By pre-checking for ' target' in 46886-2.diff, the following will not go to the regex callback:

<a href="https://example.com" data-target="thisandthat">click here</a>

but this would:

<a href="https://example.com" data-target="This is targetting all">click here</a>

One could additionally adjust the regex from:

|<a\s([^>]*target\s*=[^>]*)>|i

to:

|<a\s+((?:[^>]*?\s+)?target\s*=[^>]*)>|i

to account for the white space in front of the target attribute. Here (?:) means a non-capturing group. This answer by plalx was helpful. Tested \s instead of \s+ in the changed regex for few cases without problem.

But this will not stop edge cases, where ' target=' is part of the attribute value, like:

<a href="https://example.com" data-target="This is target=One">click here</a>

from getting the rel part added.

We note that target=_blank, without " or ', is matched by the current regex, so it could make solving the edge case more involved.

So at minimum, I think we could consider adding the extra space in the pre-check, to reduce the false positives.

Changing an existing regex should be approached with care, but if we go that route it would most likely require some extra unit-tests.

One could also look further into checks within the callback.

Last edited 6 months ago by birgire (previous) (diff)

#6 @birgire
6 months ago

I noticed a shortcoming in 46886-2.diff that the pre-check of ' target' will not pick up tabs and other non-space whitespaces in front of the target attribute.

The current pre-check of '<a ' has similar shortcomings, as it only assumes a space after the <a part.

A workaround for that could be to precheck 'target' and '<a' and capture the first whitespace group with:

|<a(\s+)((?:[^>]*?\s+)?target\s*=[^>]*)>|i

and adjust the regex callback accordingly.

Note: See TracTickets for help on using tickets.