Make WordPress Core

Opened 4 years ago

Last modified 4 years ago

#50514 new defect (bug)

make_clickable nested links bug

Reported by: elhardoum's profile elhardoum Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version: 5.5
Component: Formatting Keywords: has-patch needs-unit-tests
Focuses: Cc:

Description

If you look at the source of [make_clickable](https://core.trac.wordpress.org/browser/tags/5.4/src/wp-includes/formatting.php#L2985) there's one last regex call to replace potentially nested links:

// Cleanup of accidental links within links.
return preg_replace( '#(<a([ \r\n\t]+[^>]+?>|>))<a [^>]+?>([^>]+?)</a></a>#i', '$1$3</a>', $r );

From the looks of the expression, this is meant to remove accidental nested links only if the parent link wrapping them does not have any non-link text at the edges. Let me provide a few examples:

  1. This works as intended:
<?php

$text = '<a href="https://w.org">https://w.org</a>';
$click = make_clickable($text);
# <a href="https://w.org">https://w.org</a>
  1. Let's introduce more content inside the link, either prepend or append it to the hyperlink's inner URL:
<?php

$text = '<a href="https://w.org"> https://w.org</a>';
$click = make_clickable($text); # <a href="https://w.org">https://w.org</a>
# <a href="https://w.org"> <a href="https://w.org" rel="nofollow">https://w.org</a></a>

$text = '<a href="https://w.org">https://w.org </a>';
$click = make_clickable($text); # <a href="https://w.org">https://w.org</a>
# <a href="https://w.org"><a href="https://w.org" rel="nofollow">https://w.org</a> </a>

There, I used a simple whitespace for the sake of an example.

Suggested Patch

I am suggesting a simple fix with the cleanup regex expression, although however if you managed to understand the root problem you'd be able to come up with something much better.

<?php

// Cleanup of accidental links within links.
return preg_replace( '#(<a([ \r\n\t]+[^>]+?>|>))(.+)?<a [^>]+?>([^>]+?)</a>(.+)?</a>#is', '$1$3$4$5</a>', $r );

In my patch you'll see the following changes:

  1. Added (.+)? for capturing any leading characters before any nested links
  2. Added (.+)? for capturing any trailing characters before any nested links
  3. Added s modifier so as to have the previous capturers work with newlines
  4. '$1$3$4$5</a>' restoring the captured leading or trailing characters back into the cleaned up HTML.

Also worth noting I am running WordPress 5.4.2, PHP 7.4.4, nginx/1.17.10 on an Alpine Linux docker container (Linux 4.19.76-linuxkit x86_64).

Attachments (1)

patch.php (188 bytes) - added by elhardoum 4 years ago.
Patch for line https://core.trac.wordpress.org/browser/tags/5.4/src/wp-includes/formatting.php#L2986

Download all attachments as: .zip

Change History (5)

This ticket was mentioned in PR #371 on WordPress/wordpress-develop by elhardoum.


4 years ago
#1

Fixes a make_clickable nested links potential bug, as the function cleans up any accidental links it fails to catch all accidental links.

More details in the trac ticket.

Trac ticket: https://core.trac.wordpress.org/ticket/50514

#2 @SergeyBiryukov
4 years ago

Hi there, welcome to WordPress Trac! Thanks for the ticket and the patch.

Just noting some previous changes here:

#3 @SergeyBiryukov
4 years ago

  • Keywords needs-unit-tests added

#4 @cfinke
4 years ago

This would solve a problem that I've encountered in the wild, but it appears that it still has unexpected behavior. For example, with this input:

<a href="http://example.com/">http://example.com/</a><a href="http://example.com/">http://example.com/</a>

the patched make_clickable() produces output with nested <a> tags:

<a href="http://example.com/"><a href="http://example.com/" rel="nofollow">http://example.com/</a></a><a href="http://example.com/">http://example.com/</a>
Note: See TracTickets for help on using tickets.