Opened 4 years ago
Last modified 4 years ago
#50514 new defect (bug)
make_clickable nested links bug
Reported by: | elhardoum | Owned by: | |
---|---|---|---|
Milestone: | Awaiting Review | Priority: | normal |
Severity: | normal | Version: | 5.5 |
Component: | Formatting | Keywords: | has-patch needs-unit-tests |
Focuses: | Cc: |
Description
If you look at the source of [make_clickable
](https://core.trac.wordpress.org/browser/tags/5.4/src/wp-includes/formatting.php#L2985) there's one last regex call to replace potentially nested links:
// Cleanup of accidental links within links.
return preg_replace( '#(<a([ \r\n\t]+[^>]+?>|>))<a [^>]+?>([^>]+?)</a></a>#i', '$1$3</a>', $r );
From the looks of the expression, this is meant to remove accidental nested links only if the parent link wrapping them does not have any non-link text at the edges. Let me provide a few examples:
- This works as intended:
<?php $text = '<a href="https://w.org">https://w.org</a>'; $click = make_clickable($text); # <a href="https://w.org">https://w.org</a>
- Let's introduce more content inside the link, either prepend or append it to the hyperlink's inner URL:
<?php $text = '<a href="https://w.org"> https://w.org</a>'; $click = make_clickable($text); # <a href="https://w.org">https://w.org</a> # <a href="https://w.org"> <a href="https://w.org" rel="nofollow">https://w.org</a></a> $text = '<a href="https://w.org">https://w.org </a>'; $click = make_clickable($text); # <a href="https://w.org">https://w.org</a> # <a href="https://w.org"><a href="https://w.org" rel="nofollow">https://w.org</a> </a>
There, I used a simple whitespace for the sake of an example.
Suggested Patch
I am suggesting a simple fix with the cleanup regex expression, although however if you managed to understand the root problem you'd be able to come up with something much better.
<?php // Cleanup of accidental links within links. return preg_replace( '#(<a([ \r\n\t]+[^>]+?>|>))(.+)?<a [^>]+?>([^>]+?)</a>(.+)?</a>#is', '$1$3$4$5</a>', $r );
In my patch you'll see the following changes:
- Added
(.+)?
for capturing any leading characters before any nested links - Added
(.+)?
for capturing any trailing characters before any nested links - Added
s
modifier so as to have the previous capturers work with newlines '$1$3$4$5</a>'
restoring the captured leading or trailing characters back into the cleaned up HTML.
Also worth noting I am running WordPress 5.4.2, PHP 7.4.4, nginx/1.17.10 on an Alpine Linux docker container (Linux 4.19.76-linuxkit x86_64).
Attachments (1)
Change History (5)
This ticket was mentioned in PR #371 on WordPress/wordpress-develop by elhardoum.
4 years ago
#1
Fixes a make_clickable
nested links potential bug, as the function cleans up any accidental links it fails to catch all accidental links.
More details in the trac ticket.
Trac ticket: https://core.trac.wordpress.org/ticket/50514
#4
@
4 years ago
This would solve a problem that I've encountered in the wild, but it appears that it still has unexpected behavior. For example, with this input:
<a href="http://example.com/">http://example.com/</a><a href="http://example.com/">http://example.com/</a>
the patched make_clickable()
produces output with nested <a>
tags:
<a href="http://example.com/"><a href="http://example.com/" rel="nofollow">http://example.com/</a></a><a href="http://example.com/">http://example.com/</a>
Patch for line https://core.trac.wordpress.org/browser/tags/5.4/src/wp-includes/formatting.php#L2986