Make WordPress Core

Opened 9 years ago

Closed 2 years ago

Last modified 13 months ago

#30580 closed defect (bug) (fixed)

wp_extract_urls not returning full URL with & query separator

Reported by: trex005's profile trex005 Owned by: peterwilsoncc's profile peterwilsoncc
Milestone: 6.0 Priority: normal
Severity: normal Version: 3.7
Component: Formatting Keywords: has-patch has-unit-tests needs-testing commit
Focuses: Cc:

Description

Related : 29314

When formatting a link, and even when inserting a URL in the visual editor in WordPress the & separator used for the query variables will be properly encoded into &. However, when you then pass that string to wp_extract_urls, it incorrectly truncates the URL at the &.

Example :
https://www.youtube.com/watch?v=exvUH2qKLTU&start=25
Becomes
https://www.youtube.com/watch?v=exvUH2qKLTU&

Attachments (2)

30580.diff (1.1 KB) - added by voldemortensen 9 years ago.
30580.2.diff (1.2 KB) - added by voldemortensen 9 years ago.

Download all attachments as: .zip

Change History (14)

#1 @johnbillion
9 years ago

  • Component changed from General to Formatting
  • Keywords needs-patch needs-unit-tests added
  • Version changed from trunk to 3.7

Thanks for the report!

@voldemortensen
9 years ago

#2 @voldemortensen
9 years ago

  • Keywords has-patch added; needs-patch needs-unit-tests removed

Fixes the regex used in wp_extract_urls. Adds unit tests.

#3 @johnbillion
9 years ago

Those tests could probably benefit from some URLs with trailing punctuation, eg. http://example.com;.

#4 @voldemortensen
9 years ago

I was just writing a comment saying that the ; character would now be allowed in urls, which has the potential to allow broken urls. But I think its better to allow some broken urls (we can't help if someone enters a bad link), than to not allow valid urls. Adding patch with suggested unit tests.

Last edited 9 years ago by voldemortensen (previous) (diff)

This ticket was mentioned in PR #2283 on WordPress/wordpress-develop by ironprogrammer.


2 years ago
#5

  • Keywords has-unit-tests added

This is a refresh of Trac #30580 and adds additional test cases.

Trac ticket: https://core.trac.wordpress.org/ticket/30580

#6 @ironprogrammer
2 years ago

Thanks to @voldemortensen for past work on this 😄

I've pushed up PR 2283 which refreshes this patch, and adds some additional test cases.

Tests Added

  • URLs with either named or numbered entities, e.g. & or &.
  • URL with non-escaped ampersand (common for manually written or edited URLs).
  • URLs with typos, such as an extraneous semicolon, or a missing semicolon on an HTML entity.

This fix unblocks unit testing needed for #38197.

#7 @ironprogrammer
2 years ago

  • Keywords needs-testing added

This ticket was mentioned in Slack in #core-test by ironprogrammer. View the logs.


2 years ago

#9 @ironprogrammer
2 years ago

The proposed PR 2283 has been updated to include backward compatibility support for the removal of unused semicolons (;).

Props @costdev, @hellofromTonya

#10 @peterwilsoncc
2 years ago

  • Keywords commit added

I think the linked pull request is good to commit as an improvement.

The issue of semi-colons being stripped remains even though they're perfectly valid characters in a URL. There's still a little to do to make sure that's dealt with correctly so it can be fixed on a follow up ticket.

#11 @peterwilsoncc
2 years ago

  • Owner set to peterwilsoncc
  • Resolution set to fixed
  • Status changed from new to closed

In 53044:

Formatting: Account for HTML entities in wp_extract_urls().

Prevent wp_extract_urls() trimming HTML entities within URLs. Correctly escaped URLs such as https://youtube.com/watch?v=dQw4w9WgXcQ&t=1 will now be extracted as https://youtube.com/watch?v=dQw4w9WgXcQ&t=1 rather than truncated.

Props trex005, voldemortensen, johnbillion, ironprogrammer, costdev, hellofromtonya.
Fixes #30580

#12 @SergeyBiryukov
13 months ago

  • Milestone set to 6.0
Note: See TracTickets for help on using tickets.