WordPress.org

Make WordPress Core

Opened 7 months ago

Last modified 4 weeks ago

#25387 new defect (bug)

Autoembeds don't work with paragraphs

Reported by: Looimaster Owned by:
Milestone: Future Release Priority: normal
Severity: normal Version: 3.6.1
Component: Embeds Keywords: has-patch needs-testing has-unit-tests commit 4.0-early
Focuses: Cc:

Description

I'm writing the below pieces of code in "Text" mode of the posts editor.

Does work:

http://www.youtube.com/watch?v=xxxxxxxxxxx

Does work:

<p>test
http://www.youtube.com/watch?v=xxxxxxxxxxx
test</p>

Doesn't work:

<p>http://www.youtube.com/watch?v=xxxxxxxxxxx</p>

Attachments (3)

25387.diff (585 bytes) - added by seanchayes 3 months ago.
25387-sample-oembed-content.jpg (135.0 KB) - added by seanchayes 3 months ago.
25387.2.diff (5.1 KB) - added by seanchayes 6 weeks ago.

Download all attachments as: .zip

Change History (13)

comment:1 knutsp7 months ago

Also when the visual editor adds something like:

<span style="some:styling;">http://www.youtube.com/watch?v=xxxxxxxxxxx</span>

it doesn't work.

comment:2 redsweater7 months ago

I understand the current matching is very conservative on purpose, as that was the tradeoff when turning on autoembed by default. But for the examples listed above and many others, I think the current pattern could be loosened up a bit to match many, many cases where the desired autoembedding would in fact be preferable to leaving the URL raw.

For the sake of argument here is one proposed alternative regex to the one currently used in class-wp-embed.php's autoembed() function:

(?:(?:^[^\s"']*\s*>?)|(?:\s))(https?://[^\s<]+)

I am not a regex expert, but i cobbled this together. I also ran it against several test cases where it seems to obtain the desired behavior. In each of these example cases, it matches and extracts the desired URL and nothing surrounding it:

<p>
http://www.wordpress.org/ 
</p>
<p>http://www.wordpress.org/</p>
Some text
http://www.wordpress.org/
Some more text
<p> http://www.wordpress.org/</p>
<p>http://www.apple.com/ </p>
http://www.red-sweater.com/
http://www.wordpress.org/

Note that the pattern does NOT match obvious examples I could think of where a literal URL would be expected to be left alone, for example:

<a href="http://www.wordpress.org">WordPress</a>

On the other hand, there are some areas where a false-ID will still be made. For example if the HTML above were slightly malformed and possessed spaces around the URL:

<a href=" http://www.wordpress.org ">WordPress</a>

In the end I think we should weigh the current frustration over the malfunctioning behavior against the possible frustration of falsely converting URLs. It seems to me that so long as the major cases where the URL is clearly intended to be part of markup are covered, there will be very few frustratingly unwanted substitutions. That said, I get that without an option to disable the feature, one frustrating subsitution could be one too many.

In short it feels to me that the current substitution behavior is not "magical" enough to justify its own existence. Requiring an author to meet all the caveats of the auto-embed regex match as it stands today seems about as onerous as requiring them to put the URL into literal [embed] shortcodes. If WordPress is going to support this laudible feature then I think it should be expanded to affect many more obvious substitution scenarios.

comment:3 redsweater7 months ago

  • Cc jalkut@… added

comment:4 nacin3 months ago

  • Component changed from General to Embeds
  • Milestone changed from Awaiting Review to 3.9

Would be nice to fix this. Thanks for the thorough analysis, redsweater.

comment:5 seanchayes3 months ago

This is my first punt at regex and I have attached a patch and a screenshot with the content I tested against.

In essence I started by working from redsweaters regex, found I kept getting an error when I was testing and I just couldn't deduce why. This is what I got:

preg_replace_callback(): Unknown modifier '(' ...

So, I modified the original regex and it's in the patch and here:

|(?<!")(https?://[^\s"\[<]+)|im

Along the way I did fairly basic testing with YouTube, Vimeo, WordPress.tv and Slideshare urls spread about some content (within tags and anchors as well as on separate lines and within embeds) I crafted to identify where a url was successfully parsed and embedded vs left alone.

Long story short, based on my testing, I could resolve the issue reported in this ticket, but I expect more testing is required to cover more scenarios and perhaps someone with a little more experience (and time) with regex might offer guidance on the pattern used.

seanchayes3 months ago

comment:6 seanchayes3 months ago

  • Keywords has-patch needs-testing added

comment:7 SergeyBiryukov2 months ago

  • Keywords needs-unit-tests added

comment:8 seanchayes6 weeks ago

I've added a revised patch 25387.2.patch with a unit test (my first).

Patch is now more comprehensive looking for urls and assessing the patterns of text/markup around the url. By no means can it be considered fully comprehensive just more comprehensive than we have now.

Based on the original bug report and on the feedback from redsweater and knutsp it should move us forward.

Guidance welcome if the unit test is not up to scratch.

seanchayes6 weeks ago

comment:9 nacin4 weeks ago

  • Keywords has-unit-tests added; needs-unit-tests removed

seanchayes, at a glance, the patch and unit tests look good. One thing I'll note is we use tabs rather than spaces for indentation. But that's the only thing I noticed.

comment:10 nacin4 weeks ago

  • Keywords commit 4.0-early added
  • Milestone changed from 3.9 to Future Release

I'm really concerned about the potential for breakage here. I think the tests are great but I'd like to make sure this gets sufficient real-world testing. Let's make this happen in early 4.0.

Note: See TracTickets for help on using tickets.