Make WordPress Core

Opened 10 years ago

Last modified 5 years ago

#31695 new defect (bug)

Enclosing oEmbed using `embed_oembed_html` is generating invalid HTML

Reported by: vaurdan's profile vaurdan Owned by:
Milestone: Priority: normal
Severity: normal Version:
Component: Formatting Keywords: needs-unit-tests wpautop needs-patch
Focuses: Cc:

Description

This bug was detected when trying to preview a post that had a Twitter oembed URL. There was no error on the log, and the previewed content was empty ( the_content() was printing an empty string ).

After some debugging, I found that something was adding an extra <p> and never closing that tag, on the oembed returned markup, right before <script>:

<blockquote class="twitter-tweet" width="550">
<p>Lisbon with #WordPressVIP @ Terreiro do Paço http://instagram.com/p/vGw5IchocB/ </p>
<p>&mdash; Henrique Mouta (@vaurdan) <a href="https://twitter.com/vaurdan/status/530762912462557185">November 7, 2014</a></blockquote>
<p><script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script></div>

I did some tracing and found out that it was wpautop fault. Even though WP_oEmbed::_strip_newlines() was successfully stripping the new lines from the returned oEmbed HTML, something was adding an extra newline before the <script> tag.

$allblocks = '(?:table|thead|tfoot|caption|col|colgroup|tbody|tr|td|th|div|dl|dd|dt|ul|ol|li|pre|form|map|area|blockquote|address|math|style|p|h[1-6]|hr|fieldset|legend|section|article|aside|hgroup|header|footer|nav|figure|details|menu|summary)';
$pee = preg_replace('!(<' . $allblocks . '[^>]*>)!', "\n$1", $pee);
$pee = preg_replace('!(</' . $allblocks . '>)!', "$1\n\n", $pee);

And that was happening because <blockquote> is on the $allblocks, making a newline between blockquote and script.

To fix this, I used the same logic that wpautop uses for <object>, <option>, <source>, and <track>: have no <p> or <br/> around the <script> tag:

if ( strpos( $pee, '<script' ) !== false ) {
	// no P/BR around script
	$pee = preg_replace( '|\s*<script|', '<script', $pee );
	$pee = preg_replace( '|</script>\s*|', '</script>', $pee );
}

Now the oembed is fully functional and doesn't have that unclosed <p> tag:

<blockquote class="twitter-tweet" width="550">
<p>Lisbon with #WordPressVIP @ Terreiro do Paço http://instagram.com/p/vGw5IchocB/ </p>
<p>&mdash; Henrique Mouta (@vaurdan) <a href="https://twitter.com/vaurdan/status/530762912462557185">November 7, 2014</a></blockquote><script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script></div>

Attachments (2)

patch.diff (638 bytes) - added by vaurdan 10 years ago.
Diff file with the patch
patch-truck.diff (649 bytes) - added by vaurdan 10 years ago.

Download all attachments as: .zip

Change History (21)

@vaurdan
10 years ago

Diff file with the patch

#1 @jorbin
10 years ago

  • Keywords needs-unit-tests added

Thanks for the report and initial patch @vaurdan.

Was this something you found in the current trunk version or in 4.2 beta1?

This Is something that we will need unit tests for. wpautop can be a bit of a beast, so making sure that it has automated tests needs to be a high priority.

#2 @vaurdan
10 years ago

Hey @jorbin,

Sorry, that diff related to the WordPress.com codebase, I'm attaching the trunk patch right now.

Regarding the unit testing, I didn't wrote any new testing, but running phpunit passes all the tests:

OK, but incomplete or skipped tests!
Tests: 3266, Assertions: 12636, Skipped: 34.

Should I write an unit test for this?

Last edited 10 years ago by vaurdan (previous) (diff)

@vaurdan
10 years ago

#3 @vaurdan
10 years ago

Actually, my patch isn't fixing the issue ( props to @paulschreiber for the heads up ). There is still an unclosed <p> tag.

I'm trying to fix this by adding a regex that will search for a <p> that doesn't have a correspondent </p> tag, and strip the tag.

Do you think this could be the best way?

#4 @DrewAPicture
9 years ago

  • Version trunk deleted

#5 @miqrogroove
9 years ago

  • Keywords wpautop added

#6 @vaurdan
9 years ago

  • Keywords needs-patch added

I've been trying to work on a fix for this during the last month, but I couldn't find a way to do it. Can someone try to make a patch for it?

#7 @vaurdan
9 years ago

After some testing, we're able to discover what was making the closing </p> to disappear. We've found that if the user has Jetpack, and the theme has support for jetpack-responsive-videos, like twentyfourteen have, the closing </p> tag would disappear from the Twitter oEmbed HTML, making it invalid (as reported above).

This is happening here: https://github.com/Automattic/jetpack/blob/a5d57deb765944f207edc6c1c2482a43ede21101/modules/theme-tools/responsive-videos.php#L43

We can isolate only the "bad" code, and we will have this filter:

function break_embed( $html ) {
    return '<div> ' . $html . '</div>';
}
add_filter( 'embed_oembed_html', 'break_embed' );

If you test with that filter, you will see that the </p> tag will disappear, making the HTML invalid.

Last edited 9 years ago by vaurdan (previous) (diff)

This ticket was mentioned in Slack in #core by henriquemouta. View the logs.


9 years ago

#9 @vaurdan
9 years ago

I was able to reproduce the bug in the last trunk version, in a clean install with twentyfifteeen and no plugins. Here are the steps:

  1. Create a new post with a Twitter link.
  2. Save the post
  3. Check the HTML that was generated for the tweet. It will contain a valid HTML:
    (...)
    <p><script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script></p>
    
  4. Edit tentyfifteen/functions.php and add the following code:
    function break_embed( $html ) {
        return '<div> ' . $html . '</div>';
    }
    add_filter( 'embed_oembed_html', 'break_embed' );
    
  5. Check the HTML again, you will see that it is now missing the final </p>:
    (...)
    <p><script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
    

This ticket was mentioned in Slack in #core by henriquemouta. View the logs.


9 years ago

#11 follow-up: @miqrogroove
9 years ago

I'm a little confused. You had to hack a theme file to reproduce a bug?

#12 in reply to: ↑ 11 @vaurdan
9 years ago

Replying to miqrogroove:

I'm a little confused. You had to hack a theme file to reproduce a bug?

It's not a hack, it's just an example. That's just a piece of code that can be in any plugin - it was actually copy-pasted from JetPack - and that is making the twitter oEmbed HTML invalid. I've added it to the functions.php just to exemplify.

Just because it happens with custom code, it doesn't mean it isn't a bug, right? :)

#13 follow-ups: @miqrogroove
9 years ago

There are of course bugs in wpautop that need to be addressed. I think it's fair to say this is a valid core bug, but the reported symptoms are plugin related. We will prioritize accordingly.

#14 in reply to: ↑ 13 @vaurdan
9 years ago

Replying to miqrogroove:

There are of course bugs in wpautop that need to be addressed. I think it's fair to say this is a valid core bug, but the reported symptoms are plugin related. We will prioritize accordingly.

Oh, yes, of corse! Thanks for working on it.

#15 @vaurdan
9 years ago

Is there any update on this ticket? We've a few more clients from WordPress.com VIP reporting this issue.

#16 @vaurdan
9 years ago

  • Summary changed from Extra unclosed <p> tag before <script> breaking the_content() to Enclosing oEmbed using `embed_oembed_html` is generating invalid HTML

#17 @miqrogroove
9 years ago

We're planning to re-factor wpautop in 4.5. This should get some attention during that chaos :)

#18 in reply to: ↑ 13 @rodrigosprimo
9 years ago

The same problem happens when you add a Facebook video to a post. An unclosed <p> is added before the <script> tag. The following code:

<div id="fb-root"></div><script>(function(d, s, id) {  var js, fjs = d.getElementsByTagName(s)[0];  if (d.getElementById(id)) return;  js = d.createElement(s); js.id = id;  js.src = "//connect.facebook.net/pt_BR/sdk.js#xfbml=1&version=v2.3";  fjs.parentNode.insertBefore(js, fjs);}(document, 'script', 'facebook-jssdk'));</script><div class="fb-video" data-allowfullscreen="1" data-href="/CatracaLivre/videos/vl.344619742400665/907235222646823/?type=3"><div class="fb-xfbml-parse-ignore"></div></div>

Becomes (note the added unclosed <p> before <script> on the second line):

<div id="fb-root"></div>
<p><script>(function(d, s, id) {  var js, fjs = d.getElementsByTagName(s)[0];  if (d.getElementById(id)) return;  js = d.createElement(s); js.id = id;  js.src = "//connect.facebook.net/pt_BR/sdk.js#xfbml=1&#038;version=v2.3";  fjs.parentNode.insertBefore(js, fjs);}(document, 'script', 'facebook-jssdk'));</script>
<div class="fb-video" data-allowfullscreen="1" data-href="/CatracaLivre/videos/vl.344619742400665/907235222646823/?type=3">
<div class="fb-xfbml-parse-ignore"></div>


Contrary to the case @vaurdan reported, this is happening on a clean WordPress install with no plugins enabled running the latest version from master.

#19 @vaurdan
9 years ago

Just another heads-up that this bug is happening in a vanilla WordPress installation, without any plugins installed and with the default theme.

@miqrogroove please, don't forget about this one when refactoring wpautop :)

Note: See TracTickets for help on using tickets.