Make WordPress Core

Opened 9 years ago

Closed 9 years ago

Last modified 9 years ago

#31214 closed defect (bug) (fixed)

oEmbed: Newlines within <pre> tags get removed

Reported by: cweiske's profile cweiske Owned by: wonderboymusic's profile wonderboymusic
Milestone: 4.2 Priority: normal
Severity: normal Version: 4.1
Component: Embeds Keywords: has-patch
Focuses: Cc:


An oEmbed's HTML content gets all \n newlines removed through

add_filter( 'oembed_dataparse', array($this, '_strip_newlines'), 10, 3 );

WP_oEmbed::strip_newlines() unfortunately removes every newline, even those within <pre> tags - but they are crucial there. My oEmbedded code pastes all only have a single line currently:

Please fix this method to not remove newlines within <pre> tags.

Attachments (2)

31214.diff (1.2 KB) - added by wonderboymusic 9 years ago.
31214.2.diff (1.7 KB) - added by wonderboymusic 9 years ago.

Download all attachments as: .zip

Change History (15)

#1 @wonderboymusic
9 years ago

  • Keywords needs-patch added
  • Milestone changed from Awaiting Review to Future Release

9 years ago

#2 @wonderboymusic
9 years ago

  • Keywords has-patch added; needs-patch removed
  • Milestone changed from Future Release to 4.2

31214.diff does this if DOMDocument exists, which falls in line with other places that bow out if DOMDocument isn't available (please don't disable this, for crying out loud).

The tokens have to be sequential to account for HTML nodes that may be identical

#3 @wonderboymusic
9 years ago

  • Owner set to wonderboymusic
  • Resolution set to fixed
  • Status changed from new to closed

In 31415:

Protect <pre> tags when parsing oEmbed responses in WP_oEmbed::_strip_newlines() in DOMDocument is available.

Fixes #31214.

#4 @wonderboymusic
9 years ago

In 31416:

After [31415], make sure str_replace() only occurs once for each matched tag to avoid overwriting until <pre>s.

See #31214.

#5 @cweiske
9 years ago

The applied patch does not work when the <pre> tag contains HTML entities like &#160;.

$dom->saveHTML() replaces the entities with their real characters, which in turn leads to a failing following str_replace call.

Try to embed - it is what I have the problem with.

Last edited 9 years ago by cweiske (previous) (diff)

#6 @wonderboymusic
9 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

#7 @cweiske
9 years ago

I tried to use html_entity_decode on the input HTML and the saveHTML()'ed HTML, but it seems that the saved <pre> tag is UTF-8, while the input html is a different encoding.

#8 @wonderboymusic
9 years ago

it's the DOMDocument internals, trying a few different things now...

#9 @wonderboymusic
9 years ago

  1. DOMDocument is awesome and completely sucks at the same time
  2. 31214.2.diff uses RegEx instead, please test.

#11 @wonderboymusic
9 years ago

In 31423:

Use RegEx instead of DOMDocument when protecting <pre> tags in WP_oEmbed::_strip_newlines(). It is incredibly difficult to maintain character encoding and whitespace when parsing via DOMDocument.

See #31214.

#12 @wonderboymusic
9 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

Calling this fixed - I have been filtering oembed responses for 3 weeks now and adding <pre> tags. All works as expected.

This ticket was mentioned in Slack in #feature-respimg by joemcgill. View the logs.

9 years ago

Note: See TracTickets for help on using tickets.