Make WordPress Core

Opened 2 years ago

Closed 2 years ago

#57190 closed defect (bug) (invalid)

public-api OEmbed effectively double-encodes entities in title

Reported by: stiiin Owned by:
Milestone: Priority: normal
Severity: minor Version:
Component: Embeds Keywords: dev-feedback
Focuses: Cc:

Description

I just came across this post on Mastodon: https://infosec.exchange/@ellent@mastodon.nl/109395089902785785. The post contains a link, for which Mastodon generated a link preview. As you may note, the title of the link preview contains two double-encoded entities: &#8217; and &nbsp;.

I believe Mastodon ultimately consulted the public-api.wordpress.com endpoint to generate OEmbed for the linked article. The OEmbed data contains the following title element:

<title><![CDATA[Van wie is die website? Wat bv&#8217;s moeten vermelden op hun&nbsp;website]]></title>

Per the definition of CDATA in the XML syntax: "Within a CDATA section, only the CDEnd string is recognized as markup, so that left angle brackets and ampersands may occur in their literal form."

As such, I believe that the public-api should have either moved those character entities out of the CDATA section or decoded them into their corresponding Unicode code points.
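To illustrate the point about CDATA, here is a minimal Python sketch. The `<oembed>` wrapper element and the helper names `raw_title` and `decoded_title` are assumptions made for illustration; only the `<title>` element is taken from the response quoted above. It shows that a conforming XML parser hands back the entity text literally, and that the "decode into Unicode code points" option amounts to one explicit HTML-unescape step.

```python
import html
import xml.etree.ElementTree as ET

# The <title> element is quoted from the report; the <oembed> wrapper is an
# assumed minimal document around it.
OEMBED_XML = (
    "<oembed><title><![CDATA[Van wie is die website? Wat bv&#8217;s moeten "
    "vermelden op hun&nbsp;website]]></title></oembed>"
)


def raw_title(xml_text: str) -> str:
    """Return the title exactly as the XML parser reports it."""
    root = ET.fromstring(xml_text)
    # Inside CDATA, '&' is not markup, so the entities survive as literal text.
    return root.findtext("title")


def decoded_title(xml_text: str) -> str:
    """Decode the HTML character references once, yielding plain text."""
    return html.unescape(raw_title(xml_text))


print(repr(raw_title(OEMBED_XML)))
print(repr(decoded_title(OEMBED_XML)))
```

The first line printed still contains `&#8217;` and `&nbsp;`; the second contains U+2019 and U+00A0 instead.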

Aside from security considerations for the public-api itself (for example, XML injection), note that either approach may negatively affect the security of applications that depend on the output of the public-api. The former approach may trip up broken XML parsers (or broken uses of them) and negatively affect availability, and the latter may expose CWE-174 (Double Decoding of the Same Data) vulnerabilities in applications to exploitation.
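The CWE-174 concern can be sketched as follows. This is a hypothetical consumer, not code from any real application: the function names and the sample input are assumptions. It shows how a value that is safe after one decoding pass turns into live markup when it is decoded a second time.

```python
import html


def decode_once(value: str) -> str:
    """Decode HTML character references a single time (the intended behavior)."""
    return html.unescape(value)


def decode_twice(value: str) -> str:
    """Hypothetical buggy consumer that decodes data which was already decoded."""
    return html.unescape(html.unescape(value))


# Assumed sample: a title whose author literally typed "&lt;b&gt;".
# Correctly encoded for transport, the ampersands become "&amp;".
encoded = "&amp;lt;b&amp;gt;"

print(decode_once(encoded))   # still inert text containing entity syntax
print(decode_twice(encoded))  # now contains real angle brackets
```

If the API starts decoding entities itself, a consumer that already decodes the title would end up in the `decode_twice` situation.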

Change History (4)

#1 @stiiin
2 years ago

I should've taken a bit more time to think this report through:

  • The JSON response for the same OEmbed request also contains the XML/HTML character entities.
  • Section 2.3.4 of the OEmbed spec defines the 'title' parameter as "A text title, describing the resource." It's not spelled out very explicitly, but I believe this definition should be interpreted as "the value contains plain text data" rather than HTML. Another hint in this direction is that, whenever HTML is expected, it's explicitly mentioned as such.
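A minimal sketch of why the preview ends up looking double-encoded, assuming a consumer that treats `title` as plain text, as the reading of the spec above suggests. The JSON payload is a hypothetical one modeled on the title quoted in this ticket, and `render_title_as_text` is an illustrative name, not Mastodon's actual code.

```python
import html
import json

# Hypothetical oEmbed JSON response; only the title value mirrors the ticket.
OEMBED_JSON = (
    '{"type": "link", "title": "Van wie is die website? Wat bv&#8217;s moeten '
    'vermelden op hun&nbsp;website"}'
)


def render_title_as_text(payload: str) -> str:
    """Escape the title for HTML output, treating it as plain text."""
    title = json.loads(payload)["title"]
    # A plain-text consumer must escape '&', so the entity syntax that was
    # already in the value gets encoded a second time.
    return html.escape(title)


print(render_title_as_text(OEMBED_JSON))
```

The escaped output contains `&amp;#8217;` and `&amp;nbsp;`, which a browser displays as the literal strings `&#8217;` and `&nbsp;`.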

This ticket was mentioned in Slack in #core-test by ironprogrammer.
2 years ago

#3 @ironprogrammer
2 years ago

  • Keywords dev-feedback added

Welcome to Trac, and thank you for the report, @stiiin!

I've added dev-feedback to draw core developer attention to this.

#4 @dd32
2 years ago

  • Milestone Awaiting Review deleted
  • Resolution set to invalid
  • Status changed from new to closed

Hi @stiiin,

The oEmbed endpoint you've included, https://public-api.wordpress.com/oembed/, is not provided by WordPress.org; rather, it's a custom endpoint provided by WordPress.com.

I've validated this against a WordPress.org install, which correctly returns:

<title>Van wie is die website? Wat bv’s moeten vermelden op hun website</title>.

This should be reported directly to WordPress.com, which can be done through https://wordpress.com/support/

I'm marking this ticket as invalid as I can't duplicate it against a WordPress installation, but can see it happening on WordPress.com.
