WordPress.org

Make WordPress Core

Opened 3 weeks ago

Last modified 30 hours ago

#41746 new defect (bug)

oEmbed does not respect canonical provider url parameter

Reported by: dougal Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version: 2.9
Component: Embeds Keywords: has-patch dev-feedback 2nd-opinion needs-unit-tests
Focuses: Cc:

Description

I came across a Twitter URL format that would not embed correctly. Providing that URL to their provider endpoint returned an error. But the original page had a <link> element which already had a working, canonical url parameter in its querystring.

An example URL is:
https://twitter.com/i/web/status/898599373956722688

If you try to fetch oEmbed data for that URL by just adding it as a url querystring parameter on the standard Twitter oEmbed provider URL, it will return an error.

But view source on that page, and you'll see:
<link rel="alternate" type="application/json+oembed" href="https://publish.twitter.com/oembed?url=https://twitter.com/dimensionmedia/status/898599373956722688" title="David Bisset on Twitter: &quot;Agorakit is a web based open source &quot;groupware for citizens initiatives” (which i’ve seen @buddypress used for too) https://t.co/bFPw9ZZWi2 https://t.co/H1REt0QfcO&quot;">

Note that the path of this URL is .../{username}/status/{id}, whereas the original URL was .../i/web/status/{id}.

I've worked out a small patch and method for getting WordPress to use oEmbed discovery to extract and use the canonical URL.

When using wp_oembed_add_provider(), if you leave the provider URL falsey, then WP_oEmbed::get_provider() will use discovery to find it (assuming that you haven't forced discovery = false in $args). Then my patch will pull the url arg from there and use that, instead of the original URL that was passed in to the embed handling.

Later, when the JSON response is being handled, the code will still be able to see whether this is a whitelisted URL pattern, and bypass/perform security filtering such as kses() (see wp_filter_oembed_result()).

Attachments (1)

41746.patch (1.9 KB) - added by dougal 3 weeks ago.
Use canonical provider url param in discovery, if provided

Download all attachments as: .zip

Change History (10)

@dougal
3 weeks ago

Use canonical provider url param in discovery, if provided

#1 @dougal
3 weeks ago

  • Keywords has-patch dev-feedback added

This ticket was mentioned in Slack in #core by dougal. View the logs.


3 weeks ago

This ticket was mentioned in Slack in #core by dougal. View the logs.


3 weeks ago

#4 @azaozz
3 weeks ago

  • Keywords 2nd-opinion added

This ticket was mentioned in Slack in #core by dougal. View the logs.


2 weeks ago

#6 @dougal
11 days ago

Some more info on this...

I found another class of Twitter URLs that display this behavior -- media links in tweets (e.g., photos or videos):
https://www.twitter.com/{username}/status/{statusid}/photo/{num}
https://www.twitter.com/{username}/status/{statusid}/video/{num}

I ran across these when auto-cross-posting bookmarks from | Pinboard to my blog. Pinboard has a feature to automatically bookmark URLs from tweets that you mark as favorites. When it does this, it can extract these types of URLs that point directly to Twitter resources, but which the Twitter oEmbed provider does not recognize. Twitter apparently expects you to get the canonical provider URL from the original resource's head.

I believe that @azaozz said that he had seen other services that extract URLs from tweets similarly.

#7 @dougal
11 days ago

Also, to clarify, the actual bug here is that in cases where we are already using discovery, and the service is already giving us a URL in the discovered provider URL, we override it with whatever original URL we were spidering in the first place. Clear as mud? Let's break it down:

User: Hey WordPress, I need you to oEmbed URLA.
WP: Hmm, I don't already have a provider URL in my whitelist for that, let's try discovery!
WP: Hey, Service -- what's the oEmbed provider URL for URLA? (<spiders URLA>)
Service: The provider address for URLA is {ProvUrl}?url=URLB
WP: Thanks! But I'll just toss out that querystring, and rebuild it...
WP: The provider URL is now {ProvUrl}?url=URLA! I'll fetch some JSON from there...
Service: oEmbed for URLA? No such thing!
User: Dangit, why won't this URL embed correctly?

#8 @swissspidy
31 hours ago

  • Keywords needs-unit-tests added
  • Version changed from trunk to 2.9

This ticket was mentioned in Slack in #core by jeffpaul. View the logs.


30 hours ago

Note: See TracTickets for help on using tickets.