WordPress.org

Make WordPress Core

Opened 5 years ago

Closed 8 months ago

Last modified 3 months ago

#14759 closed enhancement (fixed)

Improve the way oEmbed deals with caching

Reported by: Viper007Bond Owned by: helen
Milestone: 4.0 Priority: normal
Severity: normal Version: 3.0.1
Component: Embeds Keywords: has-patch commit needs-docs
Focuses: Cc:

Description (last modified by Viper007Bond)

As Matt pointed to me today, caching oEmbed results to post meta is kinda lame.

I originally did this because I didn't want the HTML in old posts changing on it's own, say for example if the oEmbed provider got compromised. However I think that's extremely unlikely since we only whitelist major websites that can be trusted.

Perhaps instead we should use transients to cache this data. That way when embed HTML is updated, posts will eventually be updated too due to expiring caches. I'm thinking a cache time of a week would be good, but with some random +/- to make sure multiple caches don't all expire at once.

A good example of this is Vimeo. Anyone who embedded a Vimeo embed in their post before a few weeks ago got <object>-based embeds. However now Vimeo gives out <iframe> based ones. The only way currently to update the old embeds to the new HTML (without using a few lines of code) is to edit all of the posts and re-save them to trigger a cache update. That's painful.

Alternate suggestions welcome though.

Attachments (4)

14759.patch (4.0 KB) - added by leewillis77 2 years ago.
Initial patch for discusssion
14759_2.patch (6.3 KB) - added by leewillis77 2 years ago.
Revised patch for discussion
14759.diff (2.3 KB) - added by markjaquith 12 months ago.
Attempt at slightly better caching behavior
14759.2.diff (2.9 KB) - added by markjaquith 8 months ago.

Download all attachments as: .zip

Change History (59)

comment:1 @Viper007Bond5 years ago

  • Description modified (diff)

comment:2 @Viper007Bond5 years ago

  • Description modified (diff)

comment:3 @nacin5 years ago

I like the idea of these expiring, but my first take is that this storage solution has its own drawbacks. Storing them with postmeta is flexible and is tied to the post, versus having an ever-growing number of transients, which often will be stored in the options table.

From a more practical perspective, that also means new queries for the transient, as they're neither autoloaded like options or queried as part of the post cache like postmeta.

But again, I do like the idea of these expiring and being updated (assuming of course proper oEmbed is returned on background refresh).

comment:4 follow-up: @filosofo5 years ago

Why is it lame? For the vast majority of people, the content of a post (and hence its video markup) won't change ever again from a short time after the post is published. It doesn't make much sense to keep refreshing the data as each successive transient expires for something (the post) that remains static over that time.

Actually I think it would be neat to expand that idea and offer a transient-like cache and API specific to each post, so that you can expire the cache only when a post object itself changes.

comment:5 @scribu5 years ago

  • Cc scribu added

comment:6 in reply to: ↑ description ; follow-up: @Denis-de-Bernardy5 years ago

Replying to Viper007Bond:

A good example of this is Vimeo. Anyone who embedded a Vimeo embed in their post before a few weeks ago got <object>-based embeds. However now Vimeo gives out <iframe> based ones. The only way currently to update the old embeds to the new HTML (without using a few lines of code) is to edit all of the posts and re-save them to trigger a cache update. That's painful.

Alternatively, WP could flush postmeta by key when we know a few that are invalid during upgrades. Btw, is it that much of a big deal? I haven't tested, but I would assume that Vimeo made sure the old oEmbed code would continue to work.

If you need postmeta to expire, the easy approach is to an expire date tied to it in a separate postmeta (much like we do for transients in the options table: _$key_expires). Ideally, the stuff key in question gets detected and its parent key gets stored accordingly in memcache.

I'm in agreement with filosofo though: it makes little sense to keep refreshing the data.

comment:7 in reply to: ↑ 6 ; follow-up: @Viper007Bond5 years ago

The other issue is when embeds are used outside of posts, in the sidebar for example. Where do we cache to? Currently it caches sorta randomly (whatever $post happens to be set to at the time).

Replying to Denis-de-Bernardy:

Alternatively, WP could flush postmeta by key when we know a few that are invalid during upgrades. Btw, is it that much of a big deal? I haven't tested, but I would assume that Vimeo made sure the old oEmbed code would continue to work.

We can't flush by service. The meta key is _oembed_md5-hash-of-url-and-args. On post save, _oembed_* are deleted for a post.

Here's some code though that deletes all oEmbed caches: http://core.trac.wordpress.org/attachment/ticket/10337/10337.10.patch

If you need postmeta to expire, the easy approach is to an expire date tied to it in a separate postmeta (much like we do for transients in the options table: _$key_expires). Ideally, the stuff key in question gets detected and its parent key gets stored accordingly in memcache.

That'd require a heavy rewrite of post meta.

comment:8 in reply to: ↑ 7 @Denis-de-Bernardy5 years ago

Replying to Viper007Bond:

The other issue is when embeds are used outside of posts, in the sidebar for example. Where do we cache to? Currently it caches sorta randomly (whatever $post happens to be set to at the time).

Don't we have an in_the_loop() function for that? In a sidebar, it would make a lot of sense.

Replying to Denis-de-Bernardy:

Alternatively, WP could flush postmeta by key when we know a few that are invalid during upgrades. Btw, is it that much of a big deal? I haven't tested, but I would assume that Vimeo made sure the old oEmbed code would continue to work.

We can't flush by service. The meta key is _oembed_md5-hash-of-url-and-args. On post save, _oembed_* are deleted for a post.

If memory serves, the ticket I had looked into related to implementing delete_post_meta_by_key() had code to allow the use of a like statement.

Here's some code though that deletes all oEmbed caches: http://core.trac.wordpress.org/attachment/ticket/10337/10337.10.patch

If I may suggest, the huge IN statement could have been replaced by a join or a subquery. It would make the query much faster on larger sites.

If you need postmeta to expire, the easy approach is to an expire date tied to it in a separate postmeta (much like we do for transients in the options table: _$key_expires). Ideally, the stuff key in question gets detected and its parent key gets stored accordingly in memcache.

That'd require a heavy rewrite of post meta.

For the store in memcached part, yes. For the rest, it seems like a one-liner in get_meta() that checks for a separate key before returning the value. Or a similar one liner in the oEmbed code before using the cached value, no?

comment:9 @Denis-de-Bernardy5 years ago

Btw, I run into this kind of situation with a number of plugins of mine that cache their output when it's complex to generate. I use transients when out of the loop, and postmeta within it. The hard part is flushing, not storing or retrieving.

In our case we've a reasonably simple approach that works: All oEmbed caches could be flushed on WP upgrades. Seems frequent enough IMO. We've a recent ticket about disabling security filters on the front end to make things faster; surely we don't want to replace them with extra API calls to external services. ;-)

comment:10 @nacin4 years ago

  • Milestone changed from Awaiting Review to Future Release

comment:11 in reply to: ↑ 4 @r-a-y4 years ago

  • Cc r-a-y@… added

comment:12 @Mamaduka2 years ago

  • Cc georgemamadashvili@… added

comment:13 @helen2 years ago

Some related discussion on #17210

comment:14 @leewillis772 years ago

#23763 was marked as a duplicate.

comment:15 @leewillis772 years ago

I posted a patch to #23763 with the main aim of supporting cache_age in oEmbed responses, but it seems there's a lot of crossover with this bug so I've closed that in favour of this. Details and patch copied below:

Currently oEmbed results are cached permanently in the postmeta table. These cached results never expire, and will be used forever, or until the WordPress post is updated. This is reasonable behaviour for resources that do not specify a cache_age - although I'd argue we should re-check the oEmbed source occassionally to cope with the case where the original resource is updated / removed etc.

The oEmbed documentation (​http://oembed.com/ - section 2.3.4) lists cache_age as optional - so providers may not provide one. If it is provided, consumers still aren't bound to honour it ("Consumers may choose to use this value or not.") - so the current behaviour is perfectly in line with the spec, but I believe we can do better.

Several of the currently accepted oEmbed providers provide cache_ages (E.g. flickr, hulu, and twitter). Allowing cache_ages also helps when embedding information that can change over time (See ​http://wordpress.org/extend/plugins/github-embed/ and ​http://wordpress.org/extend/plugins/wporg-plugin-embed/ for example).

Patch to follow which:

  • Moves the caching of oEmbed responses from class-wp-embed.php down into class-oembed.php where it has access to the oEmbed response
  • Caches oEmbed HTML in transients rather than postmeta (The same embedded resource will be used no matter which post it is embedded in)
  • Sets a default cache period of 12 hours for oEmbed failures
  • Sets a default cache period of 30 days if no cache_age is specified
  • Sets a cache period according to cache_age if it is specified
  • Adds a new third argument to WP_oEmbed->get_html() and wp_oembed_get() to pass the value of $usecache through from classs-wp-embed.php

More than happy to work this patch up as needed, I'm aware there's still some stuff to tidy up with respect to clearing out the functions that currently drop the postmeta on save/update etc. I'd like to get agreement on the broad approach first though to save duplication of effort.

@leewillis772 years ago

Initial patch for discusssion

comment:16 @leewillis772 years ago

  • Cc junk@… added

comment:17 @DrewAPicture2 years ago

  • Cc xoodrew@… added

comment:18 @Viper007Bond2 years ago

Transients unfortunately probably aren't a viable solution. The problem is that if you have a persistent object cache enabled, transients are never written to the database, only to memcached or whatever system you have set up. These caches come and go and will almost certainly not last 30 days (likely much, much less). This would result in more frequent cache fetching. This is why I chose post meta in the first place -- it is persistent.

There's also the issue of race conditions. If a cache expires and your site is very popular, you will easily have hundreds of users triggering the cache repopulation before it is completed. This is why caching is done preemptively right now via an AJAX request (after you come back to the edit page after saving). It's only done on-demand via the frontend if that AJAX request somehow fails (or potentially in a race condition).

Asynchronous cache repopulation is probably the solution here, i.e. WP-Cron. It could loop through the post meta entries and regenerate any that have expired.

Post meta isn't a great solution but it's the best one I've come up with. :/

Last edited 2 years ago by Viper007Bond (previous) (diff)

comment:19 @leewillis772 years ago

Thanks for the explanation. With that in mind I think I'm going to re-open my original ticket, and I'll post a revised patch that concentrates solely on respecting cache_age in oEmbed responses, taking into account the concerns here about cache re-population - would that be sensible?

comment:20 @helen2 years ago

I think it could all be dealt with together on this ticket. I've been thinking about the flip side - whether we could indicate that a cached oEmbed response is extremely new and doesn't need to be dumped and re-requested (somebody is obsessively fixing small things and updates 10+ times after publishing a post, which kills the Twitter rate limit pretty quickly, and implementation of things like #15490). I guess something indicating the age of a cached response? Not sure.

comment:21 @leewillis772 years ago

The attached patch:

  • Respects cache_age if it is provided by the oEmbed provider
  • Caches for 30 days if not (Should we make this filterable?)
  • Caches failures for 12 hours
  • Stores the oEmbed response in postmeta (_oembed_md5)
  • Stores the expiry time for that embed in postmeta also (_oemexp_md5)
  • Deletes oEmbed data on post update (As per current code)

The missing bit for me currently is how do we preload the oEmbed resource again once it expires (Or is near to expiry). I'm happy to add a cron job that retrieves any postmeta entries that have expired/are due to expire in the next hour(?) and preloads them if we agree that's sensible?

@leewillis772 years ago

Revised patch for discussion

comment:22 @SergeyBiryukov2 years ago

Please use tabs rather than spaces for indentation (per WordPress Coding Standards).

comment:23 @jtsternberg18 months ago

Using postmeta for caching limits the ability to use oEmbed in places like options panels, user profile pages, and other places where there is no access to a global $post. I'm currently trying to figure out a way to bypass the post ID requirement in order to enable oEmbeds on profile edit page (for a client).

comment:24 @jtsternberg18 months ago

  • Cc justin@… added

comment:25 follow-up: @sanchothefat15 months ago

  • Cc robert@… added

I recently managed to brick a server by incorrectly assuming that caching would occur in the WP_oEmbed class. Confessional aside I make use of wp_oembed_get() fairly often and have had to implement my own transient based caches similar to Lee's in various plugins.

So the issue is permanence (hence using post meta instead of transients) and using regular options is not possible because it would use up too much memory when auto loaded.

Can I get some opinions on whether it's a horrible idea to think of oembed strings as media that can be stored as text files in the uploads folder?

Here's my thinking:

  • WP_oEmbed checks cache using file_exists (php caches the result in stat cache)
  • Cache hits are returned using $wp_fileSystem->get_contents() or readfile() whichever is quicker
  • Cache misses are requested and written to disk with a name like /uploads/oembeds/%provider%-md5(%url%+%args%).txt
  • cache_age for each provider can be stored as an option
  • A wp-cron task cleans the expired oembed files based on cache_age and filemtime() using glob('%provider%-*')

Is this just a bad idea? It would mean that embeds are available to all sites in a multisite installation. Some data on the number of files this could produce might show it to be impractical unless date based folders could be used.

EDIT: for one of our bigger sites we're talking about 6000 embeds across 30 sites in a network so maybe it'd be better done per site.

Let me know your thoughts on whether this worth coding up or if I should crawl back under my rock :)

Last edited 15 months ago by sanchothefat (previous) (diff)

comment:26 in reply to: ↑ 25 ; follow-ups: @rmccue15 months ago

Replying to sanchothefat:

I recently managed to brick a server by incorrectly assuming that caching would occur in the WP_oEmbed class. Confessional aside I make use of wp_oembed_get() fairly often and have had to implement my own transient based caches similar to Lee's in various plugins.

So the issue is permanence (hence using post meta instead of transients) and using regular options is not possible because it would use up too much memory when auto loaded.

add_option() takes an autoload parameter that can be set to 'no' instead, plus this stores it per-site rather than network-wide. Seems better than files to me. :)

comment:27 in reply to: ↑ 26 ; follow-up: @Denis-de-Bernardy15 months ago

Replying to rmccue:

add_option() takes an autoload parameter that can be set to 'no' instead, plus this stores it per-site rather than network-wide. Seems better than files to me. :)

But then you're adding extra queries. And storing this kind of stuff in wp_options isn't so great an idea — recall Magpie.

Btw, what's wrong with the idea I suggested earlier, i.e. on WP upgrade, delete all cached oEmbeds? Not by service: all of them. No ifs, no buts. Problem solved if we do that, no?

comment:28 in reply to: ↑ 26 @sanchothefat15 months ago

Replying to rmccue:

add_option() takes an autoload parameter that can be set to 'no' instead, plus this stores it per-site rather than network-wide. Seems better than files to me. :)

I'd thought about that but took into account nacin's comment:

I like the idea of these expiring, but my first take is that this storage solution has its own drawbacks. Storing them with postmeta is flexible and is tied to the post, versus having an ever-growing number of transients, which often will be stored in the options table.
From a more practical perspective, that also means new queries for the transient, as they're neither autoloaded like options or queried as part of the post cache like postmeta.

So the goal is to keep queries down as well.

comment:29 in reply to: ↑ 27 @sanchothefat15 months ago

Replying to Denis-de-Bernardy:

Replying to rmccue:

add_option() takes an autoload parameter that can be set to 'no' instead, plus this stores it per-site rather than network-wide. Seems better than files to me. :)

But then you're adding extra queries. And storing this kind of stuff in wp_options isn't so great an idea — recall Magpie.

Btw, what's wrong with the idea I suggested earlier, i.e. on WP upgrade, delete all cached oEmbeds? Not by service: all of them. No ifs, no buts. Problem solved if we do that, no?

How would you go about caching within WP_oEmbed? Same as in Lee's patch and passing in $usecache as false in the context of WP_Embed?

comment:30 follow-up: @peterbooker15 months ago

Would it be unreasonable to store an expiry time inside the post_meta, and use that to judge expiry instead of deleting all the oembed meta each update?

This would maintain the advantages of storage in post_meta and gain more control over when to refresh the stored content?

comment:31 in reply to: ↑ 30 @sanchothefat15 months ago

Replying to peterbooker:

Would it be unreasonable to store an expiry time inside the post_meta, and use that to judge expiry instead of deleting all the oembed meta each update?

This would maintain the advantages of storage in post_meta and gain more control over when to refresh the stored content?

I think the issue is that if you call WP_oEmbed directly using wp_oembed_get() there is no caching - it can be used outside the context of posts so should ideally be cached at a lower level than WP_Embed which is used on post save.

comment:32 follow-up: @Viper007Bond15 months ago

Correct, the ideal solution would be a caching system not related to posts but rather the call itself.

Caching to the file system will not work or scale. Many filesystems are not local and reading and writing to filesystems heavily will be slow and not good. Look back at what happened when WordPress tried to store WP_Cache results to the filesystem back around WordPress 2.0 if you don't believe me.

Sadly there isn't really a good solution here, at least one that I've thought of. Post meta was the best I could come up with.

comment:33 in reply to: ↑ 32 ; follow-up: @Denis-de-Bernardy15 months ago

Replying to Viper007Bond:

Correct, the ideal solution would be a caching system not related to posts but rather the call itself.

Caching to the file system will not work or scale. Many filesystems are not local and reading and writing to filesystems heavily will be slow and not good. Look back at what happened when WordPress tried to store WP_Cache results to the filesystem back around WordPress 2.0 if you don't believe me.

Sadly there isn't really a good solution here, at least one that I've thought of. Post meta was the best I could come up with.

Methinks it'll remain the best option until we ever mimic what we did with options by creating some kind of post_transient API (and while we're at it, user_transient, comment_transient, etc.).

comment:34 in reply to: ↑ 33 ; follow-ups: @sanchothefat15 months ago

Replying to Denis-de-Bernardy:

Replying to Viper007Bond:

Correct, the ideal solution would be a caching system not related to posts but rather the call itself.

Caching to the file system will not work or scale. Many filesystems are not local and reading and writing to filesystems heavily will be slow and not good. Look back at what happened when WordPress tried to store WP_Cache results to the filesystem back around WordPress 2.0 if you don't believe me.

I believe you :)

Sadly there isn't really a good solution here, at least one that I've thought of. Post meta was the best I could come up with.

Methinks it'll remain the best option until we ever mimic what we did with options by creating some kind of post_transient API (and while we're at it, user_transient, comment_transient, etc.).

Rather than leaving it with no optimal solution would it not at least make sense to use transients in so far as it's better than no caching at all?

The only other thing I can think of is that you could register a hidden post type, create a single post in it and use that as the fallback post ID to cache anything coming from a direct wp_oembed_get() call. It seems completely crazy but might be the least disruptive.

Last edited 15 months ago by sanchothefat (previous) (diff)

comment:35 in reply to: ↑ 34 ; follow-up: @Denis-de-Bernardy15 months ago

Replying to sanchothefat:

Rather than leaving it with no optimal solution would it not at least make sense to use transients in so far as it's better than no caching at all?

The only other thing I can think of is that you could register a hidden post type, create a single post in it and use that as the fallback post ID to cache anything coming from a direct wp_oembed_get() call. It seems completely crazy but might be the least disruptive.

The current solution is the most sane we have at the moment imo. The only issue is expiring it periodically.

The benefit of post_meta is that it adds no extra queries: post_meta gets queried automatically by the API when a post is loaded by WP (because it's in the loop or because you're fetching any of its meta key/value pairs). This makes it a very sane place to store anything that relates to a specific post.

In contrast, a transient in wp_options yields an extra DB hit for each oembed on a page — whether it's cached or not — on sites with no persistent in-memory storage. Likewise, a custom post type would allow to keep the cache in sync if two or more posts embed the same resource, but this would be at the cost of extra DB queries to actually fetch them. Tbh, I'm suspicious that either approach would be a win in the end.

comment:36 in reply to: ↑ 34 @leewillis7715 months ago

Replying to sanchothefat:

Rather than leaving it with no optimal solution would it not at least make sense to use transients in so far as it's better than no caching at all?

Not really - transients disappear without notice. Particularly I'm pretty sure that transients get cleared specifically on WP upgrade nowadays - that would leave some epic re-embedding going on which I imagine wouldn't be pretty for sites that embed even reasonable quantities.

The non-guaranteed persistence of transients makes this unworkable.

The only other thing I can think of is that you could register a hidden post type, create a single post in it and use that as the fallback post ID to cache anything coming from a direct wp_oembed_get() call. It seems completely crazy but might be the least disruptive.

I had wondered about child entries in the posts table (In a similar method to "attachments"), but not sure it helps us - there's still no expiry framework built in.

comment:37 in reply to: ↑ 35 ; follow-up: @leewillis7715 months ago

Replying to Denis-de-Bernardy:

The current solution is the most sane we have at the moment imo. The only issue is expiring it periodically.

Indeed - which I thinks brings us back around to http://core.trac.wordpress.org/ticket/14759#comment:21

Is that a sensible avenue to pursue?

comment:38 in reply to: ↑ 37 @Denis-de-Bernardy15 months ago

Replying to leewillis77:

Replying to Denis-de-Bernardy:

The current solution is the most sane we have at the moment imo. The only issue is expiring it periodically.

Indeed - which I thinks brings us back around to http://core.trac.wordpress.org/ticket/14759#comment:21

Is that a sensible avenue to pursue?

I haven't peeked at the patch, but if it does what is described, it seems sensible imo.

Aside: the same functionality could be wrapped in a function called get_post_transient() and, well... that should likely be a separate ticket, but I'm sure you and others see the point.

Last edited 15 months ago by Denis-de-Bernardy (previous) (diff)

comment:39 @ircbot13 months ago

This ticket was mentioned in IRC in #wordpress-ui by helen. View the logs.

@markjaquith12 months ago

Attempt at slightly better caching behavior

comment:40 follow-up: @markjaquith12 months ago

14759.diff is a rough swing.

  • Gets rid of the "delete all oEmbed postmeta caches on post save" concept
  • Triggers regeneration request even for drafts
  • Cache updated only if a valid oEmbed response comes back
  • Doesn't attempt regeneration for draft re-saves if the cache is populated
  • Reduces the stampede possibilities noted in #17210

I still think that the AJAX method is kind of weird. There is still some stampede possibility, for when you update or publish-without-making-a-draft an oembed URL. In that case, both front-end and back-end attempts could be made.

Some thoughts:

  • Is it worth locking the regeneration for a little bit to let the AJAX regeneration to fire?
  • Should we look at an alternative method of firing the regeneration?
  • Should we just do a synchronous oEmbed attempt if updating a published post (just for new URLs)? Would slow things down, but could stop this "front-end requests result in HTTP requests and UPDATE queries" nonsense.
  • Could we recognize oEmbed URLs client-side in the editor and proactively fire AJAX cache requests before the post is even saved with that URL?
  • Should we be doing TLC transients style soft expiration, obeying the oEmbed cache time if it provides one and using a long cache time (a year) otherwise?
  • Should we have a "post transients" API?

comment:41 in reply to: ↑ 40 @helen12 months ago

Replying to markjaquith:

  • Could we recognize oEmbed URLs client-side in the editor and proactively fire AJAX cache requests before the post is even saved with that URL?

I think this would go a long way toward solving the initial publish problem, and since I also think we should prepare ourselves for a view/preview of sorts even if we don't tackle it this release (although we should probably do at least part of #15490 if we land this soon), we might as well at least do the caching part.

  • Should we be doing TLC transients style soft expiration, obeying the oEmbed cache time if it provides one and using a long cache time (a year) otherwise?

I'm not sure how much I really care about oEmbed expiry, especially if we go with only replacing cache values when a valid response is returned and continue with only doing it on post save (or rather, firing the call to do it after post save).

comment:42 @ircbot12 months ago

This ticket was mentioned in IRC in #wordpress-dev by helen. View the logs.

comment:43 @ircbot10 months ago

This ticket was mentioned in IRC in #wordpress-dev by helen. View the logs.

comment:44 @ircbot10 months ago

This ticket was mentioned in IRC in #wordpress-dev by helen. View the logs.

comment:45 @wonderboymusic10 months ago

  • Milestone changed from Future Release to 4.0

comment:46 @ircbot9 months ago

This ticket was mentioned in IRC in #wordpress-dev by helen. View the logs.

comment:47 @nacin9 months ago

14759.diff looks like a good first step, though 'publish' isn't the only terminal status. There's also 'private'. And, 'future' would happen on cron, thus would result in a frontend race.

I have very little issue with making this synchronous, as long as we avoid doing it on every damn save.

comment:48 @helen8 months ago

Possible ideas, none of them really exclude the rest:

  • (Filterable?) array of statuses that synchronously request and cache the embed. Terminal statuses, but without trying to act like we're establishing the concept here.
  • Only synchronously request embeds that don't have a corresponding non-{{unknown}} response saved already. Store the rest somewhere (transient), then fire a request for those async.
  • Store somewhere (transient) the embeds that have been requested since the last save and don't re-request those at all.
  • Maybe we should short-term cache HTTP responses? See above item.

Now that there are TinyMCE and modal previews, fewer embeds need to be requested on save in general, but of course there are text mode and other scenarios.

@markjaquith8 months ago

comment:49 @markjaquith8 months ago

14759.2.diff Institutes 1 day caching of valid results on the admin side. The front end will always serve from the cache, if available, but the backend will use valid cache responses (but not cached bad-responses "{{unknown}}") for one day. It also no longer purges the cache on every save. Good responses are cached regardless of post status. Cached {{unknown}} responses are used on the front end, but re-attempted on the backend. And bad responses never replace a good response.

This should greatly reduce the outbound requests. Say you had a post with 10 tweets, and saved 10 drafts of it. Previously that would be 100 oEmbed requests to Twitter.com. Now it's 10.

What this doesn't address: that if you never update the post, the cache stays indefinitely.

comment:50 @nacin8 months ago

  • Keywords has-patch commit added

I've been studying 14759.2.diff and I think it's good to go. I have no issues with it in my review and testing.

comment:51 @ircbot8 months ago

This ticket was mentioned in IRC in #wordpress-dev by helen. View the logs.

comment:52 @ircbot8 months ago

This ticket was mentioned in IRC in #wordpress-dev by helen. View the logs.

comment:53 follow-up: @helen8 months ago

  • Owner set to helen
  • Resolution set to fixed
  • Status changed from new to closed

In 28972:

Improve oEmbed caching. Introduces the concept of a TTL for oEmbed caches and a filter for oembed_ttl.

We will no longer replace previously valid oEmbed responses with an {{unknown}} cache value. When this happens due to reaching a rate limit or a service going down, it is data loss, and is not acceptable. This means that oEmbed caches for a post are no longer deleted indiscriminately every time that post is saved.

oEmbed continues to be cached in post meta, with the addition of a separate meta key containing the timestamp of the last retrieval, which is used to avoid re-requesting a recently cached oEmbed response. By default, we consider a valued cached in the past day to be fresh. This can greatly reduce the number of outbound requests, especially in cases where a post containing multiple embeds is saved frequently.

The TTL used to determine whether or not to request a response can be filtered using oembed_ttl, thus allowing for the possibility of respecting the optional oEmbed response parameter cache_age or altering the period of time a cached value is considered to be fresh.

Now that oEmbeds are previewed in the visual editor as well as the media modal, oEmbed caches are often populated before a post is saved or published. By pre-populating and avoiding having to re-request that response, we also greatly reduce the chances of a stampede happening when a published post is visible before oEmbed caching is complete.

As it previously stood, a stampede was extremely likely to happen, as the AJAX caching was only triggered when $_GET['message'] was 1. The published message is 6. We now trigger the caching every time $_GET['message'] is present on the edit screen, as we are able to avoid triggering so many HTTP requests overall.

props markjaquith. fixes #14759. see #17210.

comment:54 @DrewAPicture8 months ago

In 29160:

Inline documentation cleanup for 4.0 audit.

Document the first parameter, $time, in the 'oembed_ttl' filter, added in [28972].

See #14759 and #28885.

comment:55 in reply to: ↑ 53 @kevinweber3 months ago

  • Keywords needs-docs added

Hi Helen,

you annotated that the delete_oembed_caches() function is "Unused by core as of 4.0.0."

I tried to implement an alternative method to clear caches/ttl for one of my plugins (Lazy Load for Videos) – but nothing seems to work. I tried it for several hours :-/
What's the best method to clear all oembed caches?

For now, I use the following code:

	function delete_oembed_caches() {
		global $wp_embed;
		$lazyload_videos_general = new Lazyload_Videos_General();

	    $post_ids = get_posts(
	    	array(
	    		'post_type' => $lazyload_videos_general->get_post_types(),
	    		'posts_per_page' => -1,	// -1 == no limit
	    		'fields' => 'ids',	// Just retrieve a list of IDs (http://thomasgriffinmedia.com/blog/2012/10/optimize-wordpress-queries/)
	    		) );

	    foreach ( $post_ids as $post_id ):
	    	$wp_embed->delete_oembed_caches( $post_id );
	    endforeach;
	}

Would be amazing if you could give me at least a hint.

Best regards,
Kevin

Replying to helen:

In 28972:

Improve oEmbed caching. Introduces the concept of a TTL for oEmbed caches and a filter for oembed_ttl.

We will no longer replace previously valid oEmbed responses with an {{unknown}} cache value. When this happens due to reaching a rate limit or a service going down, it is data loss, and is not acceptable. This means that oEmbed caches for a post are no longer deleted indiscriminately every time that post is saved.

oEmbed continues to be cached in post meta, with the addition of a separate meta key containing the timestamp of the last retrieval, which is used to avoid re-requesting a recently cached oEmbed response. By default, we consider a valued cached in the past day to be fresh. This can greatly reduce the number of outbound requests, especially in cases where a post containing multiple embeds is saved frequently.

The TTL used to determine whether or not to request a response can be filtered using oembed_ttl, thus allowing for the possibility of respecting the optional oEmbed response parameter cache_age or altering the period of time a cached value is considered to be fresh.

Now that oEmbeds are previewed in the visual editor as well as the media modal, oEmbed caches are often populated before a post is saved or published. By pre-populating and avoiding having to re-request that response, we also greatly reduce the chances of a stampede happening when a published post is visible before oEmbed caching is complete.

As it previously stood, a stampede was extremely likely to happen, as the AJAX caching was only triggered when $_GET['message'] was 1. The published message is 6. We now trigger the caching every time $_GET['message'] is present on the edit screen, as we are able to avoid triggering so many HTTP requests overall.

props markjaquith. fixes #14759. see #17210.

Note: See TracTickets for help on using tickets.