Opened 3 years ago
Last modified 3 months ago
#14759 new enhancement
Improve the way oEmbed deals with caching
| Reported by: |
|
Owned by: | |
|---|---|---|---|
| Priority: | normal | Milestone: | Future Release |
| Component: | Embeds | Version: | 3.0.1 |
| Severity: | normal | Keywords: | |
| Cc: | scribu, r-a-y@…, georgemamadashvili@…, junk@…, xoodrew@… |
Description (last modified by Viper007Bond)
As Matt pointed to me today, caching oEmbed results to post meta is kinda lame.
I originally did this because I didn't want the HTML in old posts changing on it's own, say for example if the oEmbed provider got compromised. However I think that's extremely unlikely since we only whitelist major websites that can be trusted.
Perhaps instead we should use transients to cache this data. That way when embed HTML is updated, posts will eventually be updated too due to expiring caches. I'm thinking a cache time of a week would be good, but with some random +/- to make sure multiple caches don't all expire at once.
A good example of this is Vimeo. Anyone who embedded a Vimeo embed in their post before a few weeks ago got <object>-based embeds. However now Vimeo gives out <iframe> based ones. The only way currently to update the old embeds to the new HTML (without using a few lines of code) is to edit all of the posts and re-save them to trigger a cache update. That's painful.
Alternate suggestions welcome though.
Attachments (2)
Change History (24)
comment:1
Viper007Bond
— 3 years ago
- Description modified (diff)
comment:2
Viper007Bond
— 3 years ago
- Description modified (diff)
comment:4
follow-up:
↓ 11
filosofo
— 3 years ago
Why is it lame? For the vast majority of people, the content of a post (and hence its video markup) won't change ever again from a short time after the post is published. It doesn't make much sense to keep refreshing the data as each successive transient expires for something (the post) that remains static over that time.
Actually I think it would be neat to expand that idea and offer a transient-like cache and API specific to each post, so that you can expire the cache only when a post object itself changes.
comment:6
in reply to:
↑ description
;
follow-up:
↓ 7
Denis-de-Bernardy
— 3 years ago
Replying to Viper007Bond:
A good example of this is Vimeo. Anyone who embedded a Vimeo embed in their post before a few weeks ago got <object>-based embeds. However now Vimeo gives out <iframe> based ones. The only way currently to update the old embeds to the new HTML (without using a few lines of code) is to edit all of the posts and re-save them to trigger a cache update. That's painful.
Alternatively, WP could flush postmeta by key when we know a few that are invalid during upgrades. Btw, is it that much of a big deal? I haven't tested, but I would assume that Vimeo made sure the old oEmbed code would continue to work.
If you need postmeta to expire, the easy approach is to an expire date tied to it in a separate postmeta (much like we do for transients in the options table: _$key_expires). Ideally, the stuff key in question gets detected and its parent key gets stored accordingly in memcache.
I'm in agreement with filosofo though: it makes little sense to keep refreshing the data.
comment:7
in reply to:
↑ 6
;
follow-up:
↓ 8
Viper007Bond
— 3 years ago
The other issue is when embeds are used outside of posts, in the sidebar for example. Where do we cache to? Currently it caches sorta randomly (whatever $post happens to be set to at the time).
Replying to Denis-de-Bernardy:
Alternatively, WP could flush postmeta by key when we know a few that are invalid during upgrades. Btw, is it that much of a big deal? I haven't tested, but I would assume that Vimeo made sure the old oEmbed code would continue to work.
We can't flush by service. The meta key is _oembed_md5-hash-of-url-and-args. On post save, _oembed_* are deleted for a post.
Here's some code though that deletes all oEmbed caches: http://core.trac.wordpress.org/attachment/ticket/10337/10337.10.patch
If you need postmeta to expire, the easy approach is to an expire date tied to it in a separate postmeta (much like we do for transients in the options table: _$key_expires). Ideally, the stuff key in question gets detected and its parent key gets stored accordingly in memcache.
That'd require a heavy rewrite of post meta.
comment:8
in reply to:
↑ 7
Denis-de-Bernardy
— 3 years ago
Replying to Viper007Bond:
The other issue is when embeds are used outside of posts, in the sidebar for example. Where do we cache to? Currently it caches sorta randomly (whatever $post happens to be set to at the time).
Don't we have an in_the_loop() function for that? In a sidebar, it would make a lot of sense.
Replying to Denis-de-Bernardy:
Alternatively, WP could flush postmeta by key when we know a few that are invalid during upgrades. Btw, is it that much of a big deal? I haven't tested, but I would assume that Vimeo made sure the old oEmbed code would continue to work.
We can't flush by service. The meta key is _oembed_md5-hash-of-url-and-args. On post save, _oembed_* are deleted for a post.
If memory serves, the ticket I had looked into related to implementing delete_post_meta_by_key() had code to allow the use of a like statement.
Here's some code though that deletes all oEmbed caches: http://core.trac.wordpress.org/attachment/ticket/10337/10337.10.patch
If I may suggest, the huge IN statement could have been replaced by a join or a subquery. It would make the query much faster on larger sites.
If you need postmeta to expire, the easy approach is to an expire date tied to it in a separate postmeta (much like we do for transients in the options table: _$key_expires). Ideally, the stuff key in question gets detected and its parent key gets stored accordingly in memcache.
That'd require a heavy rewrite of post meta.
For the store in memcached part, yes. For the rest, it seems like a one-liner in get_meta() that checks for a separate key before returning the value. Or a similar one liner in the oEmbed code before using the cached value, no?
comment:9
Denis-de-Bernardy
— 3 years ago
Btw, I run into this kind of situation with a number of plugins of mine that cache their output when it's complex to generate. I use transients when out of the loop, and postmeta within it. The hard part is flushing, not storing or retrieving.
In our case we've a reasonably simple approach that works: All oEmbed caches could be flushed on WP upgrades. Seems frequent enough IMO. We've a recent ticket about disabling security filters on the front end to make things faster; surely we don't want to replace them with extra API calls to external services. ;-)
comment:10
nacin
— 3 years ago
- Milestone changed from Awaiting Review to Future Release
comment:11
in reply to:
↑ 4
r-a-y
— 2 years ago
- Cc r-a-y@… added
comment:12
Mamaduka
— 9 months ago
- Cc georgemamadashvili@… added
comment:13
helen
— 4 months ago
Some related discussion on #17210
comment:14
leewillis77
— 3 months ago
#23763 was marked as a duplicate.
comment:15
leewillis77
— 3 months ago
I posted a patch to #23763 with the main aim of supporting cache_age in oEmbed responses, but it seems there's a lot of crossover with this bug so I've closed that in favour of this. Details and patch copied below:
Currently oEmbed results are cached permanently in the postmeta table. These cached results never expire, and will be used forever, or until the WordPress post is updated. This is reasonable behaviour for resources that do not specify a cache_age - although I'd argue we should re-check the oEmbed source occassionally to cope with the case where the original resource is updated / removed etc.
The oEmbed documentation (http://oembed.com/ - section 2.3.4) lists cache_age as optional - so providers may not provide one. If it is provided, consumers still aren't bound to honour it ("Consumers may choose to use this value or not.") - so the current behaviour is perfectly in line with the spec, but I believe we can do better.
Several of the currently accepted oEmbed providers provide cache_ages (E.g. flickr, hulu, and twitter). Allowing cache_ages also helps when embedding information that can change over time (See http://wordpress.org/extend/plugins/github-embed/ and http://wordpress.org/extend/plugins/wporg-plugin-embed/ for example).
Patch to follow which:
- Moves the caching of oEmbed responses from class-wp-embed.php down into class-oembed.php where it has access to the oEmbed response
- Caches oEmbed HTML in transients rather than postmeta (The same embedded resource will be used no matter which post it is embedded in)
- Sets a default cache period of 12 hours for oEmbed failures
- Sets a default cache period of 30 days if no cache_age is specified
- Sets a cache period according to cache_age if it is specified
- Adds a new third argument to WP_oEmbed->get_html() and wp_oembed_get() to pass the value of $usecache through from classs-wp-embed.php
More than happy to work this patch up as needed, I'm aware there's still some stuff to tidy up with respect to clearing out the functions that currently drop the postmeta on save/update etc. I'd like to get agreement on the broad approach first though to save duplication of effort.
comment:16
leewillis77
— 3 months ago
- Cc junk@… added
comment:17
DrewAPicture
— 3 months ago
- Cc xoodrew@… added
comment:18
Viper007Bond
— 3 months ago
Transients unfortunately probably aren't a viable solution. The problem is that if you have a persistent object cache enabled, transients are never written to the database, only to memcached or whatever system you have set up. These caches come and go and will almost certainly not last 30 days (likely much, much less). This would result in more frequent cache fetching. This is why I chose post meta in the first place -- it is persistent.
There's also the issue of race conditions. If a cache expires and your site is very popular, you will easily have hundreds of users triggering the cache repopulation before it is completed. This is why caching is done preemptively right now via an AJAX request (after you come back to the edit page after saving). It's only done on-demand via the frontend if that AJAX request somehow fails (or potentially in a race condition).
Asynchronous cache repopulation is probably the solution here, i.e. WP-Cron. It could loop through the post meta entries and regenerate any that have expired.
Post meta isn't a great solution but it's the best one I've come up with. :/
comment:19
leewillis77
— 3 months ago
Thanks for the explanation. With that in mind I think I'm going to re-open my original ticket, and I'll post a revised patch that concentrates solely on respecting cache_age in oEmbed responses, taking into account the concerns here about cache re-population - would that be sensible?
comment:20
helen
— 3 months ago
I think it could all be dealt with together on this ticket. I've been thinking about the flip side - whether we could indicate that a cached oEmbed response is extremely new and doesn't need to be dumped and re-requested (somebody is obsessively fixing small things and updates 10+ times after publishing a post, which kills the Twitter rate limit pretty quickly, and implementation of things like #15490). I guess something indicating the age of a cached response? Not sure.
comment:21
leewillis77
— 3 months ago
The attached patch:
- Respects cache_age if it is provided by the oEmbed provider
- Caches for 30 days if not (Should we make this filterable?)
- Caches failures for 12 hours
- Stores the oEmbed response in postmeta (_oembed_md5)
- Stores the expiry time for that embed in postmeta also (_oemexp_md5)
- Deletes oEmbed data on post update (As per current code)
The missing bit for me currently is how do we preload the oEmbed resource again once it expires (Or is near to expiry). I'm happy to add a cron job that retrieves any postmeta entries that have expired/are due to expire in the next hour(?) and preloads them if we agree that's sensible?
comment:22
SergeyBiryukov
— 3 months ago
Please use tabs rather than spaces for indentation (per WordPress Coding Standards).
I like the idea of these expiring, but my first take is that this storage solution has its own drawbacks. Storing them with postmeta is flexible and is tied to the post, versus having an ever-growing number of transients, which often will be stored in the options table.
From a more practical perspective, that also means new queries for the transient, as they're neither autoloaded like options or queried as part of the post cache like postmeta.
But again, I do like the idea of these expiring and being updated (assuming of course proper oEmbed is returned on background refresh).