Opened 9 years ago
Closed 6 years ago
#36456 closed defect (bug) (wontfix)
oEmbeds containing Emojis aren't cached if the meta_value column's charset is utf8 and not utf8mb4
Reported by: |
|
Owned by: | |
---|---|---|---|
Milestone: | Priority: | normal | |
Severity: | normal | Version: | 4.4.2 |
Component: | Emoji | Keywords: | |
Focuses: | Cc: |
Description
oEmbeds (e.g. Instagram, Twitter, ... ) that contains Emojis aren't cached in the wp_postmeta
table, if the meta_value
column's charset is reported as utf8
and not utf8mb4
.
This means that wp_oembed_get()
is called for every such embed on each page load.
Here's an example of an Instagram containing an Emoji:
https://www.instagram.com/p/_cj9NXvhrh/
where only the:
_oembed_time_02ba39bcc45811ad672a288b772ca9a2
is written to the wp_postmeta
table but not
_oembed_02ba39bcc45811ad672a288b772ca9a2
containing the cached oEmbed content.
I just noticed this for a site with a shared database server with MySQL version < 5.5.3.
In the WP_Embed::shortcode()
method we have following:
// Maybe cache the result if ( $html ) { update_post_meta( $post_ID, $cachekey, $html ); update_post_meta( $post_ID, $cachekey_time, time() ); } elseif ( ! $cache ) { update_post_meta( $post_ID, $cachekey, '{{unknown}}' ); }
This doesn't take care of the situation when update_postmeta()
fails to write to the database.
For new meta inserts it uses:
$result = $wpdb->insert( $table, array( $column => $object_id, 'meta_key' => $meta_key, 'meta_value' => $meta_value ) );
This is a wrapper for:
wpdb::_insert_replace_helper()
that contains this part:
$data = $this->process_fields( $table, $data, $format ); if ( false === $data ) { return false; }
And similarly within wpdb::process_fields()
we have:
$converted_data = $this->strip_invalid_text( $data ); if ( $data !== $converted_data ) { return false; }
and this can return false
for some data conversions.
Example
We can test this with a simple example:
update_post_meta( 123, 'test', 'Smile ????' );
that returns false
if:
$wpdb->get_col_charset( $wpdb->postmeta, 'meta_value' )
returns 'utf8'
and not 'utf8mb4'
.
In this case nothing is written to the wp_postmeta
table.
Workaround
Here's what I constructed as a temporary workaround:
/** * Encode Emojis in all oembed results, if the charset for the meta_value column is utf8. */ add_filter( 'oembed_result', function( $html, $url, $args ) use ( $wpdb ) { /** * For WordPress 4.2+ where the get_col_charset method and wp_encode_emoji function where introduced. * The charset check is based on the https://codex.wordpress.org/Function_Reference/wp_encode_emoji#Example */ if ( method_exists( $wpdb, 'get_col_charset' ) && 'utf8' === $wpdb->get_col_charset( $wpdb->postmeta, 'meta_value' ) ) $html = wp_encode_emoji( $html ); return $html; }, 10, 3 );
Suggestions
I see two possibilities to rewrite this part:
// Maybe cache the result if ( $html ) { update_post_meta( $post_ID, $cachekey, $html ); update_post_meta( $post_ID, $cachekey_time, time() ); } elseif ( ! $cache ) { update_post_meta( $post_ID, $cachekey, '{{unknown}}' ); }
to fix the caching.
Suggestion 1)
// Maybe cache the result if ( $html ) { $updated_cachekey = update_post_meta( $post_ID, $cachekey, $html ); if( $updated_cachekey ) { update_post_meta( $post_ID, $cachekey_time, time() ); } else { update_post_meta( $post_ID, $cachekey, '{{unknown}}' ); } } elseif ( ! $cache ) { update_post_meta( $post_ID, $cachekey, '{{unknown}}' ); }
This means that only the link will show up if $updated_cachekey
is false
and the cache key for the time isn't written to the db as well.
We could also rewrite this to only contain a single update_post_meta()
call.
Suggestion 2)
We could also extend 1) to encode the Emojis if needed:
// Maybe cache the result if ( $html ) { // Encode Emojis if the charset of the meta_value column is 'utf8' if ( method_exists( $wpdb, 'get_col_charset' ) && 'utf8' === $wpdb->get_col_charset( $wpdb->postmeta, 'meta_value' ) && function_exists( 'wp_encode_emoji' ) ) { $html = wp_encode_emoji( $html ); } $updated_cachekey = update_post_meta( $post_ID, $cachekey, $html ); if( $updated_cachekey ) { update_post_meta( $post_ID, $cachekey_time, time() ); } else { update_post_meta( $post_ID, $cachekey, '{{unknown}}' ); } } elseif ( ! $cache ) { update_post_meta( $post_ID, $cachekey, '{{unknown}}' ); }
where we encode the Emojis when the meta_value column's charset is utf8. The method/function checks would be needed since they are only available in WP 4.2+.
This should give us the cached content instead of only the link.
PS: For the above case the Query Monitor plugin reports 4 HTTP GET requests for a single instagram oEmbed url, that contains an Emoji.
Example:
For https://www.instagram.com/p/_cj9NXvhrh/ they are:
https://api.instagram.com/oembed ?maxwidth=640 &maxheight=960 &url=https%3A%2F%2Fwww.instagram.com%2Fp%2F_cj9NXvhrh%2F &format=json https://api.instagram.com/publicapi/oembed/ ?maxwidth=640 &maxheight=960 &url=https%3A%2F%2Fwww.instagram.com%2Fp%2F_cj9NXvhrh%2F &format=json https://www.instagram.com/publicapi/oembed/ ?maxwidth=640 &maxheight=960 &url=https://www.instagram.com/p/_cj9NXvhrh/ &format=json https://api.instagram.com/oembed/ ?maxwidth=640 &maxheight=960 &url=https://www.instagram.com/p/_cj9NXvhrh/ &format=json
I'm not sure why but I only get a single request for a Twitter oEmbed containing Emojis.
PPS: Sorry for the long post.
Change History (9)
#2
@
9 years ago
We could additionally look at the $cache
value to see if $cachekey
already exists.
The two updated suggestions are now:
1)
// Maybe cache the result if ( $html ) { $updated_cachekey = update_post_meta( $post_ID, $cachekey, $html ); if( ! $updated_cachekey && ! $cache ) { update_post_meta( $post_ID, $cachekey, '{{unknown}}' ); } update_post_meta( $post_ID, $cachekey_time, time() ); } elseif ( ! $cache ) { update_post_meta( $post_ID, $cachekey, '{{unknown}}' ); }
or 2)
// Maybe cache the result if ( $html ) { global $wpdb; // Encode Emojis if the charset of the meta_value column is 'utf8' instead of 'utf8mb4' if ( method_exists( $wpdb, 'get_col_charset' ) && 'utf8' === $wpdb->get_col_charset( $wpdb->postmeta, 'meta_value' ) && function_exists( 'wp_encode_emoji' ) ) { $html = wp_encode_emoji( $html ); } $updated_cachekey = update_post_meta( $post_ID, $cachekey, $html ); if( ! $updated_cachekey && ! $cache ) { update_post_meta( $post_ID, $cachekey, '{{unknown}}' ); } update_post_meta( $post_ID, $cachekey_time, time() ); } elseif ( ! $cache ) { update_post_meta( $post_ID, $cachekey, '{{unknown}}' ); }
This ticket was mentioned in Slack in #core by jeremyfelt. View the logs.
9 years ago
#4
@
9 years ago
Any update on this?
These days almost everybody is using Emojis, meaning oembeds are not cached at all, making high load on web server constantly fetching embedded content.
@birgire temporary workaround works for some embeds, but doesn't work for e.g. https://www.instagram.com/p/BIaw9JUB-DK/
#5
follow-up:
↓ 6
@
9 years ago
@Krstarica, I think this is still problematic for non utf8mb4.
Just imagine a popular site, with an uncached frontpage, displaying full content posts, each containing multiple Instagram oEmbeds with Emojis ;-)
Note that the Instagram oEmbeds can contain other invalid chars than just Emojis, so I ended up using mb_convert_encoding()
, like mentioned here: http://stackoverflow.com/a/34640424/2078474, instead of using wp_encode_emoji()
.
Then JetPack complicated things further, because it's using it's own Instagram handler, if I remember correctly. I therefore had to unload it through the jetpack_shortcodes_to_include
filter, to be able to apply the mb_convert_encoding()
convertion through the oembed_result
filter.
Even if we strip the invalid chars from the fetched Instagram HTML, the javascript part will display it correctly.
#6
in reply to:
↑ 5
@
9 years ago
@birgire, we've ended up with manually converting Wordpress database, tables and config file to utf8mb4. This should have been done during upgrade process, but for some reason it didn't happen:
https://make.wordpress.org/core/2015/04/02/the-utf8mb4-upgrade/
#7
@
9 years ago
@Krstarica yes I would recommend upgrading to utf8mb4 whenever that's supported, but that's not always the case. Good to hear you were able to do that.
#9
@
6 years ago
- Resolution set to wontfix
- Status changed from new to closed
Given that only a tiny percentage of sites run a version of MySQL that doesn't support utf8mb4
, and that we'll likely bump the minimum required version of MySQL at some point in the not-too-distant future, I'm going to close this issue.
The correct fix is to convert the table to utf8mb4
.
ps: regarding the two suggestions above, it looks like I forgot that
update_post_meta
also returnsfalse
when trying to update the same existing meta value. So we need to take that into account as well ;-)