Opened 10 years ago
Closed 7 years ago
#36456 closed defect (bug) (wontfix)
oEmbeds containing Emojis aren't cached if the meta_value column's charset is utf8 and not utf8mb4
| Reported by: |
|
Owned by: | |
|---|---|---|---|
| Milestone: | Priority: | normal | |
| Severity: | normal | Version: | 4.4.2 |
| Component: | Emoji | Keywords: | |
| Focuses: | Cc: |
Description
oEmbeds (e.g. Instagram, Twitter, ... ) that contains Emojis aren't cached in the wp_postmeta table, if the meta_value column's charset is reported as utf8 and not utf8mb4.
This means that wp_oembed_get() is called for every such embed on each page load.
Here's an example of an Instagram containing an Emoji:
https://www.instagram.com/p/_cj9NXvhrh/
where only the:
_oembed_time_02ba39bcc45811ad672a288b772ca9a2
is written to the wp_postmeta table but not
_oembed_02ba39bcc45811ad672a288b772ca9a2
containing the cached oEmbed content.
I just noticed this for a site with a shared database server with MySQL version < 5.5.3.
In the WP_Embed::shortcode() method we have following:
// Maybe cache the result
if ( $html ) {
update_post_meta( $post_ID, $cachekey, $html );
update_post_meta( $post_ID, $cachekey_time, time() );
} elseif ( ! $cache ) {
update_post_meta( $post_ID, $cachekey, '{{unknown}}' );
}
This doesn't take care of the situation when update_postmeta() fails to write to the database.
For new meta inserts it uses:
$result = $wpdb->insert( $table, array(
$column => $object_id,
'meta_key' => $meta_key,
'meta_value' => $meta_value
) );
This is a wrapper for:
wpdb::_insert_replace_helper()
that contains this part:
$data = $this->process_fields( $table, $data, $format );
if ( false === $data ) {
return false;
}
And similarly within wpdb::process_fields() we have:
$converted_data = $this->strip_invalid_text( $data );
if ( $data !== $converted_data ) {
return false;
}
and this can return false for some data conversions.
Example
We can test this with a simple example:
update_post_meta( 123, 'test', 'Smile ????' );
that returns false if:
$wpdb->get_col_charset( $wpdb->postmeta, 'meta_value' )
returns 'utf8' and not 'utf8mb4'.
In this case nothing is written to the wp_postmeta table.
Workaround
Here's what I constructed as a temporary workaround:
/**
* Encode Emojis in all oembed results, if the charset for the meta_value column is utf8.
*/
add_filter( 'oembed_result', function( $html, $url, $args ) use ( $wpdb )
{
/**
* For WordPress 4.2+ where the get_col_charset method and wp_encode_emoji function where introduced.
* The charset check is based on the https://codex.wordpress.org/Function_Reference/wp_encode_emoji#Example
*/
if ( method_exists( $wpdb, 'get_col_charset' )
&& 'utf8' === $wpdb->get_col_charset( $wpdb->postmeta, 'meta_value' )
)
$html = wp_encode_emoji( $html );
return $html;
}, 10, 3 );
Suggestions
I see two possibilities to rewrite this part:
// Maybe cache the result
if ( $html ) {
update_post_meta( $post_ID, $cachekey, $html );
update_post_meta( $post_ID, $cachekey_time, time() );
} elseif ( ! $cache ) {
update_post_meta( $post_ID, $cachekey, '{{unknown}}' );
}
to fix the caching.
Suggestion 1)
// Maybe cache the result
if ( $html ) {
$updated_cachekey = update_post_meta( $post_ID, $cachekey, $html );
if( $updated_cachekey ) {
update_post_meta( $post_ID, $cachekey_time, time() );
} else {
update_post_meta( $post_ID, $cachekey, '{{unknown}}' );
}
} elseif ( ! $cache ) {
update_post_meta( $post_ID, $cachekey, '{{unknown}}' );
}
This means that only the link will show up if $updated_cachekey is false and the cache key for the time isn't written to the db as well.
We could also rewrite this to only contain a single update_post_meta() call.
Suggestion 2)
We could also extend 1) to encode the Emojis if needed:
// Maybe cache the result
if ( $html ) {
// Encode Emojis if the charset of the meta_value column is 'utf8'
if ( method_exists( $wpdb, 'get_col_charset' )
&& 'utf8' === $wpdb->get_col_charset( $wpdb->postmeta, 'meta_value' )
&& function_exists( 'wp_encode_emoji' )
) {
$html = wp_encode_emoji( $html );
}
$updated_cachekey = update_post_meta( $post_ID, $cachekey, $html );
if( $updated_cachekey ) {
update_post_meta( $post_ID, $cachekey_time, time() );
} else {
update_post_meta( $post_ID, $cachekey, '{{unknown}}' );
}
} elseif ( ! $cache ) {
update_post_meta( $post_ID, $cachekey, '{{unknown}}' );
}
where we encode the Emojis when the meta_value column's charset is utf8. The method/function checks would be needed since they are only available in WP 4.2+.
This should give us the cached content instead of only the link.
PS: For the above case the Query Monitor plugin reports 4 HTTP GET requests for a single instagram oEmbed url, that contains an Emoji.
Example:
For https://www.instagram.com/p/_cj9NXvhrh/ they are:
https://api.instagram.com/oembed ?maxwidth=640 &maxheight=960 &url=https%3A%2F%2Fwww.instagram.com%2Fp%2F_cj9NXvhrh%2F &format=json https://api.instagram.com/publicapi/oembed/ ?maxwidth=640 &maxheight=960 &url=https%3A%2F%2Fwww.instagram.com%2Fp%2F_cj9NXvhrh%2F &format=json https://www.instagram.com/publicapi/oembed/ ?maxwidth=640 &maxheight=960 &url=https://www.instagram.com/p/_cj9NXvhrh/ &format=json https://api.instagram.com/oembed/ ?maxwidth=640 &maxheight=960 &url=https://www.instagram.com/p/_cj9NXvhrh/ &format=json
I'm not sure why but I only get a single request for a Twitter oEmbed containing Emojis.
PPS: Sorry for the long post.
Change History (9)
#2
@
10 years ago
We could additionally look at the $cache value to see if $cachekey already exists.
The two updated suggestions are now:
1)
// Maybe cache the result
if ( $html ) {
$updated_cachekey = update_post_meta( $post_ID, $cachekey, $html );
if( ! $updated_cachekey && ! $cache ) {
update_post_meta( $post_ID, $cachekey, '{{unknown}}' );
}
update_post_meta( $post_ID, $cachekey_time, time() );
} elseif ( ! $cache ) {
update_post_meta( $post_ID, $cachekey, '{{unknown}}' );
}
or 2)
// Maybe cache the result
if ( $html ) {
global $wpdb;
// Encode Emojis if the charset of the meta_value column is 'utf8' instead of 'utf8mb4'
if ( method_exists( $wpdb, 'get_col_charset' )
&& 'utf8' === $wpdb->get_col_charset( $wpdb->postmeta, 'meta_value' )
&& function_exists( 'wp_encode_emoji' )
) {
$html = wp_encode_emoji( $html );
}
$updated_cachekey = update_post_meta( $post_ID, $cachekey, $html );
if( ! $updated_cachekey && ! $cache ) {
update_post_meta( $post_ID, $cachekey, '{{unknown}}' );
}
update_post_meta( $post_ID, $cachekey_time, time() );
} elseif ( ! $cache ) {
update_post_meta( $post_ID, $cachekey, '{{unknown}}' );
}
This ticket was mentioned in Slack in #core by jeremyfelt. View the logs.
10 years ago
#4
@
9 years ago
Any update on this?
These days almost everybody is using Emojis, meaning oembeds are not cached at all, making high load on web server constantly fetching embedded content.
@birgire temporary workaround works for some embeds, but doesn't work for e.g. https://www.instagram.com/p/BIaw9JUB-DK/
#5
follow-up:
↓ 6
@
9 years ago
@Krstarica, I think this is still problematic for non utf8mb4.
Just imagine a popular site, with an uncached frontpage, displaying full content posts, each containing multiple Instagram oEmbeds with Emojis ;-)
Note that the Instagram oEmbeds can contain other invalid chars than just Emojis, so I ended up using mb_convert_encoding(), like mentioned here: http://stackoverflow.com/a/34640424/2078474
Then JetPack complicated things further, because it's using it's own Instagram handler, if I remember correctly. I therefore had to unload it through the jetpack_shortcodes_to_include filter, to be able to apply the mb_convert_encoding() to fix through the oembed_result filter.
Even if we strip the invalid chars from the fetched Instagram HTML, the javascript part will display it correctly.
#6
in reply to:
↑ 5
@
9 years ago
@birgire, we've ended up with manually converting Wordpress database, tables and config file to utf8mb4. This should have been done during upgrade process, but for some reason it didn't happen:
https://make.wordpress.org/core/2015/04/02/the-utf8mb4-upgrade/
#7
@
9 years ago
@Krstarica yes I would recommend upgrading to utf8mb4 whenever that's supported, but that's not always the case. Good to hear you were able to do that.
#9
@
7 years ago
- Resolution set to wontfix
- Status changed from new to closed
Given that only a tiny percentage of sites run a version of MySQL that doesn't support utf8mb4, and that we'll likely bump the minimum required version of MySQL at some point in the not-too-distant future, I'm going to close this issue.
The correct fix is to convert the table to utf8mb4.
ps: regarding the two suggestions above, it looks like I forgot that
update_post_metaalso returnsfalsewhen trying to update the same existing meta value. So we need to take that into account as well ;-)