Make WordPress Core

Opened 11 years ago

Closed 11 years ago

Last modified 9 years ago

#27961 closed defect (bug) (duplicate)

Twitter auto-embed fails if tweet contains Emoji icon

Reported by: MortimerCat
Owned by:
Milestone:
Priority: normal
Severity: normal
Version: 3.9
Component: Embeds
Keywords:
Focuses:
Cc:

Description

A little bit of history - I struggled to embed a specific tweet back in Sept 2012. All plugins I tried failed except "Awesome Tweet Embedr". Today I discovered that the same tweet fails to load in my new blog and the "Awesome Tweet Embedr" seems to have vanished. I investigated further and found the root cause.

I have a fully updated multi-site system.

Below is a sample tweet from my personal twitter. (generated by http://www.iemoji.com/ )

https://twitter.com/MortimerCatTwit/status/458345511951409152

I add it to my blog as a single line (NB: it also fails if I use [embed]).

When I view the post, it works. The emoji icons are replaced with a small box, but that is not a major issue.

When I refresh the page, or whenever I look at the page again, all that appears is the tweet text up to the first icon. All the formatting has disappeared.

Here is my reproduction of the problem. It appeared as expected on the first viewing, but is broken on subsequent viewings.
http://yeogle.com/2014/04/21/testing-emoji-wordpress/

Looking at the HTML page source, I can see that whatever function WordPress uses to process the tweet gives up at the first icon.

The immediate fix would be for WordPress to display a correctly formatted tweet. Displaying the Emoji icon is not a priority.

Whilst researching the problem, I have seen others on various forums asking about non-working tweets. They normally get a response of "it's working for me". I suspect that this intermittent problem explains some of those.

Change History (9)

#1 follow-up: @Viper007Bond
11 years ago

  • Keywords needs-patch added

> The emoji icons are replaced with a small box, but that is not a major issue.

This is a browser issue. They work fine for me in Firefox.

Anyway, I'm not actually sure this is an embeds bug. If you go look in the post meta for the row that is being used to cache this, everything after the first emoji is missing from the database.

It works on the first view because it's being pulled directly from Twitter. On the second page load, it's being pulled from the cache in the post meta, which is chopped off.

This might be a post meta bug.

Last edited 11 years ago by Viper007Bond

#2 in reply to: ↑ 1 @johnbillion
11 years ago

  • Milestone Awaiting Review deleted
  • Resolution set to duplicate
  • Status changed from new to closed

Replying to Viper007Bond:

> This might be a post meta bug.

I've researched this before and the core issue is that emojis are represented by four bytes in UTF-8, but MySQL's standard utf8 character set stores at most three bytes per character. This results in serialized data in the database being corrupt if it contains an emoji.
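To make the byte-length point concrete, here is a small PHP sketch. It is not WordPress or MySQL code: `utf8_byte_length_27961()` and `simulate_3byte_truncation_27961()` are hypothetical helper names. The second function only imitates the symptom reported in this ticket (everything from the first four-byte character onward is lost); it does not claim to reproduce MySQL's behaviour in every configuration.

```php
<?php
// Hypothetical helpers for illustration only; not part of WordPress.

// Number of bytes a string occupies (strlen counts bytes, not characters).
function utf8_byte_length_27961( $char ) {
	return strlen( $char );
}

// Imitate the truncation seen in this ticket: keep only the text before the
// first character at U+10000 or above, since those are the characters that
// need four bytes in UTF-8.
function simulate_3byte_truncation_27961( $text ) {
	if ( 1 === preg_match( '/^[^\x{10000}-\x{10FFFF}]*/u', $text, $match ) ) {
		return $match[0];
	}
	return $text; // Invalid UTF-8 or regex failure: leave the input alone.
}

$grinning_face = "\xF0\x9F\x98\x80"; // U+1F600 written as raw UTF-8 bytes.

echo utf8_byte_length_27961( 'a' ), "\n";            // 1 (ASCII letter)
echo utf8_byte_length_27961( "\xE2\x82\xAC" ), "\n"; // 3 (Euro sign, U+20AC)
echo utf8_byte_length_27961( $grinning_face ), "\n"; // 4 (emoji)

// Everything from the emoji onward is dropped, as in the cached embed.
echo simulate_3byte_truncation_27961( 'Hello ' . $grinning_face . ' world' ), "\n";
```

The last line prints only the text before the emoji, which matches the "tweet text up to the first icon" symptom in the original report.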

The correct solution is to switch to a utf8mb4 character set. This is covered in #21212.

The ugly workaround for plugins interacting with the Twitter API (good luck) is to base64 encode tweets before storing them in the database, rather than relying on WordPress' built-in serialization and unserialization.
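A minimal sketch of that base64 idea, assuming a plugin stores the tweet HTML itself. The function names below are hypothetical and are not a WordPress API; only `base64_encode()` and `base64_decode()` are real PHP functions.

```php
<?php
// Hypothetical helpers; the names are illustrative, not a WordPress API.

// base64 output uses only ASCII letters, digits, '+', '/' and '=',
// so a column limited to three bytes per character has nothing to cut off.
function encode_tweet_for_storage_27961( $html ) {
	return base64_encode( $html );
}

// Strict decoding returns false for malformed input; fall back to ''.
function decode_tweet_from_storage_27961( $stored ) {
	$decoded = base64_decode( $stored, true );
	return ( false === $decoded ) ? '' : $decoded;
}

$tweet  = "<blockquote>Testing \xF0\x9F\x98\x80 emoji</blockquote>";
$stored = encode_tweet_for_storage_27961( $tweet );

// The emoji survives the round trip because only ASCII was ever stored.
var_dump( decode_tweet_from_storage_27961( $stored ) === $tweet ); // bool(true)
```

The trade-off is that the stored value is roughly a third larger and can no longer be searched or read directly in the database.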

Closing as a dupe for now, however it would be great to reopen this to address it separately if anyone has any good ideas which don't overlap with #21212.

#3 @SergeyBiryukov
11 years ago

  • Keywords needs-patch removed

Related: #13590

A workaround:

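// Keep characters that take fewer than four bytes; drop the rest (e.g. emoji).
function replace_4byte_characters_callback( $match ) {
	return ( strlen( $match[0] ) < 4 ) ? $match[0] : '';
}

// Strip four-byte characters from the oEmbed HTML.
function replace_4byte_characters_27961( $output ) {
	// https://core.trac.wordpress.org/ticket/27961
	// The /u modifier makes "." match one whole UTF-8 character at a time.
	return preg_replace_callback( '/./u', 'replace_4byte_characters_callback', $output );
}
add_filter( 'oembed_result', 'replace_4byte_characters_27961' );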

#4 @SergeyBiryukov
11 years ago

  • Focuses ui removed

#5 @MortimerCat
11 years ago

Thanks, the workaround works. As you all say, the problem occurs when the data is stored so my original demo post still shows the problem. I will leave it broken for historical purposes, but I am sure an edit would cure it. My latest post works fine...
http://yeogle.com/2014/04/23/testing-emoji-wordpress-workaround/

#6 @jeremyclarke
10 years ago

Note that this can also apply to post_content if a tweet is embedded directly in a post (as we should all do if we want the fallback blockquote to keep working after Twitter is gone), thus cutting off the entire post after the emoji.

To fix that you'll want to also filter wp_insert_post_data, something like:

add_filter( 'wp_insert_post_data', 'filter_insert_post_data_utf8mb4_27961', 10, 2 );

While testing it seemed to work by just applying 'replace_4byte_characters_27961' to wp_insert_post_data, but just in case I made a callback specifically for wp_insert_post_data which only filtered post_content.
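The callback itself was not posted in this comment, so the following is only a guess at what such a function might look like. It assumes the `wp_insert_post_data` filter passes the post data array as its first argument, and it uses a hypothetical helper, `strip_4byte_characters_sketch_27961()`, so that it runs on its own. The filter is registered with the default priority of 10, and only when `add_filter()` is available.

```php
<?php
// Hypothetical helper: remove characters at U+10000 and above, which are
// the ones that need four bytes in UTF-8.
function strip_4byte_characters_sketch_27961( $text ) {
	$result = preg_replace( '/[\x{10000}-\x{10FFFF}]/u', '', $text );
	// preg_replace returns null on error (e.g. invalid UTF-8); keep the input then.
	return ( null === $result ) ? $text : $result;
}

// Sketch of a callback that touches post_content only and leaves every
// other field of the post data array unchanged.
function filter_insert_post_data_utf8mb4_27961( $data, $postarr ) {
	if ( isset( $data['post_content'] ) ) {
		$data['post_content'] = strip_4byte_characters_sketch_27961( $data['post_content'] );
	}
	return $data;
}

// Register the filter only when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'wp_insert_post_data', 'filter_insert_post_data_utf8mb4_27961', 10, 2 );
}
```

Limiting the callback to `post_content` keeps titles, excerpts and other fields untouched, which is the cautious behaviour described above.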

#7 follow-up: @LindsayBSC
10 years ago

Hi there,
I wanted to touch back on this issue specifically because it shows as "closed" yet the issue still remains a month later.

I recently found that embedding any tweet with an emoji breaks the tweet as seen in the example above. This particular ticket was registered as a dupe of #21212 but that ticket seems to be about storing emojis in the DB. The twitter embed does not store the tweet's content in the DB but simply the URL to the tweet (when the tweet is pasted in the post content area). I feel that this should remain an open issue until it is figured out since it's not a duplicate.

I would like to investigate this further if anyone has any direction for me to go in.

#8 in reply to: ↑ 7 @dd32
10 years ago

@LindsayBSC

> I wanted to touch back on this issue specifically because it shows as "closed" yet the issue still remains a month later.

#21212 is fixed in 4.2, so 4.2 will fix this issue.

#9 @walterbarcelos
9 years ago

Hi there.

This issue is still happening in 4.4.1. When we have several embedded tweets in a post and some of them have emojis, all the tweets display as blockquotes. If I remove the tweets with emojis, the other tweets display as expected. It's affecting several of our sites.

Thanks in advance.

UPDATE:

This issue is not in core, as I initially thought. It's related to HyperDB. One of my colleagues found a solution, which is being tested now.

Here is the information if someone is interested: https://wordpress.org/support/topic/utf8mb4-wp-42-not-supported?replies=1

Last edited 9 years ago by walterbarcelos