UTF-8 characters truncated mid-byte sequence in excerpt in RSS2 feed
Reported by: kurtmckee | Owned by:
I received a bug report for a project I maintain, and while investigating it I discovered what appears to be a bug in WordPress 3.2.1.
The description element in the RSS2 feed is being truncated in the middle of a UTF-8 multibyte character, which leaves an incomplete byte sequence (garbage binary data) at the end of the excerpt. An example can be found at:
I downloaded the site's theme but found nothing that would affect post_excerpt or the_excerpt_rss. I then downloaded WordPress trunk and tried to locate the problem, but I'm unfamiliar with the WordPress source and couldn't find anything after tracing through multiple files using grep.
I did discover that trackback_url_list() in wp-includes/post.php appears to be using a simple substr() call that might cause problems with multibyte characters. However, I'm more concerned with the potential for malformed feeds.
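To illustrate the failure mode I suspect, here is a minimal Python sketch. It is not WordPress code, and the function names `truncate_bytes` and `truncate_utf8` are hypothetical. The first function cuts a UTF-8 string at a fixed byte count, which is what a byte-oriented substr() does. The second backs the cut up to the nearest character boundary.

```python
def truncate_bytes(text: str, limit: int) -> bytes:
    """Naive byte-wise cut, analogous to a byte-oriented substr()."""
    return text.encode("utf-8")[:limit]


def truncate_utf8(text: str, limit: int) -> bytes:
    """Cut to at most `limit` bytes without splitting a character."""
    data = text.encode("utf-8")
    if len(data) <= limit:
        return data
    end = limit
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx. If the byte
    # just past the cut is a continuation byte, the cut falls inside a
    # character, so move it back until it sits on a boundary.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


def is_valid_utf8(data: bytes) -> bool:
    """Report whether `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    sample = "日本語"  # three characters, three bytes each
    print(is_valid_utf8(truncate_bytes(sample, 4)))
    print(is_valid_utf8(truncate_utf8(sample, 4)))
```

With a 4-byte limit, the naive cut keeps the first character plus one stray lead byte of the second, which is the kind of malformed output I'm seeing in the feed. In PHP, mb_substr() and mb_strcut() are the multibyte-aware counterparts, assuming the mbstring extension is available.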
I've attached a copy of the feed XML in question so that the example remains available if the live feed changes.