id,summary,reporter,owner,description,type,status,priority,milestone,component,version,severity,resolution,keywords,cc,focuses
6077,UTF-8 strings are sometimes cut in the middle of a character,nbachiyski,,"Using {{{substr}}} on UTF-8 strings can cut some characters in the middle, because {{{substr}}} counts bytes, but in UTF-8 a character can take more than one byte.

Here is a patch, which:
 * Defines {{{mb_strcut}}} in {{{compat.php}}} for users who don't have the {{{mbstring}}} extension.
 * Introduces a new {{{wp_html_excerpt}}} function, which uses {{{mb_strcut}}} and works well with HTML strings: it strips tags and counts each entity as one character (&amp; counts as one character, not five).

There are some tests for the two functions:
 * [http://svn.automattic.com/wordpress-tests/wp-testcase/test_includes_compat.php _mb_strcut]
 * [http://svn.automattic.com/wordpress-tests/wp-testcase/test_includes_formatting.php wp_html_excerpt] (at the end of the file)",defect (bug),closed,normal,2.5,General,,normal,fixed,unicode utf-8 excerpt has-patch,,