Make WordPress Core

Opened 15 years ago

Closed 13 years ago

#12251 closed defect (bug) (worksforme)

mb_substr() works strangely in some environments.

Reported by: Cyrus H.
Owned by: hakre
Milestone:
Priority: high
Severity: normal
Version: 2.9.2
Component: Charset
Keywords:
Focuses:
Cc:

Description

http://wordpress.org/support/topic/357562

First of all, this is not a P2 theme bug; this is a WP core bug.
I use the English version of WordPress, but I write my posts in Korean.

To summarize the linked thread: mb_substr() and mb_strlen() malfunction on non-English (multibyte) characters because the encoding parameter is not specified. This happens when PHP's mbstring extension is enabled: WordPress's backward-compatibility function _mb_substr() assumes UTF-8, but the extension falls back to its internal encoding, which is not necessarily UTF-8.

The following changes should be adopted in 2.9.3:

FILE: /wp-admin/includes/plugin-install.php, line 332
FROM:
$description = mb_substr($description, 0, 400) . '…';
TO:
$description = mb_substr($description, 0, 400, 'UTF-8') . '…';

FILE: /wp-admin/includes/post.php, line 1037
FROM:
if ( mb_strlen($post_name) > 30 ) {
TO:
if ( mb_strlen($post_name, 'UTF-8') > 30 ) {

FILE: /wp-admin/includes/post.php, line 1038
FROM:
$post_name_abridged = mb_substr($post_name, 0, 14). '…' . mb_substr($post_name, -14);
TO:
$post_name_abridged = mb_substr($post_name, 0, 14, 'UTF-8'). '…' . mb_substr($post_name, -14, 14, 'UTF-8');

FILE: /wp-includes/formatting.php, line 2708
FROM:
$str = mb_substr( $str, 0, $count );
TO:
$str = mb_substr( $str, 0, $count, 'UTF-8' );
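A minimal sketch of the failure mode described above, assuming a PHP build with the mbstring extension loaded and a source file saved as UTF-8. It forces the internal encoding to ISO-8859-1 to simulate a server whose mbstring default is not UTF-8 (an illustrative assumption; the real default depends on the server configuration), then compares a call without the encoding argument against one that passes 'UTF-8' explicitly, as the proposed patch does.

```php
<?php
// Assumes the mbstring extension is loaded and this file is saved as UTF-8.
// Korean sample text: 5 Hangul syllables, each 3 bytes in UTF-8 (15 bytes).
$text = '안녕하세요';

// Simulate a server whose mbstring internal encoding is not UTF-8
// (illustrative assumption, not WordPress's actual configuration).
mb_internal_encoding( 'ISO-8859-1' );

// No encoding argument: mb_substr() falls back to the internal encoding,
// treats every byte as one character, and cuts the second syllable in half.
$broken = mb_substr( $text, 0, 4 );

// Explicit 'UTF-8', as in the proposed patch: counts whole characters.
$fixed = mb_substr( $text, 0, 4, 'UTF-8' );

var_dump( mb_check_encoding( $broken, 'UTF-8' ) );          // bool(false)
var_dump( $fixed === '안녕하세' );                            // bool(true)
var_dump( mb_strlen( $text ), mb_strlen( $text, 'UTF-8' ) ); // int(15) int(5)
```

The broken result ends with a dangling lead byte, which is what a browser renders as the Unicode Replacement Character.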

Personally, this is very inconvenient: it is stressful to keep seeing the Unicode Replacement Character (U+FFFD).

Change History (12)

#1 @Cyrus H.
15 years ago

  • Component changed from i18n to Charset
  • Owner changed from nbachiyski to hakre

#2 @toscho
15 years ago

  • Cc toscho added

I’d rather see something like this very early in WP:

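if ( extension_loaded( 'mbstring' ) ) {
    // Default encoding for every mb_* call that omits its encoding argument.
    mb_internal_encoding( 'UTF-8' );
    // 'uni' selects the Unicode (UTF-8) language setting, used by mb_send_mail().
    mb_language( 'uni' );
}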
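The effect of this suggestion can be sketched as follows, assuming the mbstring extension is loaded and the file is saved as UTF-8. Setting the internal encoding once, early in the request, lets calls like the ones in the patch list work unchanged, with no per-call encoding argument. The helper name `example_set_internal_encoding()` is hypothetical and only mirrors the idea above; it is not WordPress's own function.

```php
<?php
// Assumes mbstring is loaded and this file is saved as UTF-8.

// Hypothetical helper mirroring the suggestion above: make UTF-8 the
// default for every mb_* call that omits its encoding argument.
function example_set_internal_encoding() {
    if ( extension_loaded( 'mbstring' ) ) {
        mb_internal_encoding( 'UTF-8' );
    }
}

// Start from a non-UTF-8 default to show the helper's effect.
mb_internal_encoding( 'ISO-8859-1' );
example_set_internal_encoding();

$text = '안녕하세요';

// Same call shape as the unpatched core code: no encoding argument.
var_dump( mb_internal_encoding() );                 // string(5) "UTF-8"
var_dump( mb_substr( $text, 0, 4 ) === '안녕하세' ); // bool(true)
var_dump( mb_strlen( $text ) );                     // int(5)
```

The trade-off is that a global default affects every plugin and theme that relies on mb_* defaults, whereas the per-call fix only touches the listed lines.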

#3 @nacin
15 years ago

This is not a regression. I'm moving it to 3.0 for now. Any commits can be backported to the 2.9 and 2.8 branches if desired.

#4 @nacin
15 years ago

  • Milestone changed from 2.9.3 to 3.0

#5 @dd32
15 years ago

  • Milestone changed from 3.0 to 3.1

Using mb_internal_encoding() sounds like a good option to me, but I don't know enough about character sets and I'm not in a position to test it well enough.

I'm moving this to 3.1, as it's not something I would feel comfortable changing close to the end of a dev cycle. However, if someone who understands the underlying issues can test it and take it on, I'm all for this approach.

#6 @SergeyBiryukov
15 years ago

WordPress 3.0 already uses mb_internal_encoding() in wp-includes/load.php. This ticket is probably fixed now.

#7 @SergeyBiryukov
15 years ago

  • Cc sergeybiryukov.ru@… added

#8 @nacin
15 years ago

  • Keywords reporter-feedback added
  • Milestone changed from Awaiting Triage to Awaiting Review

wp_set_internal_encoding() is new, but the code isn't.

#9 @filosofo
14 years ago

  • Milestone changed from Awaiting Review to Future Release

#10 @hakre
13 years ago

  • Keywords reporter-feedback removed
  • Resolution set to fixed
  • Status changed from new to closed

I think this issue can be closed as fixed. If it is still reproducible, feel free to reopen it.

#11 @SergeyBiryukov
13 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

mb_internal_encoding() was added in [7140], two years before this ticket was created:
http://core.trac.wordpress.org/browser/tags/2.5/wp-settings.php#L354

I guess "worksforme" would be a proper resolution.

#12 @SergeyBiryukov
13 years ago

  • Milestone Future Release deleted
  • Resolution set to worksforme
  • Status changed from reopened to closed