Make WordPress Core

Opened 11 years ago

Closed 9 years ago

#21109 closed defect (bug) (wontfix)

maybe_unserialize() fails to unserialize multibyte strings (contains solution)

Reported by: veedeezee
Owned by:
Milestone:
Priority: normal
Severity: normal
Version: 3.4
Component: Options, Meta APIs
Keywords: has-patch reporter-feedback
Focuses:
Cc:

Description (last modified by ocean90)

I came across a bug while trying to fetch an option of a custom theme (http://themeforest.net/item/rebirth-the-wordpress-theme-for-churches/1167055) (see addendum #1). The value contained UTF-8 text and unserialization failed. The DB values had been imported by means of an SQL query.

I applied the following fix. Instead of:

function maybe_unserialize( $original ) {
	if ( is_serialized( $original ) ) // don't attempt to unserialize data that wasn't serialized going in
		return @unserialize( $original );

	return $original;
}

I had:

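function maybe_unserialize( $original ) {
	if ( is_serialized( $original ) ) { // don't attempt to unserialize data that wasn't serialized going in
		// fix from: http://www.php.net/manual/en/function.unserialize.php#76012
		// Recompute each declared string length from the actual byte count.
		// preg_replace_callback() stands in for the /e modifier, which PHP 7 removed.
		$out = preg_replace_callback( '!s:(\d+):"(.*?)";!s', function ( $m ) {
			return 's:' . strlen( $m[2] ) . ':"' . $m[2] . '";';
		}, $original );
		// Unserialize the repaired string, not the original one.
		return @unserialize( $out );
	}
	return $original;
}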

and it worked like a charm.

I also think that it may have some connection to mbstring settings in php.ini on the server.

addendum #1

(wp_options, option_id='rebirth' option_value='a:47:{s:18:"js_highlight_color";s:7:"#DB2A07";s:7:"js_logo";s:0:"";s:14:"js_logo_height";s:2:"70";s:19:"js_background_color";s:7:"#a8a3a3";s:19:"js_background_image";s:64:"http://hasulam.dev/wp-content/uploads/2011/12/bg-tile-dkgray.png";s:29:"js_background_image_alignment";s:8:"top left";s:26:"js_background_image_repeat";s:6:"repeat";s:20:"js_footer_text_style";s:4:"dark";s:25:"js_background_image_fixed";s:1:"0";s:10:"js_favicon";s:0:"";s:14:"js_custom_font";s:10:"FBBeeSerif";s:24:"js_disable_page_comments";s:0:"";s:24:"js_disable_post_comments";s:0:"";s:25:"js_disable_event_comments";s:0:"";s:27:"js_disable_gallery_comments";s:0:"";s:25:"js_disable_video_comments";s:0:"";s:25:"js_disable_audio_comments";s:0:"";s:24:"js_homepage_page_display";s:0:"";s:18:"js_homepage_blocks";a:5:{s:18:"js_homepage_slider";s:1:"0";s:22:"js_homepage_introblock";s:1:"0";s:12:"widget_block";s:1:"0";s:18:"js_event_countdown";s:1:"1";s:18:"js_footer_mapblock";s:1:"0";}s:23:"js_homepage_block_order";s:33:"slider,introblock,block,countdown";s:24:"js_homepage_slider_cycle";s:3:"yes";s:24:"js_homepage_slider_speed";s:4:"6000";s:35:"js_homepage_slider_transition_speed";s:3:"500";s:22:"js_homepage_intro_text";s:224:"קהילת הסולם — היא קהילה דתית עם לימוד העמוק והמדויק ביותר של חכמת הקבלה ופנימיות התורה על פי דרכו של בעל הסולם והרב"ש זצוק"ל";s:23:"js_social_icon_facebook";s:32:"http://www.facebook.com/hasulams";s:22:"js_social_icon_twitter";s:30:"https://twitter.com/#!/hasulam";s:21:"js_social_icon_flickr";s:42:"http://www.flickr.com/people/80922930@N07/";s:22:"js_social_icon_youtube";s:40:"http://www.youtube.com/user/hasulammedia";s:20:"js_social_icon_vimeo";s:0:"";s:25:"js_social_icon_foursquare";s:36:"https://foursquare.com/user/30915399";s:14:"js_time_format";s:3:"24h";s:21:"js_countdown_language";s:7:"english";s:21:"js_contactblock_title";s:18:"בואו לבקר!";s:20:"js_contactblock_text";s:295:"אנו מזמינים אתכם לבא ולבקר אותנו בבית מדרשנו ברמת גן. 
כאן מתפללים, לומדים שיעורי ם עם הרב ובחברותות, קורסים וסמינרים בשבת. נשמח לראותכם או לשמוע מכם דרך המשוב באתר.";s:23:"js_contactblock_address";s:33:"128 Jabotinsky, Ramat Gan, Israel";s:26:"js_contactblock_buttontext";s:17:"יצירת קשר";s:26:"js_contactblock_buttonlink";s:15:"/about/contact/";s:14:"js_footer_text";s:26:"Copyright ©[year] Hasulam";s:12:"js_gmap_zoom";s:2:"14";s:14:"js_hide_social";s:1:"0";s:22:"js_disable_breadcrumbs";s:1:"1";s:13:"js_audio_link";s:1:"0";s:13:"js_video_link";s:1:"0";s:18:"js_hide_contactmap";s:1:"0";s:21:"js_contact_form_email";s:22:"hasulam.site@gmail.com";s:14:"js_404_content";s:34:"צטערת, לא מצאתי הדף";s:19:"js_google_analytics";s:0:"";}')
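
The s:N prefixes in the dump above count bytes, not characters. A minimal sketch, assuming the option was stored as UTF-8, checks this for the js_contactblock_title value. The string is spelled out as explicit bytes so the result does not depend on how the snippet itself is saved.

```php
<?php
// "בואו לבקר!" as explicit UTF-8 bytes: each Hebrew letter takes two bytes.
$title = "\xD7\x91\xD7\x95\xD7\x90\xD7\x95 \xD7\x9C\xD7\x91\xD7\xA7\xD7\xA8!";

$bytes = strlen( $title );                  // 18: byte count, as declared in s:18
$chars = preg_match_all( '/./us', $title ); // 10: character count

// serialize() writes the byte count into the length prefix.
$serialized = serialize( $title );
```

If anything changes the byte count after serialization (for example a charset conversion during import), the prefix no longer matches and unserialize() fails.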

Change History (7)

#1 @ocean90
11 years ago

  • Description modified (diff)

#2 follow-up: @SergeyBiryukov
11 years ago

Related: #6532, #6784, #9549

To clarify, preg_replace() is used to fix string lengths. This exact fix was suggested in ticket:6532:8.

I can only reproduce this if DB_CHARSET value in wp-config.php doesn't match the actual DB charset (e.g. latin1 vs. utf8_general_ci), as mentioned in ticket:6784:3.

With the fix, the strings are still unreadable due to charset differences, but at least it allows other data to be unserialized properly. Perhaps the strings would be readable under some circumstances; this needs more investigation.
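
The effect described above can be sketched in isolation. This is an illustration under stated assumptions, not WordPress code: the charset mismatch is simulated by swapping the two-byte UTF-8 "ä" for its one-byte ISO-8859-1 form, and repair_serialized_lengths() is a hypothetical helper name.

```php
<?php
// "Jotain tähän" as explicit UTF-8 bytes (14 bytes).
$value      = array( 'test' => "Jotain t\xC3\xA4h\xC3\xA4n" );
$serialized = serialize( $value ); // declares s:14 for the string

// Simulated damage: each "ä" shrinks from two bytes to one,
// while the declared length stays at 14.
$damaged = str_replace( "\xC3\xA4", "\xE4", $serialized );

// The declared length no longer matches, so unserialize() returns false.
$broken = @unserialize( $damaged );

// Hypothetical helper: recompute every declared string length in bytes.
function repair_serialized_lengths( $data ) {
	return preg_replace_callback(
		'!s:(\d+):"(.*?)";!s',
		function ( $m ) {
			return 's:' . strlen( $m[2] ) . ':"' . $m[2] . '";';
		},
		$data
	);
}

// The structure is recovered, but the string is still in the wrong charset.
$repaired = unserialize( repair_serialized_lengths( $damaged ) );
```

The recovered value still holds ISO-8859-1 bytes, which is consistent with the strings staying unreadable. The pattern also misfires on any string value that itself contains the sequence `";`, so it is a workaround rather than a general repair.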

#3 in reply to: ↑ 2 ; follow-up: @veedeezee
11 years ago

Hello, Sergey.

This is exactly the case with the DB charset not matching what is specified in wp-config.php. However, please note the following:

  1. All Hebrew data is read and displayed perfectly by WordPress
  2. The fix I applied leaves Hebrew data perfectly readable

#4 in reply to: ↑ 3 @SergeyBiryukov
11 years ago

Also related: #3517, #8915

Replying to veedeezee:

This is exactly the case with the DB charset not matching what is specified in wp-config.php

This doesn't seem like a valid configuration, even if the data is readable.

Perhaps we should try to figure out where the charset difference comes from. If that's an old install upgraded from 2.1 or earlier, before the DB_CHARSET constant was introduced, that might explain it.

In that case, the more correct solution would be to convert the database to UTF-8:
http://codex.wordpress.org/Converting_Database_Character_Sets

#5 follow-up: @dReiska
10 years ago

This bug seems to be affecting the Options API too. The database charset and the table charset are both UTF-8. Tested with version 3.4.2.

Code to reproduce:

$a = array('test' => 'Jotain tähän');
update_option('test-test-test', $a);
var_dump($a, get_option('test-test-test'));
// Outputs: array(1) { ["test"]=> string(14) "Jotain tähän" } array(1) { ["test"]=> string(14) "Jotain tähän" } string(39) "a:1:{s:4:"test";s:14:"Jotain tähän";}"

Then remove the update_option() call and run the code again to retrieve the value from the DB:

$a = array('test' => 'Jotain tähän');
var_dump($a, get_option('test-test-test'));
// Outputs: array(1) { ["test"]=> string(14) "Jotain tähän" } string(39) "a:1:{s:4:"test";s:14:"Jotain tähän";}"

Instead of an array, the string of the serialized array is returned. With values that contain no multibyte characters this works fine, e.g. if the value of "test" is "Foo bar".

#6 in reply to: ↑ 5 @SergeyBiryukov
10 years ago

  • Keywords reporter-feedback added; dev-feedback removed

Replying to dReiska:

This bug seems to be affecting Options API too.

Related: #24180

Instead of an array, the string of the serialized array is returned.

Could not reproduce with your code. This is what I got:

array(1) { ["test"]=> string(14) "Jotain tähän" } array(1) { ["test"]=> string(14) "Jotain tähän" } 

Is the file with this code saved in UTF-8 (without byte order mark)?
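
One way to answer that question is to inspect the raw bytes of the script. A minimal sketch follows; both helper names are hypothetical and not part of WordPress.

```php
<?php
// Hypothetical helper: true if $bytes begins with the UTF-8 byte order mark (EF BB BF).
function starts_with_utf8_bom( $bytes ) {
	return strncmp( $bytes, "\xEF\xBB\xBF", 3 ) === 0;
}

// Hypothetical helper: true if $bytes is well-formed UTF-8.
// An empty pattern with the /u modifier only matches when the subject is valid UTF-8.
function is_valid_utf8( $bytes ) {
	return preg_match( '//u', $bytes ) === 1;
}

// Example inputs: the same text in UTF-8 and in ISO-8859-1.
$utf8   = "Jotain t\xC3\xA4h\xC3\xA4n";
$latin1 = "Jotain t\xE4h\xE4n";

$utf8_ok   = is_valid_utf8( $utf8 );
$latin1_ok = is_valid_utf8( $latin1 );
```

To check a file on disk, pass the result of file_get_contents() for that file to either helper.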

#7 @nacin
9 years ago

  • Component changed from General to Options and Meta
  • Milestone Awaiting Review deleted
  • Resolution set to wontfix
  • Status changed from new to closed