Opened 9 years ago

Closed 9 years ago

#31262 closed defect (bug) (wontfix)

Tests_DB_Charset failures

Reported by: SergeyBiryukov
Owned by: pento
Milestone:
Priority: normal
Severity: normal
Version: 4.2
Component: Database
Keywords:
Focuses:
Cc:

Description

Background: #21212

Seeing two failures in current trunk running phpunit --group wpdb on PHP 5.2.17, MySQL 5.0.51a:

There were 2 failures:

1) Tests_DB_Charset::test_strip_invalid_text with data set #6 (array(array('hebrew', 'ùord÷ress', true)), array(array('hebrew', 'ùord÷ress', true)), 'hebrew')
hebrew
Failed asserting that Array (
    0 => Array (
        'charset' => 'hebrew'
        'value' => '?ord?ress'
        'db' => true
    )
) is identical to Array (
    0 => Array (
        'charset' => 'hebrew'
        'value' => 'ùord÷ress'
        'db' => true
    )
).

S:\home\wordpress\develop\tests\phpunit\tests\db\charset.php:129
S:\usr\local\php5\phpunit:46

2) Tests_DB_Charset::test_strip_invalid_text with data set #9 (array(array('latin1', '🎷'), array('ascii', 'Hello World'), array('utf8', 'H€llo😈World¢'), array('utf8mb3', 'H€llo😈World¢'), array('utf8mb4', 'H€llo😈World¢'), array('koi8r', 'ýordòress', true), array('hebrew', 'ùord÷ress', true), array(false, 100), array('big5', 'a¦@b')), array(array('latin1', '🎷'), array('ascii', 'Hello World'), array('utf8', 'H€lloWorld¢'), array('utf8mb3', 'H€lloWorld¢'), array('utf8mb4', 'H€llo😈World¢'), array('koi8r', 'ýordòress', true), array('hebrew', 'ùord÷ress', true), array(false, 100), array('big5', 'a¦@b')), 'multiple fields/charsets')
multiple fields/charsets
Failed asserting that Array (
    0 => Array (
        'charset' => 'latin1'
        'value' => '🎷'
    )
    1 => Array (
        'charset' => 'ascii'
        'value' => 'Hello World'
    )
    2 => Array (
        'charset' => 'utf8'
        'value' => 'H€lloWorld¢'
    )
    3 => Array (
        'charset' => 'utf8mb3'
        'value' => 'H€lloWorld¢'
    )
    4 => Array (
        'charset' => 'utf8mb4'
        'value' => 'H€llo😈World¢'
    )
    5 => Array (
        'charset' => 'koi8r'
        'value' => 'ýordòress'
        'db' => true
    )
    6 => Array (
        'charset' => 'hebrew'
        'value' => '?ord?ress'
        'db' => true
    )
    7 => Array (
        'charset' => false
        'value' => 100
    )
    8 => Array (
        'charset' => 'big5'
        'value' => 'a¦@b'
    )
) is identical to Array (
    0 => Array (
        'charset' => 'latin1'
        'value' => '🎷'
    )
    1 => Array (
        'charset' => 'ascii'
        'value' => 'Hello World'
    )
    2 => Array (
        'charset' => 'utf8'
        'value' => 'H€lloWorld¢'
    )
    3 => Array (
        'charset' => 'utf8mb3'
        'value' => 'H€lloWorld¢'
    )
    4 => Array (
        'charset' => 'utf8mb4'
        'value' => 'H€llo😈World¢'
    )
    5 => Array (
        'charset' => 'koi8r'
        'value' => 'ýordòress'
        'db' => true
    )
    6 => Array (
        'charset' => 'hebrew'
        'value' => 'ùord÷ress'
        'db' => true
    )
    7 => Array (
        'charset' => false
        'value' => 100
    )
    8 => Array (
        'charset' => 'big5'
        'value' => 'a¦@b'
    )
)

Attachments (3)

my.cnf (6.3 KB) - added by SergeyBiryukov 9 years ago.
php.ini (45.8 KB) - added by SergeyBiryukov 9 years ago.
31262.diff (667 bytes) - added by pento 9 years ago.

Change History (16)

#1 @pento
9 years ago

It's behaving like the hebrew character set doesn't exist on that server. Could you try running this query, and see what it returns?

SHOW CHARACTER SET LIKE 'hebrew';

#2 @SergeyBiryukov
9 years ago

Charset   Description         Default collation   Maxlen
hebrew    ISO 8859-8 Hebrew   hebrew_general_ci   1

#3 @pento
9 years ago

  • Owner set to pento
  • Resolution set to fixed
  • Status changed from new to closed

In 31371:

WPDB: When removing invalid text from strings with multiple different character sets, wpdb::strip_invalid_text() wasn't correctly switching connection character sets.

Fixes #31262

#4 @SergeyBiryukov
9 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

Same environment, still seeing failures:

1) Tests_DB_Charset::test_strip_invalid_text with data set #5 (array(array('koi8r', 'ýordòress', true)), array(array('koi8r', 'ýordòress', true)), 'koi8r')
koi8r
Failed asserting that Array (
    0 => Array (
        'charset' => 'koi8r'
        'value' => '?ord?ress'
        'db' => true
    )
) is identical to Array (
    0 => Array (
        'charset' => 'koi8r'
        'value' => 'ýordòress'
        'db' => true
    )
).

S:\home\wordpress\develop\tests\phpunit\tests\db\charset.php:126
S:\usr\local\php5\phpunit:46

2) Tests_DB_Charset::test_strip_invalid_text with data set #6 (array(array('hebrew', 'ùord÷ress', true)), array(array('hebrew', 'ùord÷ress', true)), 'hebrew')
hebrew
Failed asserting that Array (
    0 => Array (
        'charset' => 'hebrew'
        'value' => '?ord?ress'
        'db' => true
    )
) is identical to Array (
    0 => Array (
        'charset' => 'hebrew'
        'value' => 'ùord÷ress'
        'db' => true
    )
).

S:\home\wordpress\develop\tests\phpunit\tests\db\charset.php:126
S:\usr\local\php5\phpunit:46

3) Tests_DB_Charset::test_strip_invalid_text with data set #9 (array(array('latin1', '🎷'), array('ascii', 'Hello World'), array('utf8', 'H€llo😈World¢'), array('utf8mb3', 'H€llo😈World¢'), array('utf8mb4', 'H€llo😈World¢'), array('koi8r', 'ýordòress', true), array('hebrew', 'ùord÷ress', true), array(false, 100), array('big5', 'a¦@b')), array(array('latin1', '🎷'), array('ascii', 'Hello World'), array('utf8', 'H€lloWorld¢'), array('utf8mb3', 'H€lloWorld¢'), array('utf8mb4', 'H€llo😈World¢'), array('koi8r', 'ýordòress', true), array('hebrew', 'ùord÷ress', true), array(false, 100), array('big5', 'a¦@b')), 'multiple fields/charsets')
multiple fields/charsets
Failed asserting that Array (
    0 => Array (
        'charset' => 'latin1'
        'value' => '🎷'
    )
    1 => Array (
        'charset' => 'ascii'
        'value' => 'Hello World'
    )
    2 => Array (
        'charset' => 'utf8'
        'value' => 'H€lloWorld¢'
    )
    3 => Array (
        'charset' => 'utf8mb3'
        'value' => 'H€lloWorld¢'
    )
    4 => Array (
        'charset' => 'utf8mb4'
        'value' => 'H€llo😈World¢'
    )
    5 => Array (
        'charset' => 'koi8r'
        'value' => '?ord?ress'
        'db' => true
    )
    6 => Array (
        'charset' => 'hebrew'
        'value' => '?ord?ress'
        'db' => true
    )
    7 => Array (
        'charset' => false
        'value' => 100
    )
    8 => Array (
        'charset' => 'big5'
        'value' => 'a¦@b'
    )
) is identical to Array (
    0 => Array (
        'charset' => 'latin1'
        'value' => '🎷'
    )
    1 => Array (
        'charset' => 'ascii'
        'value' => 'Hello World'
    )
    2 => Array (
        'charset' => 'utf8'
        'value' => 'H€lloWorld¢'
    )
    3 => Array (
        'charset' => 'utf8mb3'
        'value' => 'H€lloWorld¢'
    )
    4 => Array (
        'charset' => 'utf8mb4'
        'value' => 'H€llo😈World¢'
    )
    5 => Array (
        'charset' => 'koi8r'
        'value' => 'ýordòress'
        'db' => true
    )
    6 => Array (
        'charset' => 'hebrew'
        'value' => 'ùord÷ress'
        'db' => true
    )
    7 => Array (
        'charset' => false
        'value' => 100
    )
    8 => Array (
        'charset' => 'big5'
        'value' => 'a¦@b'
    )
)

#5 follow-up: @pento
9 years ago

I've been playing around with this a bit more, but I've been totally unable to reproduce it.

@SergeyBiryukov, could you post your my.cnf and php.ini files? I'll see if that helps me get to the bottom of it.

#6 in reply to: ↑ 5 @SergeyBiryukov
9 years ago

Replying to pento:

@SergeyBiryukov, could you post your my.cnf and php.ini files? I'll see if that helps me get to the bottom of it.

Sure, attached.

#7 follow-up: @pento
9 years ago

I've been beating my head against this all week, with still no luck.

Given the output, I'm starting to suspect this is a Windows-specific issue. For example, take the first character in the koi8r string. It has the byte value \xfd, which is an invalid byte in UTF-8, but reads as ý (U+00FD) when treated as a UTF-16 code unit (as is shown in the $actual output). Given that Windows uses UTF-16 as its internal encoding, I suspect the string is being silently marked as UTF-16 at some point, either in PHP or MySQL.
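
To make the byte-level point concrete, here is a small Python sketch (not the WordPress test code, and not a diagnosis of the failure). It assumes the koi8r test value consists of the raw bytes 0xFD and 0xF2 around ASCII letters, which is how the page appears to render them; the `raw` variable and the `is_valid_utf8` helper are illustrative names only.

```python
# Bytes assumed from the 'ýordòress' value in the failing koi8r data set:
# the page appears to render the raw bytes 0xFD and 0xF2 as Latin-1 glyphs.
raw = b"\xfdord\xf2ress"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# 0xFD can never appear in well-formed UTF-8, so the string is rejected.
utf8_ok = is_valid_utf8(raw)

# Read as one byte per character (Latin-1), the bytes give the text in the ticket.
as_latin1 = raw.decode("latin-1")

# Widening 0xFD to a 16-bit little-endian code unit yields U+00FD.
as_utf16_unit = bytes([0xFD, 0x00]).decode("utf-16-le")

# Under the charset the test declares, the same bytes are Cyrillic letters.
as_koi8r = raw.decode("koi8_r")

print(utf8_ok, as_latin1, as_utf16_unit, as_koi8r)
```

The same bytes mean different things depending on which character set the consumer believes it is reading, so a layer that silently reinterprets the string could plausibly change what strip_invalid_text() sees.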

Are you able to reproduce this in later versions of PHP and MySQL?

#8 in reply to: ↑ 7 @SergeyBiryukov
9 years ago

Replying to pento:

Are you able to reproduce this in later versions of PHP and MySQL?

Could not reproduce with PHP 5.3.28 or 5.4.29 on the same environment.

Got a bunch of taxonomy test failures though, see #31827.

#9 follow-up: @pento
9 years ago

I'm heavily leaning towards wontfixing this bug. It's not a regression, it's just new tests that don't pass under these circumstances. If anyone is actually running into this bug, WP 4.2 won't cause changes to how their site behaves.

@boonebgorges, is there a preferred method for marking a test to be skipped by PHP version and OS?

#10 @pento
9 years ago

  • Keywords has-patch commit added

31262.diff skips this test on Windows/PHP 5.2.

#11 in reply to: ↑ 9 @boonebgorges
9 years ago

Replying to pento:

@boonebgorges, is there a preferred method for marking a test to be skipped by PHP version and OS?

When skipping an entire file based on PHP version alone, put it in phpunit.xml:
https://core.trac.wordpress.org/browser/tags/4.1.1/phpunit.xml.dist?marks=14,15,16,17#L6
But PHPUnit doesn't have support for test-specific exclusions in the config file, and it doesn't support skipping by OS at all, so [31262.diff] looks good to me.

#12 @pento
9 years ago

In 31953:

WPDB: Due to PHP 5.2's internal string handling, strings in Windows are encoded using UTF-16, instead of UTF-8. With the addition of the many character set tests in [30345], a couple of them were tripping up in PHP 5.2 under Windows, because of this behaviour.

This marks those tests as skipped.

See #31262 for more discussion.

#13 @pento
9 years ago

  • Keywords has-patch commit removed
  • Milestone 4.2 deleted
  • Resolution set to wontfix
  • Status changed from reopened to closed

Given that we haven't changed existing behaviour in PHP 5.2 under Windows, let's not try to fix PHP's behaviour.