Make WordPress Core

Opened 10 years ago

Closed 10 years ago

#31262 closed defect (bug) (wontfix)

Tests_DB_Charset failures

Reported by: SergeyBiryukov
Owned by: pento
Milestone:
Priority: normal
Severity: normal
Version: 4.2
Component: Database
Keywords:
Focuses:
Cc:

Description

Background: #21212

Seeing two failures in current trunk running phpunit --group wpdb on PHP 5.2.17, MySQL 5.0.51a:

There were 2 failures:

1) Tests_DB_Charset::test_strip_invalid_text with data set #6 (array(array('hebrew', 'ùord÷ress', true)), array(array('hebrew', 'ùord÷ress', true)), 'hebrew')
hebrew
Failed asserting that Array (
    0 => Array (
        'charset' => 'hebrew'
        'value' => '?ord?ress'
        'db' => true
    )
) is identical to Array (
    0 => Array (
        'charset' => 'hebrew'
        'value' => 'ùord÷ress'
        'db' => true
    )
).

S:\home\wordpress\develop\tests\phpunit\tests\db\charset.php:129
S:\usr\local\php5\phpunit:46

2) Tests_DB_Charset::test_strip_invalid_text with data set #9 (array(array('latin1', '🎷'), array('ascii', 'Hello World'), array('utf8', 'H€llo😈World¢'), array('utf8mb3', 'H€llo😈World¢'), array('utf8mb4', 'H€llo😈World¢'), array('koi8r', 'ýordòress', true), array('hebrew', 'ùord÷ress', true), array(false, 100), array('big5', 'a¦@b')), array(array('latin1', '🎷'), array('ascii', 'Hello World'), array('utf8', 'H€lloWorld¢'), array('utf8mb3', 'H€lloWorld¢'), array('utf8mb4', 'H€llo😈World¢'), array('koi8r', 'ýordòress', true), array('hebrew', 'ùord÷ress', true), array(false, 100), array('big5', 'a¦@b')), 'multiple fields/charsets')
multiple fields/charsets
Failed asserting that Array (
    0 => Array (
        'charset' => 'latin1'
        'value' => '🎷'
    )
    1 => Array (
        'charset' => 'ascii'
        'value' => 'Hello World'
    )
    2 => Array (
        'charset' => 'utf8'
        'value' => 'H€lloWorld¢'
    )
    3 => Array (
        'charset' => 'utf8mb3'
        'value' => 'H€lloWorld¢'
    )
    4 => Array (
        'charset' => 'utf8mb4'
        'value' => 'H€llo😈World¢'
    )
    5 => Array (
        'charset' => 'koi8r'
        'value' => 'ýordòress'
        'db' => true
    )
    6 => Array (
        'charset' => 'hebrew'
        'value' => '?ord?ress'
        'db' => true
    )
    7 => Array (
        'charset' => false
        'value' => 100
    )
    8 => Array (
        'charset' => 'big5'
        'value' => 'a¦@b'
    )
) is identical to Array (
    0 => Array (
        'charset' => 'latin1'
        'value' => '🎷'
    )
    1 => Array (
        'charset' => 'ascii'
        'value' => 'Hello World'
    )
    2 => Array (
        'charset' => 'utf8'
        'value' => 'H€lloWorld¢'
    )
    3 => Array (
        'charset' => 'utf8mb3'
        'value' => 'H€lloWorld¢'
    )
    4 => Array (
        'charset' => 'utf8mb4'
        'value' => 'H€llo😈World¢'
    )
    5 => Array (
        'charset' => 'koi8r'
        'value' => 'ýordòress'
        'db' => true
    )
    6 => Array (
        'charset' => 'hebrew'
        'value' => 'ùord÷ress'
        'db' => true
    )
    7 => Array (
        'charset' => false
        'value' => 100
    )
    8 => Array (
        'charset' => 'big5'
        'value' => 'a¦@b'
    )
)
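The hebrew test value appears to be a raw single-byte string. The report prints it as if it were Latin-1, so byte 0xF9 shows as `ù` and 0xF7 as `÷`. In ISO 8859-8 those same bytes are the letters shin (U+05E9) and qof (U+05E7), which loosely resemble the W and P of "WordPress". The failing `?ord?ress` is what a conversion produces when those two characters cannot be represented and are replaced with `?`. A small illustration, assuming those bytes; Python's codecs stand in for MySQL's conversion, and this is not wpdb code:

```python
# Assumption: the test value is a raw ISO 8859-8 ("hebrew") byte string that the
# failure report prints as if it were Latin-1.
hebrew_bytes = b"\xf9ord\xf7ress"

as_displayed = hebrew_bytes.decode("latin-1")  # how the report shows it
as_hebrew = hebrew_bytes.decode("iso8859_8")   # what the bytes mean in 'hebrew'

# Forcing the text through a charset that lacks those two letters replaces
# each with '?', the same shape as the failing '?ord?ress' result.
lossy = as_hebrew.encode("ascii", errors="replace").decode("ascii")

print(as_displayed)  # ùord÷ress
print(lossy)         # ?ord?ress
```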

Attachments (3)

my.cnf (6.3 KB) - added by SergeyBiryukov 10 years ago.
php.ini (45.8 KB) - added by SergeyBiryukov 10 years ago.
31262.diff (667 bytes) - added by pento 10 years ago.


Change History (16)

#1 @pento
10 years ago

It's behaving like the hebrew character set doesn't exist on that server. Could you try running this query, and see what it returns?

SHOW CHARACTER SET LIKE 'hebrew';

#2 @SergeyBiryukov
10 years ago

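+---------+-------------------+-------------------+--------+
| Charset | Description       | Default collation | Maxlen |
+---------+-------------------+-------------------+--------+
| hebrew  | ISO 8859-8 Hebrew | hebrew_general_ci |      1 |
+---------+-------------------+-------------------+--------+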

#3 @pento
10 years ago

  • Owner set to pento
  • Resolution set to fixed
  • Status changed from new to closed

In 31371:

WPDB: When removing invalid text from strings with multiple different character sets, wpdb::strip_invalid_text() wasn't correctly switching connection character sets.

Fixes #31262
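The commit message describes the fix only in outline and does not show the code. One way to model the intended behaviour is to group values by character set and switch the connection charset once for every group, so no group is processed under a charset left over from an earlier one. This is a hypothetical sketch: `strip_invalid_by_charset`, `fake_set_charset` and `fake_strip` are illustrative names, not wpdb's API, the connection is simulated by recording calls, and Python codecs stand in for MySQL's conversion.

```python
def strip_invalid_by_charset(fields, set_connection_charset, strip_one):
    # Group column names by charset; dicts keep insertion order (Python 3.7+),
    # so groups are processed in the order their charset first appears.
    groups = {}
    for column, field in fields.items():
        groups.setdefault(field["charset"], []).append(column)

    result = {}
    for charset, columns in groups.items():
        # Switch the connection charset for every group, not just the first one.
        set_connection_charset(charset)
        for column in columns:
            result[column] = {
                "charset": charset,
                "value": strip_one(fields[column]["value"], charset),
            }
    return result


# Simulated connection: records each charset switch instead of talking to MySQL.
calls = []
CODECS = {"koi8r": "koi8_r", "hebrew": "iso8859_8", "utf8": "utf-8"}

def fake_set_charset(charset):
    calls.append(charset)

def fake_strip(value, charset):
    # Drop bytes that are invalid in this charset, then re-encode so that
    # valid input comes back byte-for-byte unchanged.
    codec = CODECS[charset]
    return value.decode(codec, errors="ignore").encode(codec)

fields = {
    "a": {"charset": "koi8r", "value": b"\xfdord\xf2ress"},
    "b": {"charset": "hebrew", "value": b"\xf9ord\xf7ress"},
    "c": {"charset": "utf8", "value": b"Hello\xffWorld"},
    "d": {"charset": "koi8r", "value": b"plain"},
}
out = strip_invalid_by_charset(fields, fake_set_charset, fake_strip)
print(calls)              # ['koi8r', 'hebrew', 'utf8']
print(out["c"]["value"])  # b'HelloWorld'
```

Grouping keeps the number of charset switches equal to the number of distinct charsets, while still guaranteeing each value is checked under its own charset.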

#4 @SergeyBiryukov
10 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

Same environment, still seeing failures:

1) Tests_DB_Charset::test_strip_invalid_text with data set #5 (array(array('koi8r', 'ýordòress', true)), array(array('koi8r', 'ýordòress', true)), 'koi8r')
koi8r
Failed asserting that Array (
    0 => Array (
        'charset' => 'koi8r'
        'value' => '?ord?ress'
        'db' => true
    )
) is identical to Array (
    0 => Array (
        'charset' => 'koi8r'
        'value' => 'ýordòress'
        'db' => true
    )
).

S:\home\wordpress\develop\tests\phpunit\tests\db\charset.php:126
S:\usr\local\php5\phpunit:46

2) Tests_DB_Charset::test_strip_invalid_text with data set #6 (array(array('hebrew', 'ùord÷ress', true)), array(array('hebrew', 'ùord÷ress', true)), 'hebrew')
hebrew
Failed asserting that Array (
    0 => Array (
        'charset' => 'hebrew'
        'value' => '?ord?ress'
        'db' => true
    )
) is identical to Array (
    0 => Array (
        'charset' => 'hebrew'
        'value' => 'ùord÷ress'
        'db' => true
    )
).

S:\home\wordpress\develop\tests\phpunit\tests\db\charset.php:126
S:\usr\local\php5\phpunit:46

3) Tests_DB_Charset::test_strip_invalid_text with data set #9 (array(array('latin1', '🎷'), array('ascii', 'Hello World'), array('utf8', 'H€llo😈World¢'), array('utf8mb3', 'H€llo😈World¢'), array('utf8mb4', 'H€llo😈World¢'), array('koi8r', 'ýordòress', true), array('hebrew', 'ùord÷ress', true), array(false, 100), array('big5', 'a¦@b')), array(array('latin1', '🎷'), array('ascii', 'Hello World'), array('utf8', 'H€lloWorld¢'), array('utf8mb3', 'H€lloWorld¢'), array('utf8mb4', 'H€llo😈World¢'), array('koi8r', 'ýordòress', true), array('hebrew', 'ùord÷ress', true), array(false, 100), array('big5', 'a¦@b')), 'multiple fields/charsets')
multiple fields/charsets
Failed asserting that Array (
    0 => Array (
        'charset' => 'latin1'
        'value' => '🎷'
    )
    1 => Array (
        'charset' => 'ascii'
        'value' => 'Hello World'
    )
    2 => Array (
        'charset' => 'utf8'
        'value' => 'H€lloWorld¢'
    )
    3 => Array (
        'charset' => 'utf8mb3'
        'value' => 'H€lloWorld¢'
    )
    4 => Array (
        'charset' => 'utf8mb4'
        'value' => 'H€llo😈World¢'
    )
    5 => Array (
        'charset' => 'koi8r'
        'value' => '?ord?ress'
        'db' => true
    )
    6 => Array (
        'charset' => 'hebrew'
        'value' => '?ord?ress'
        'db' => true
    )
    7 => Array (
        'charset' => false
        'value' => 100
    )
    8 => Array (
        'charset' => 'big5'
        'value' => 'a¦@b'
    )
) is identical to Array (
    0 => Array (
        'charset' => 'latin1'
        'value' => '🎷'
    )
    1 => Array (
        'charset' => 'ascii'
        'value' => 'Hello World'
    )
    2 => Array (
        'charset' => 'utf8'
        'value' => 'H€lloWorld¢'
    )
    3 => Array (
        'charset' => 'utf8mb3'
        'value' => 'H€lloWorld¢'
    )
    4 => Array (
        'charset' => 'utf8mb4'
        'value' => 'H€llo😈World¢'
    )
    5 => Array (
        'charset' => 'koi8r'
        'value' => 'ýordòress'
        'db' => true
    )
    6 => Array (
        'charset' => 'hebrew'
        'value' => 'ùord÷ress'
        'db' => true
    )
    7 => Array (
        'charset' => false
        'value' => 100
    )
    8 => Array (
        'charset' => 'big5'
        'value' => 'a¦@b'
    )
)

#5 follow-up: @pento
10 years ago

I've been playing around with this a bit more, but I've been totally unable to reproduce it.

@SergeyBiryukov, could you post your my.cnf and php.ini files? I'll see if that helps me get to the bottom of it.

#6 in reply to: ↑ 5 @SergeyBiryukov
10 years ago

Replying to pento:

@SergeyBiryukov, could you post your my.cnf and php.ini files? I'll see if that helps me get to the bottom of it.

Sure, attached.

@SergeyBiryukov
10 years ago

@SergeyBiryukov
10 years ago

#7 follow-up: @pento
10 years ago

I've been beating my head against this all week, with still no luck.

Given the output, I'm starting to suspect this is a Windows-specific issue. For example, take the first character in the koi8r string. It has the byte value of \xfd, which is an invalid character in UTF-8, but ý in UTF-16 (as is shown in the $actual output). Given that Windows uses UTF-16 as its internal encoding, I suspect the string is being silently marked as UTF-16 at some point, either in PHP or MySQL.

Are you able to reproduce this in later versions of PHP and MySQL?
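The byte-level part of this reasoning can be checked with Python's codecs. A lone 0xFD is never valid UTF-8. In Latin-1 it is `ý`, and as a UTF-16 code unit 0x00FD it is also `ý`, which matches what the failure output shows. In KOI8-R, the charset the test intends, the same byte is `Щ`. This only demonstrates the encodings. It does not show where PHP 5.2 or MySQL on Windows might reinterpret the string, which the ticket never pins down. `describe_byte` is an illustrative helper name:

```python
def describe_byte(b: int) -> dict:
    """Show how one byte is read under several encodings (illustration only)."""
    raw = bytes([b])
    try:
        utf8 = raw.decode("utf-8")
    except UnicodeDecodeError:
        utf8 = None  # bytes 0xF8-0xFF can never start a valid UTF-8 sequence
    return {
        "utf-8": utf8,
        "latin-1": raw.decode("latin-1"),
        # UTF-16 needs two bytes per code unit: widen 0xFD to 0x00FD (little-endian).
        "utf-16-le": (raw + b"\x00").decode("utf-16-le"),
        "koi8-r": raw.decode("koi8_r"),
    }

print(describe_byte(0xFD))
# {'utf-8': None, 'latin-1': 'ý', 'utf-16-le': 'ý', 'koi8-r': 'Щ'}
```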

#8 in reply to: ↑ 7 @SergeyBiryukov
10 years ago

Replying to pento:

Are you able to reproduce this in later versions of PHP and MySQL?

Could not reproduce with PHP 5.3.28 or 5.4.29 on the same environment.

Got a bunch of taxonomy test failures though, see #31827.

#9 follow-up: @pento
10 years ago

I'm heavily leaning towards wontfixing this bug. It's not a regression, it's just new tests that don't pass under these circumstances. If anyone is actually running into this bug, WP 4.2 won't cause changes to how their site behaves.

@boonebgorges, is there a preferred method for marking a test to be skipped by PHP version and OS?

@pento
10 years ago

#10 @pento
10 years ago

  • Keywords has-patch commit added

31262.diff skips this test on Windows/PHP 5.2.
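The contents of 31262.diff are not reproduced in this ticket. As a hypothetical sketch of the condition described (skip only on Windows under PHP 5.2), the decision is an OS check plus a major/minor version comparison. In PHPUnit the test would typically act on it with `$this->markTestSkipped()`; the Python below models only the decision. `should_skip_charset_test` is an illustrative name, and its arguments mimic PHP's `PHP_OS` (e.g. `WINNT`) and `PHP_VERSION` values.

```python
def should_skip_charset_test(os_name: str, php_version: str) -> bool:
    """Hypothetical: skip only on Windows with PHP 5.2.x."""
    # PHP_OS reports "WINNT" on Windows; match any "WIN..." prefix.
    is_windows = os_name.upper().startswith("WIN")
    # Compare only major.minor, so "5.2.17" counts as 5.2 but "5.3.28" does not.
    major_minor = tuple(int(part) for part in php_version.split(".")[:2])
    return is_windows and major_minor == (5, 2)

print(should_skip_charset_test("WINNT", "5.2.17"))  # True
print(should_skip_charset_test("WINNT", "5.3.28"))  # False
```

Matching on major.minor rather than the full version string keeps every 5.2.x point release covered, while PHP 5.3.28 and 5.4.29 (which did not reproduce the failure) still run the test.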

#11 in reply to: ↑ 9 @boonebgorges
10 years ago

Replying to pento:

@boonebgorges, is there a preferred method for marking a test to be skipped by PHP version and OS?

If you're skipping an entire file based on PHP version alone, put the exclusion in phpunit.xml:
https://core.trac.wordpress.org/browser/tags/4.1.1/phpunit.xml.dist?marks=14,15,16,17#L6

But PHPUnit doesn't support test-specific exclusions in the config file, and it doesn't support skipping by OS at all, so [31262.diff] looks good to me.

#12 @pento
10 years ago

In 31953:

WPDB: Due to PHP 5.2's internal string handling, strings on Windows are encoded using UTF-16 instead of UTF-8. With the addition of the many character set tests in [30345], a couple of them were failing under PHP 5.2 on Windows because of this behaviour.

This marks those tests as skipped.

See #31262 for more discussion.

#13 @pento
10 years ago

  • Keywords has-patch commit removed
  • Milestone 4.2 deleted
  • Resolution set to wontfix
  • Status changed from reopened to closed

Given that we haven't changed existing behaviour in PHP 5.2 under Windows, let's not try to fix PHP's behaviour.
