Opened 12 months ago
Last modified 4 weeks ago
#60842 new enhancement
PHPunit tests for wp_check_invalid_utf8
Reported by: |
|
Owned by: | |
---|---|---|---|
Milestone: | Awaiting Review | Priority: | normal |
Severity: | normal | Version: | |
Component: | Formatting | Keywords: | has-patch has-unit-tests |
Focuses: | tests | Cc: |
Description
Change History (15)
#1
@
5 months ago
- Component changed from Build/Test Tools to Formatting
- Keywords needs-patch needs-unit-tests added
- Type changed from defect (bug) to enhancement
This ticket was mentioned in PR #7547 on WordPress/wordpress-develop by @debarghyabanerjee.
5 months ago
#2
- Keywords has-patch has-unit-tests added; needs-patch needs-unit-tests removed
Trac Ticket: Core-60842
## Description:
- This pull request introduces a comprehensive suite of unit tests for the wp_check_invalid_utf8 function. These tests are designed to ensure the correct handling of both valid and invalid UTF-8 strings.
## Changes Made:
### Test Cases for Invalid UTF-8 Handling
- Added tests to validate that the function returns an empty string for improperly formed UTF-8 sequences.
- Included tests to ensure that valid UTF-8 strings remain unaffected when passed through the function.
### Assertions
- Utilized assertions to check that the function returns non-empty results when valid strings are processed.
- Added checks to ensure that the output does not contain any invalid UTF-8 characters after processing.
## Why These Tests Matter
- Ensures the robustness of the wp_check_invalid_utf8 function by validating its behavior against a variety of input scenarios.
- Helps to prevent regressions in future code changes that may affect character encoding handling.
- Improves overall code quality and reliability when working with UTF-8 data.
#3
@
5 months ago
While writing the test cases for wp_check_invalid_utf8
with $strip set to true, I found that if we pass any kind of Invalid UTF-8 character, it's throwing a Fatal Error, mostly because of this line
if ( $strip && function_exists( 'iconv' ) ) { return iconv( 'utf-8', 'utf-8', $text ); }
It was mentioned that this is particularly to remove the bad
characters, but passing an invalid UTF-8
character throws an error.
@hellofromTonya commented on PR #7547:
5 months ago
#5
Hello @Debarghya-Banerjee, thank you for creating this patch. A few tips to help prepare it for commit:
- Use data provider.
Instead of looping through the array of valid strings, please use a data provider.
Ref: Handbook page
Here's an example to help.
To include the inline comments of what each dataset (which then prints in the command line if the assertion fails), please use named dataset. Here's an example.
- Test method naming
test_should_[do what]_when_[conditions]
.
For example, test_valid_utf8()
becomes test_should_return_given_string_when_valid_without_stripping()
.
- Please use
assertSame()
instead ofassertEquals()
.
There's been an initiative underway to replace all assertEquals()
with assertSame()
.
- Missing datasets for testing what happens when stripping.
What happens when a string has a mix of valid and invalid utf-8 characters?
Let me know if you'd like help, as I can collaborate / contribute these datasets.
@debarghyabanerjee commented on PR #7547:
5 months ago
#6
Hi @hellofromtonya , sure, I am pushing the updated code in sometime. Thanks for the mention.
@debarghyabanerjee commented on PR #7547:
5 months ago
#7
Hi @hellofromtonya , I have updated the Unit Tests, added data providers and have renamed the methods. However I found this issue yesterday as well, while writing the unit test for wp_check_invalid_utf_8
with $strip
set to true
, if we pass an invalid UTF-8 character, it throws an error, this is probably because of this line:
// Attempt to strip the bad chars if requested (not recommended). if ( $strip && function_exists( 'iconv' ) ) { return iconv( 'utf-8', 'utf-8', $text ); }
Please let me know your thoughts on this. Thanks.
5 months ago
#8
Use data provider.
Instead of looping through the array of valid strings, please use a data provider.
Hmm, are data providers necessary for "single use" data? Seems not a good idea unless it makes the code more readable perhaps? Often it can make it a tiny bit less readable imho (have to scroll to look at the actual test and the data provider). The handbook properly suggest use of data providers for multiple uses of the same data, is this one of these cases?
@hellofromTonya commented on PR #7547:
5 months ago
#9
Use data provider.
Instead of looping through the array of valid strings, please use a data provider.
Hmm, are data providers necessary for "single use" data? Seems not a good idea unless it makes the code more readable perhaps? Often it can make it a tiny bit less readable imho (have to scroll to look at the actual test and the data provider). The handbook properly suggest use of data providers for multiple uses of the same data, is this one of these cases?
Thanks for asking these questions @azaozz. An opportunity to share why a data provider is prefered.
👉 Why is a data provider preferable over looping through a datasets array within the test method itself?
Data provider:
- All datasets are tested: All of the datasets within the array are tested, even if one or more of them fail the assertion(s) with the test method.
- Debugging is easier: When an assertion fails, the report indicates which dataset(s) failed.
Looping through a datasets array within the test method:
- Not all dataset elements are tested: If any of dataset with the array fails an assertion, the test exists and the remaining datasets are not tested.
- Debugging is harder: When a assertion fails, there's no indication of which dataset (i.e. array element) failed.
## Example:
Let's say I add an invalid string within the dataset.
array( array( 'Hello, world!' ), array( 'こんにちは' ), // Japanese for "Hello". 'invalid for demonstration purposes' => array( "\xC3\x28" ), // for demostration purposes. array( '3Привет мир' ), // Russian for "Hello, world". array( 'Café' ), // Contains an accented character. array( '✅' ), // Emoji. array( mb_convert_encoding( 'Valid string', 'UTF-8' ) ), );
What happens and what is reported when running?
npm run test:php -- --filter Tests_Formatting_wpCheckInvalidUtf8::test_should_return_given_string_when_valid_without_stripping
### Data provider:
..F.... 7 / 7 (100%) Time: 00:00.454, Memory: 191.00 MB There was 1 failure: 1) Tests_Formatting_wpCheckInvalidUtf8::test_should_return_given_string_when_valid_without_stripping with data set "invalid for demonstration purposes" ('�(') Failed asserting that two strings are identical. --- Expected +++ Actual @@ @@ -'�(' +'' /var/www/tests/phpunit/tests/formatting/wpCheckInvalidUtf8.php:40 FAILURES! Tests: 7, Assertions: 7, Failures: 1.
- ✅ All 7 of the datasets ran including the ones after the failed one.
- ✅ Reported which one failed, using the named dataset approach (to more clearly identify it).
### Looping through a datasets array within the test method
F 1 / 1 (100%) Time: 00:00.691, Memory: 191.00 MB There was 1 failure: 1) Tests_Formatting_wpCheckInvalidUtf8::test_should_return_given_string_when_valid_without_stripping Failed asserting that two strings are identical. --- Expected +++ Actual @@ @@ -'�(' +'' /var/www/tests/phpunit/tests/formatting/wpCheckInvalidUtf8.php:51 FAILURES! Tests: 1, Assertions: 3, Failures: 1.
- ❌ Only 3 of the datasets were tested. The remaining 4 datasets were _not_ tested.
- ❌ It's not clear which dataset failed, especially when this test is run with other tests as the number of assertions may not be easily determined.
@hellofromTonya commented on PR #7547:
5 months ago
#10
@Debarghya-Banerjee thanks for updating!
I found this issue yesterday as well, while writing the unit test for
wp_check_invalid_utf_8
with$strip
set totrue
, if we pass an invalid UTF-8 character, it throws an error, this is probably because of this line:
// Attempt to strip the bad chars if requested (not recommended). if ( $strip && function_exists( 'iconv' ) ) { return iconv( 'utf-8', 'utf-8', $text ); }
Please let me know your thoughts on this. Thanks.
Silly me. I forgot about the known issue with iconv()
, which is being tracked in https://core.trac.wordpress.org/ticket/29717.
Currently, when giving an invalid string to inconv()
, it will throw a PHP notice and return false
:
Notice: iconv(): Detected an illegal character in input string in /in/4Ft3C on line 4 bool(false)
See it here in action https://3v4l.org/4Ft3C#veol.
The returning of false
is not documented in the function, as the documented return type is string
. Fixing that is out-of-scope for this PR as it's focused on adding tests for the function.
So how can this path in the function be tested?
The current behavior is a PHP Notice. The test can expectNotice()
and check the message too:
$this->expectNotice( $value );
$this->expectNoticeMessage( 'iconv(): Detected an illegal character in input string' );
@debarghyabanerjee commented on PR #7547:
5 months ago
#11
Hi @hellofromtonya ,
Thanks for the suggestions.
This Looks Good To me. I will be pushing the updated code in sometime.
5 months ago
#12
An opportunity to share why a data provider is preferred
Thanks! Seems it may be a good idea to add a bit of this to the handbook? Perhaps this part:
Using a data provider is preferable because: - All datasets are tested even if one or more of them fail the assertion(s) with the test method. - Debugging is easier. When an assertion fails, the report indicates which dataset(s) failed.
@hellofromTonya commented on PR #7547:
5 months ago
#13
Thanks! Seems it may be a good idea to add a bit of this to the handbook? Perhaps this part:
Using a data provider is preferable because: - All datasets are tested even if one or more of them fail the assertion(s) with the test method. - Debugging is easier. When an assertion fails, the report indicates which dataset(s) failed.
Great idea @azaozz! I was thinking about that today too.
@debarghyabanerjee commented on PR #7547:
5 months ago
#14
Hi @hellofromtonya , sorry for the delayed response, I was a bit away.
I have addressed your feedback and refactored the code. Can you please take a look into it?
Thanks.
Since this is about adding tests and not a general build or test tooling change, I'm reassigning it to the relevant component.