htmlspecialchars() returns empty string for non-UTF-8 input in PHP 5.4
|Reported by:||convissor||Owned by:|
|Cc:||info@…, kpayne@…, jkudish|
The default value of the input $encoding parameter for htmlspecialchars() changed to UTF-8 in PHP 5.4. The prior default was ISO-8859-1. The function's UTF-8 handler checks the input, returning an empty string if the input isn't valid UTF-8.
WordPress will trigger the UTF-8 validator because most of its htmlspecialchars() calls don't pass the $encoding parameter. This will cause major problems for sites that have a DB_CHARSET other than utf8.
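To reproduce the behavior described above, here is a minimal sketch. The sample string is made up for illustration; it passes the encoding explicitly so the result does not depend on which PHP version's default is in effect.

```php
<?php
// "café <b>" encoded as ISO-8859-1: the é is the single byte 0xE9,
// which is not a valid UTF-8 sequence on its own.
$latin1 = "caf\xE9 <b>";

// With UTF-8 (the PHP 5.4 default), validation fails and the
// whole result is an empty string.
$as_utf8 = htmlspecialchars( $latin1, ENT_COMPAT, 'UTF-8' );

// With ISO-8859-1 (the pre-5.4 default), every byte is accepted
// and only the HTML special characters are escaped.
$as_latin1 = htmlspecialchars( $latin1, ENT_COMPAT, 'ISO-8859-1' );

var_dump( $as_utf8 === '' );                       // bool(true)
var_dump( $as_latin1 === "caf\xE9 &lt;b&gt;" );    // bool(true)
```

On a site whose database stores latin1 data, any call that omits $encoding takes the first path under PHP 5.4.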
One way to resolve this problem is to create two centralized functions. This route is simpler and easier to maintain than adding the $encoding parameter to each htmlspecialchars() call throughout the code base.
- wp_hsc_db() for safely displaying database results. Uses DB_CHARSET to calculate the appropriate $encoding parameter. MySQL's character set names are not equivalent to the values PHP is looking for in the $encoding parameter. Please see the hsc_db() method in the Login Security Solution plugin for a mapping of the valid options.
- wp_hsc_utf8() for safely displaying strings known to be saved as UTF-8, such as error messages written in core. Uses UTF-8 as the $encoding parameter.
Some calls in core use the $flags parameter, so these new functions will need the parameter too. The default should be ENT_COMPAT, which works under PHP 5.2, 5.3 and 5.4.
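A rough sketch of what the two proposed functions might look like follows. This is not the patch itself. The helper wp_hsc_php_encoding() is a hypothetical name, the charset map is a partial illustration rather than the mapping from the Login Security Solution plugin, and the fallbacks to utf8 and UTF-8 are assumptions.

```php
<?php
/**
 * Hypothetical helper: translate a MySQL character set name into a
 * value htmlspecialchars() accepts for $encoding.
 * Partial, illustrative map only; unknown names fall back to UTF-8,
 * which is an assumption, not a recommendation.
 */
function wp_hsc_php_encoding( $db_charset ) {
	$map = array(
		'utf8'    => 'UTF-8',
		'utf8mb4' => 'UTF-8',
		'latin1'  => 'ISO-8859-1',
		'cp1251'  => 'cp1251',
		'koi8r'   => 'KOI8-R',
		'big5'    => 'BIG5',
		'gb2312'  => 'GB2312',
		'sjis'    => 'Shift_JIS',
		'ujis'    => 'EUC-JP',
	);
	$key = strtolower( $db_charset );
	return isset( $map[ $key ] ) ? $map[ $key ] : 'UTF-8';
}

/**
 * Escape a string that came from the database, using DB_CHARSET
 * to pick the encoding. Assumes utf8 when DB_CHARSET is undefined.
 */
function wp_hsc_db( $string, $flags = ENT_COMPAT ) {
	$db_charset = defined( 'DB_CHARSET' ) ? DB_CHARSET : 'utf8';
	return htmlspecialchars( $string, $flags, wp_hsc_php_encoding( $db_charset ) );
}

/**
 * Escape a string known to be UTF-8, such as a message written in core.
 */
function wp_hsc_utf8( $string, $flags = ENT_COMPAT ) {
	return htmlspecialchars( $string, $flags, 'UTF-8' );
}
```

Existing calls that pass their own flags would carry them over unchanged, for example wp_hsc_utf8( $message, ENT_QUOTES ).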
It may be suggested that WP use htmlspecialchars()'s auto-detection option (by passing an empty string to the $encoding parameter). This is not advisable because it can produce inconsistent behavior. Even the PHP manual says this route is not recommended.