﻿id,summary,reporter,owner,description,type,status,priority,milestone,component,version,severity,resolution,keywords,cc
20368,htmlspecialchars() returns empty string for non-UTF-8 input in PHP 5.4,convissor,,"The default value of the input `$encoding` parameter for `htmlspecialchars()` changed to UTF-8 in PHP 5.4.  The prior default was ISO-8859-1.  The function's UTF-8 handler checks the input, returning an empty string if the input isn't valid UTF-8.

WordPress will see the UTF-8 validator kicking because most of the `htmlspecialchars()` calls don't use the `$encoding` parameter.  This will cause major problems for sites that have a `DB_CHARSET` other than `utf8`.

[http://article.gmane.org/gmane.comp.php.devel/71783 Posting 58859 to php-internals] by Rasmus gives a clear example of the problem.  Here is a link to [http://thread.gmane.org/gmane.comp.php.devel/71777 view the whole thread], starting with posting 58853).

Creating two centralized functions is an approach for resolving this problem.  This route is simpler and easier to maintain than adding the parameters to each `htmlspecialchars()` call throughout the code base.

1. `wp_hsc_db()` for safely displaying database results.  Uses `DB_CHARSET` to calculate the appropriate `$encoding` parameter.  MySQL's character set names are not equivalent to the values PHP is looking for in the `$encoding` parameter.  Please see the `hsc_db()` method in the [http://plugins.svn.wordpress.org/login-security-solution/trunk/login-security-solution.php Login Security Solution plugin] for a mapping of the valid options.

2. `wp_hsc_utf8()` for safely displaying strings known to be saved as UTF-8, such as error messages written in core.  Uses `UTF-8` as the `$encoding` parameter.  

Some calls in core use the `$flags` parameter, so these new functions will need the parameter too.  The default should be `ENT_COMPAT`, which works under PHP 5.2, 5.3 and 5.4.

It may be suggested that WP use `htmlspecialchar()`'s auto-detection option (by passing an empty string to the `$encoding` parameter).  This is not advisable because it can produce inconsistent behavior.  Even the PHP manual says this route is not recommended.",defect (bug),new,normal,Awaiting Review,General,,major,,,info@… kpayne@… jkudish
