sanitize_text_field() issue with UTF-8 characters
Reported by: SergeyBiryukov
Owned by:
sanitize_text_field() is a new function in /wp-includes/formatting.php that sanitizes a string from user input or from the database.
The following line of the function is not fully compatible with UTF-8:
$filtered = trim( preg_replace('/\s+/', ' ', $filtered) );
It causes problems with characters like Р (Cyrillic capital letter Er, U+0420), which is encoded in UTF-8 as the bytes D0 A0 (hexadecimal) and becomes D0 20 after the replacement, because the pattern is applied byte by byte and the byte A0 is matched as whitespace. To reproduce the issue, try to create a category named оРангутанг or САПР. The rest of the word after Р is not displayed, and the slug is incorrect too. If a title starts with Р, it is not displayed at all.
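The byte-level effect can be illustrated with a small standalone sketch. Both helper functions below are hypothetical and are not WordPress code. The first lists the byte A0 explicitly in the pattern so that the corruption is reproduced regardless of the server locale (with a plain \s, whether A0 matches depends on the locale and PCRE build). The second uses the /u modifier, which is only one possible way to make the pattern UTF-8 aware and is not necessarily the change applied in core.

```php
<?php
// Hypothetical helper: imitates a byte-oriented \s in an environment where
// the byte 0xA0 counts as whitespace. 0xA0 is added to the class explicitly
// so the result does not depend on the current locale.
function collapse_bytewise( $text ) {
    return trim( preg_replace( '/[\s\xA0]+/', ' ', $text ) );
}

// Hypothetical alternative: with the /u modifier PCRE reads the subject as
// UTF-8, so the pair D0 A0 is handled as the single character U+0420.
function collapse_utf8( $text ) {
    return trim( preg_replace( '/\s+/u', ' ', $text ) );
}

// "оРа", the first three letters of "оРангутанг", written as UTF-8 bytes.
$name = "\xD0\xBE\xD0\xA0\xD0\xB0";

echo bin2hex( collapse_bytewise( $name ) ), "\n"; // d0bed020d0b0 (A0 became 20)
echo bin2hex( collapse_utf8( $name ) ), "\n";     // d0bed0a0d0b0 (unchanged)
```

In the first result the lone D0 before the space is no longer a valid UTF-8 sequence, which is consistent with the rest of the word not being displayed. Restricting the pattern to an explicit set of ASCII whitespace characters would be another way to avoid touching the byte A0.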
The problem was reported on the Russian support forums soon after the release. Currently, the filter is included in local files to avoid this replacement; however, I think the issue is also relevant to other languages that use the Cyrillic alphabet.
Change History (12)
comment:10 hakre — 4 years ago
- Milestone changed from 2.9.1 to 2.9.2
- Resolution fixed deleted
- Status changed from closed to reopened
comment:11 westi — 4 years ago
- Milestone changed from 2.9.2 to 2.9.1
- Resolution set to fixed
- Status changed from reopened to closed