﻿id	summary	reporter	owner	description	type	status	priority	milestone	component	version	severity	resolution	keywords	cc
11528	sanitize_text_field() issue with UTF-8 characters	SergeyBiryukov		"{{{sanitize_text_field()}}} is the new function in {{{/wp-includes/formatting.php}}} which sanitizes a string from user input or from the database.

The following line of the function is not fully compatible with UTF-8:
{{{
$filtered = trim( preg_replace('/\s+/', ' ', $filtered) );
}}}
This causes problems with characters such as Р (capital Cyrillic R), which is encoded in UTF-8 as the two bytes {{{D0 A0}}} (hexadecimal). Because the pattern is applied to individual bytes rather than to whole characters, the trailing {{{A0}}} byte can be matched by {{{\s}}} and replaced with a space, so the character becomes {{{D0 20}}}. To reproduce the issue, try to create a category named оРангутанг or САПР. The rest of the word after Р is not displayed, and the slug is incorrect too. If a title starts with Р, it is not displayed at all.
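
The byte-level effect can be sketched as follows. This is only an illustration, not the code from {{{formatting.php}}}: the variable names are hypothetical, the corruption is simulated with {{{str_replace()}}} because whether {{{\s}}} really matches the byte {{{A0}}} depends on the locale and the PCRE build, and the two patterns at the end are possible workarounds rather than a confirmed fix.
{{{
#!php
<?php
// Hypothetical demonstration; all variable names are illustrative only.
// The bytes below are the UTF-8 encoding of the word from the report (САПР).
$word = hex2bin( 'd0a1d090d09fd0a0' );

// The last letter, Р (U+0420), is the two-byte sequence D0 A0.
echo bin2hex( substr( $word, -2 ) ), PHP_EOL; // d0a0

// Simulate a byte-oriented \s that treats 0xA0 as whitespace.
$broken = str_replace( chr( 0xA0 ), ' ', $word );
echo bin2hex( substr( $broken, -2 ) ), PHP_EOL; // d020

// Possible workaround 1: list the ASCII whitespace characters explicitly.
$explicit = trim( preg_replace( '/[\r\n\t ]+/', ' ', $word ) );

// Possible workaround 2: add the /u modifier so that the subject is
// treated as UTF-8 and matched per character instead of per byte.
$unicode = trim( preg_replace( '/\s+/u', ' ', $word ) );

var_dump( $explicit === $word, $unicode === $word ); // bool(true) twice
}}}
The first pattern leaves every multi-byte character alone because it only names single-byte ASCII characters. The second one requires valid UTF-8 input and, depending on the PHP and PCRE version, may also collapse Unicode space characters such as the no-break space.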

The problem was reported on the Russian support forums soon after the release. Currently, the filter is included in local files to avoid this replacement; however, I think the issue is relevant to other languages that use the Cyrillic alphabet."	defect (bug)	closed	normal	2.9.1	Formatting	2.9	major	fixed		SergeyBiryukov
