Opened 13 years ago
Closed 13 years ago
#19033 closed defect (bug) (fixed)
Problem with Hebrew letter "Nun" hiding search results
Reported by: | shirgans | Owned by: | |
---|---|---|---|
Milestone: | 3.4 | Priority: | normal |
Severity: | critical | Version: | 3.2.1 |
Component: | I18N | Keywords: | has-patch needs-unit-tests dev-feedback |
Focuses: | | Cc: | |
Description
In the Hebrew installation, searching the website for words containing the letter Nun ( נ ) returns no results.
There was a related problem with the same letter in an earlier version of WordPress; please see the ticket here:
http://core.trac.wordpress.org/ticket/11669
Please try searching this site for the words (נתן, קושניר, אנטולי), which all contain the letter נ. No results are found, although all of these names appear on the site.
http://www.pat.co.il/shirg/comm-it.co.il/he/
We need a fix/patch ASAP. Thank you.
Attachments (1)
Change History (7)
#2 follow-up: ↓ 3 @SergeyBiryukov 13 years ago
- Keywords has-patch needs-unit-tests added; needs-patch removed
- Milestone changed from Awaiting Review to 3.3
Looks like this has to do with \s in the regexp, similarly to #11528 and [12501].
To reproduce:
preg_match_all('/".*?("|$)|((?<=[\\s",+])|^)[^\\s",+]+/', 'נתן, קושניר, אנטולי', $matches); var_dump($matches);
Here's what I get on PHP 5.2.14 (Windows), PCRE 8.02 2010-03-19:
array(3) { [0]=> array(6) { [0]=> string(1) "�" [1]=> string(4) "תן" [2]=> string(7) "קוש�" [3]=> string(4) "יר" [4]=> string(3) "א�" [5]=> string(8) "טולי" } ... }
With the regexp from the patch:
array(3) { [0]=> array(3) { [0]=> string(6) "נתן" [1]=> string(12) "קושניר" [2]=> string(12) "אנטולי" } ... }
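For anyone trying this locally, here is a minimal sketch of the likely cause and of one possible workaround. The letter נ (U+05E0) is encoded in UTF-8 as the bytes D7 A0, and 0xA0 is the non-breaking space in single-byte Latin encodings, so a byte-oriented \s can treat it as whitespace; that matches the 1-byte "�" fragment in the output above. The pattern below is an assumption (\s replaced with an explicit tab and space) and is not necessarily identical to the attached patch. The file must be saved as UTF-8.

```php
<?php
// UTF-8 bytes of the Hebrew letter Nun (U+05E0): expected to be d7 a0.
$nun_bytes = bin2hex( 'נ' );

// Assumed replacement pattern: \s swapped for an explicit tab and space,
// so no byte above 0x7F can ever be taken for a separator.
$pattern = '/".*?("|$)|((?<=[\t ",+])|^)[^\t ",+]+/';

preg_match_all( $pattern, 'נתן, קושניר, אנטולי', $matches );

// $matches[0] holds the full matches, i.e. the individual search terms.
$terms = $matches[0];

var_dump( $nun_bytes, $terms );
```

Because tab, space, quote, comma and plus are all ASCII, they can never occur inside a multi-byte UTF-8 sequence, so this pattern stays safe without the /u modifier.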
#3 in reply to: ↑ 2, 13 years ago
Replying to SergeyBiryukov:
Looks like this has to do with \s in the regexp, similarly to #11528 and [12501].
Yes, we should be careful not to use \s in regexp anywhere as it grabs parts of utf-8 chars (not only in Hebrew).
In this case it seems we are looking for word separators in a search string that was entered in an <input type="text"> field. Perhaps \r\n\t should be stripped completely, or the string should even be rejected if any of these are found; then we could use \b.
There should be many examples of search string sanitization and handling; maybe we should look around a bit. For example, chars like !@#$%^& are usually ignored, etc.
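As a rough illustration of the normalization idea, the sketch below replaces \r, \n and \t with a plain space and then splits on ASCII separators only. The function name example_split_search_terms is hypothetical and not part of WordPress core; the sketch also ignores quoted phrases, which the existing regexp supports, and it avoids \b because word boundaries are not reliable on UTF-8 bytes without the /u modifier.

```php
<?php
// Hypothetical helper for illustration only; not a WordPress core function.
function example_split_search_terms( $search ) {
	// Turn CR, LF and TAB into a plain space so only ASCII separators remain.
	$search = str_replace( array( "\r", "\n", "\t" ), ' ', $search );

	// Split on runs of space, comma or plus; PREG_SPLIT_NO_EMPTY drops empty pieces.
	return preg_split( '/[ ,+]+/', $search, -1, PREG_SPLIT_NO_EMPTY );
}

$terms = example_split_search_terms( "נתן,\tקושניר\r\nאנטולי" );
var_dump( $terms );
```

Stray punctuation such as !@#$%^& could be added to the separator class in the same way, since those characters are ASCII as well.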
For now, I have changed query.php under wp-includes to get a quick hotfix (from line 2171: