#19033 closed defect (bug) (fixed)

Problem with Hebrew letter "Nun" hiding search results

Reported by:	shirgans	Owned by:	shir.gans@…
Milestone:	3.4	Priority:	normal
Severity:	critical	Version:	3.2.1
Component:	I18N	Keywords:	has-patch needs-unit-tests dev-feedback
Focuses:		Cc:

Description

In the Hebrew installation, searching the website for words containing the letter Nun ( נ ) returns no results.

There was a related problem with the same letter in an earlier version of WordPress; see this ticket:
http://core.trac.wordpress.org/ticket/11669

Please try searching this site for the words (נתן, קושניר, אנטולי), which all contain the letter נ. No results are found, even though all of these names appear on the site.
http://www.pat.co.il/shirg/comm-it.co.il/he/

We need a fix/patch ASAP. Thank you.

Attachments (1)

19033.patch (592 bytes) - added by SergeyBiryukov 14 years ago.


Change History (7)

#1 @shirgans
15 years ago

For now, I have changed query.php under wp-includes as a quick hot fix (from line 2171):

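if ( !empty($q['sentence']) ) {
	$q['search_terms'] = array($q['s']);
} else {
	// Hot fix: if the query contains Nun, skip the regexp and search for the whole string.
	if (strstr($q['s'], 'נ')) {
		$q['search_terms'] = array($q['s']);
	} else {
		// Otherwise split the query into quoted phrases and individual terms as before.
		preg_match_all('/".*?("|$)|((?<=[\\s",+])|^)[^\\s",+]+/', $q['s'], $matches);
		$q['search_terms'] = array_map('_search_terms_tidy', $matches[0]);
	}
}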

@SergeyBiryukov
14 years ago

Attachment 19033.patch added

#2 follow-up: ↓ 3 @SergeyBiryukov
14 years ago

Keywords has-patch needs-unit-tests added; needs-patch removed
Milestone changed from Awaiting Review to 3.3

Looks like this has to do with \s in the regexp, similarly to #11528 and [12501].

To reproduce:

preg_match_all('/".*?("|$)|((?<=[\\s",+])|^)[^\\s",+]+/', 'נתן, קושניר, אנטולי', $matches);
var_dump($matches);

Here's what I get on PHP 5.2.14 (Windows), PCRE 8.02 2010-03-19:

array(3) {
  [0]=>
  array(6) {
    [0]=>
    string(1) "�"
    [1]=>
    string(4) "תן"
    [2]=>
    string(7) "קוש�"
    [3]=>
    string(4) "יר"
    [4]=>
    string(3) "א�"
    [5]=>
    string(8) "טולי"
  }
  ...
}

With the regexp from the patch:

array(3) {
  [0]=>
  array(3) {
    [0]=>
    string(6) "נתן"
    [1]=>
    string(12) "קושניר"
    [2]=>
    string(12) "אנטולי"
  }
  ...
}
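For reference, a minimal sketch of the same reproduction with an explicit ASCII whitespace class. It assumes the fix amounts to replacing each \s in the pattern with \r\n\t and a space, as the commit message for [19866] describes; the pattern actually committed may differ in detail.

```php
<?php
// Assumption: the original pattern with every \s swapped for an explicit
// ASCII whitespace class (CR, LF, TAB, space). Not copied from the patch.
$pattern = '/".*?("|$)|((?<=[\r\n\t ",+])|^)[^\r\n\t ",+]+/';

preg_match_all($pattern, 'נתן, קושניר, אנטולי', $matches);

// $matches[0] holds the full matches, i.e. the individual search terms.
$terms = $matches[0];
var_dump($terms);
```

Because the class lists only single ASCII bytes, it cannot match a byte inside a multibyte UTF-8 character, whatever character tables PCRE was built with.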

#3 in reply to: ↑ 2 @azaozz
14 years ago

Replying to SergeyBiryukov:

Looks like this has to do with \s in the regexp, similarly to #11528 and [12501].

Yes, we should be careful not to use \s in regexps anywhere, as it can grab parts of UTF-8 characters (not only in Hebrew).
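A small sketch of why this is not specific to Hebrew. Judging by the output in comment 2, the byte that \s consumed appears to be 0xA0, the second byte of Nun in UTF-8; that reading is an inference from the dump, not something the ticket states. The helper contains_a0_byte() is hypothetical and only illustrates the point.

```php
<?php
// Hypothetical helper (not part of WordPress): reports whether the UTF-8
// encoding of $text contains the byte 0xA0.
function contains_a0_byte($text) {
	return strpos($text, "\xA0") !== false;
}

echo bin2hex('נ'), "\n"; // Hebrew Nun, U+05E0: prints d7a0
echo bin2hex('ת'), "\n"; // Hebrew Tav, U+05EA: prints d7aa
echo bin2hex('Р'), "\n"; // Cyrillic Er, U+0420: prints d0a0

var_dump(contains_a0_byte('נ')); // bool(true)
var_dump(contains_a0_byte('ת')); // bool(false)
var_dump(contains_a0_byte('Р')); // bool(true)
```

Any character whose encoding includes that byte would be split the same way on an affected PHP/PCRE build.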

In this case it seems we are looking for word separators in the search string that was entered in an <input type="text"> field. Perhaps \r\n\t should be stripped completely, or the string should even be rejected if any of these are found; then we could use \b.

There should be many examples of search string sanitization and handling, maybe we should look around a bit. For example chars like !@#$%^& are usually ignored, etc.
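One possible shape for such handling, as a rough sketch only: turn \r, \n and \t into plain spaces, then split on spaces. The function split_search_string() is hypothetical, not WordPress code, and it deliberately ignores quoted phrases, commas and the other punctuation mentioned above.

```php
<?php
// Hypothetical helper (not WordPress code): map CR, LF and TAB to spaces,
// then split on runs of spaces. Only ASCII bytes are matched, so multibyte
// UTF-8 characters are never cut apart.
function split_search_string($s) {
	$s = strtr($s, "\r\n\t", '   ');
	return preg_split('/ +/', $s, -1, PREG_SPLIT_NO_EMPTY);
}

$terms = split_search_string("נתן\tקושניר\r\nאנטולי");
var_dump($terms);
```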

#4 @nacin
14 years ago

Keywords dev-feedback added
Milestone changed from 3.3 to Future Release

#5 @SergeyBiryukov
14 years ago

Component changed from Charset to I18N
Milestone changed from Future Release to 3.4

#6 @nacin
14 years ago

Resolution set to fixed
Status changed from new to closed

In [19866]:

Use [\r\n\t ], not [\s], to prevent issues with some UTF-8 characters. props SergeyBiryukov, fixes #19033.
