Changes between Version 1 and Version 2 of Ticket #25585, comment 10
Timestamp: 10/15/2013 08:45:38 PM
Ticket #25585, comment 10
Yeah, it can be moved to the proposed filter so plugins can change the pattern for specific languages.

The idea is to remove single-letter terms from the search. The pattern `/^\p{L}$/u` is the safest way to match a single letter in any language. It's not particularly fast, as it has to look through the Unicode character properties. A more thorough (but considerably slower) pattern could be `/^(?:\p{L}\p{M}*|\p{Z}|\p{P}|\p{C})$/u`, which also matches separators (any kind of whitespace or invisible separator), punctuation, and invisible control characters and unused code points ([http://www.regular-expressions.info/unicode.html#category more info]).

`search_terms_count` is the count before the terms were cleaned. It's used to determine whether the sorting uses CASE and matches in both title and content, or just a sentence match. This part of `parse_search_order()` has gone through quite a few changes; maybe there is a simpler way to do that now.
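
A minimal sketch of the single-letter cleanup described above, in plain PHP. The function `remove_single_char_terms()` and the two constants are hypothetical names used only for illustration; they are not WordPress core code, and in core the pattern would presumably come from the proposed filter rather than a function argument. The broader pattern's alternatives are wrapped in `(?:...)` so that `^` and `$` anchor every branch; without the group, `^` binds only to the first branch, and any term that merely starts with a letter (such as `wordpress`) would match.

```php
<?php
// Hypothetical helper for illustration only, not WordPress core code.

// Matches a term that is exactly one Unicode letter.
const SINGLE_LETTER = '/^\p{L}$/u';

// Broader variant: one letter plus any combining marks, or a single
// separator, punctuation mark, or control/unassigned character.
// The (?:...) group makes ^ and $ apply to every alternative.
const SINGLE_CHAR_BROAD = '/^(?:\p{L}\p{M}*|\p{Z}|\p{P}|\p{C})$/u';

function remove_single_char_terms(array $terms, string $pattern = SINGLE_LETTER): array
{
    $kept = array();
    foreach ($terms as $term) {
        // preg_match() returns 1 on a match, 0 on no match, and false on
        // error (e.g. invalid UTF-8 with the /u modifier). Only drop on 1.
        if (preg_match($pattern, $term) === 1) {
            continue;
        }
        $kept[] = $term;
    }
    return $kept;
}

$terms = array(
    'a',
    "\u{00E9}",   // precomposed e-acute: one letter
    "\u{044F}",   // Cyrillic ya: one letter
    "e\u{0301}",  // e + combining acute: a letter followed by a mark
    'go',
    '-',
    'wordpress',
);

$narrow = remove_single_char_terms($terms);
// keeps: "e\u{0301}", 'go', '-', 'wordpress'

$broad = remove_single_char_terms($terms, SINGLE_CHAR_BROAD);
// keeps: 'go', 'wordpress'
```

The decomposed `e` plus combining accent shows the practical difference: the narrow pattern sees two code points and keeps the term, while the `\p{M}*` in the broader pattern treats it as a single letter and drops it.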