Opened 3 weeks ago

Last modified 8 days ago

#44296 reviewing defect (bug)

Enable double-width space to work as a separator in search queries

Reported by: ryotsun
Owned by: SergeyBiryukov
Milestone: 5.0
Priority: normal
Severity: normal
Version: trunk
Component: Query
Keywords: has-patch has-unit-tests
Focuses:
Cc:

Description (last modified by SergeyBiryukov)

Related ticket: #43829

There is a bug in the search query (search form): when a query contains a double-width (ideographic) space, the whole query is treated as a single word.

The double-width space should be recognized as a separator, just like the half-width space " ".

Most importantly for this ticket, this can be a first step toward better handling of CJK unified ideographs in search.

cf. https://en.wikipedia.org/wiki/CJK_Unified_Ideographs

Attachments (2)

44296_1.patch (1.7 KB) - added by ryotsun 3 weeks ago.
44296_2.patch (1.7 KB) - added by ryotsun 2 weeks ago.

Change History (8)

@ryotsun
3 weeks ago

#1 @SergeyBiryukov
3 weeks ago

  • Description modified (diff)

#2 follow-up: @tenpura
3 weeks ago

The mbstring PHP extension might be unavailable, so mb_convert_kana() needs a function_exists() check.
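The guard suggested above could look roughly like the following. This is only a sketch: `example_normalize_spaces()` is a hypothetical helper name, not something from WordPress core or the attached patches, and the byte-level fallback is an assumption about what to do when mbstring is missing.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress core or the attached patches).
 * Converts ideographic spaces (U+3000) to ASCII spaces, using mbstring
 * only when the extension is actually loaded.
 */
function example_normalize_spaces( $query ) {
	if ( function_exists( 'mb_convert_kana' ) ) {
		// Mode 's' converts full-width (zenkaku) spaces to half-width spaces.
		return mb_convert_kana( $query, 's', 'UTF-8' );
	}

	// Assumed fallback: replace the UTF-8 byte sequence of U+3000 directly.
	return str_replace( "\xE3\x80\x80", ' ', $query );
}
```

Either branch should leave all other characters in the query untouched, since mode `'s'` only affects the full-width space.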

I'm sure that the ideographic space is legitimate as a search word separator in Japanese, but I'm not sure that it is treated exactly the same way in searches in other CJK or similar languages. If you are 100% certain about it, fine. If not, it might be safer to confirm it with the polyglots community to minimize the possibility of breaking something.

This ticket was mentioned in Slack in #polyglots by nao.


3 weeks ago

#4 @southp
3 weeks ago

From my understanding of Chinese, there shouldn't be a problem with that. However, we should not convert quoted ideographic spaces, since they are likely intentional. E.g. '"I　am　quoted"' (with ideographic spaces between the words) should search for "I　am　quoted", not for "I am quoted" with half-width spaces.
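One way to leave quoted phrases alone is to split the query on double-quoted segments and convert only the unquoted parts. The sketch below assumes that phrases are delimited by ASCII double quotes and that an unterminated quote counts as unquoted text. The function name is hypothetical and this is not the code from the attached patches.

```php
<?php
/**
 * Hypothetical helper (not from the attached patches).
 * Replaces ideographic spaces (U+3000) with ASCII spaces, except inside
 * double-quoted phrases.
 */
function example_replace_unquoted_ideographic_spaces( $query ) {
	// Split into alternating unquoted / quoted segments. The capture group
	// plus PREG_SPLIT_DELIM_CAPTURE keeps the quoted phrases in the result.
	$parts = preg_split( '/("[^"]*")/', $query, -1, PREG_SPLIT_DELIM_CAPTURE );

	foreach ( $parts as $i => $part ) {
		// Even indexes are unquoted text; odd indexes are quoted phrases.
		if ( 0 === $i % 2 ) {
			$parts[ $i ] = str_replace( "\xE3\x80\x80", ' ', $part );
		}
	}

	return implode( '', $parts );
}
```

With this approach, ideographic spaces between terms become separators, while those inside a quoted phrase are passed through unchanged.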

Last edited 3 weeks ago by southp

#5 in reply to: ↑ 2 @ryotsun
2 weeks ago

@tenpura Thank you for your advice.

I should not have used mbstring, so I've changed the code to use str_replace() instead.
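For illustration, a str_replace()-based version might look like the sketch below. It is not the content of 44296_2.patch: the function name is hypothetical, and splitting on spaces and tabs is a simplifying assumption rather than how core actually tokenizes search terms.

```php
<?php
/**
 * Hypothetical helper (not the code in 44296_2.patch).
 * Normalizes ideographic spaces without mbstring, then splits the query
 * into individual search terms.
 */
function example_split_search_terms( $query ) {
	// "\xE3\x80\x80" is the UTF-8 encoding of U+3000 (ideographic space).
	$normalized = str_replace( "\xE3\x80\x80", ' ', $query );

	// Split on runs of spaces or tabs and drop empty pieces.
	return preg_split( '/[\t ]+/', $normalized, -1, PREG_SPLIT_NO_EMPTY );
}
```

Because str_replace() works on raw bytes, this version runs even when the mbstring extension is unavailable.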

@ryotsun
2 weeks ago

#6 @SergeyBiryukov
8 days ago

  • Milestone changed from Awaiting Review to 5.0
  • Owner set to SergeyBiryukov
  • Status changed from new to reviewing