WordPress.org

Make WordPress Core

Opened 2 months ago

Last modified 44 hours ago

#50924 accepted enhancement

Do not use ZWNJ and ZWJ in post name (slug and url)

Reported by: behzadian Owned by: SergeyBiryukov
Milestone: 5.7 Priority: normal
Severity: normal Version:
Component: Permalinks Keywords: 2nd-opinion has-patch has-unit-tests
Focuses: Cc:

Description

If I use Zero Width Non Joiner char in post title (and post title will be used for post slug and name as default), url will be ugly.
ZWNJ is a character used to split two words without real space, for example I in work (پیامک‌ها) there is a ZWNJ between (پیامک) and (ها). If I remove ZWNJ, it will be "پیامکها" which is wrong in Persian (and similar languages). Now if I use ZWNJ in slug, makes url so ugly, for example "/پیامک%e2%80%8cها/".
I changed function remove_accents( $string ) and add two replacement as below:
`
Grave accent.
'Ǜ' => 'U',
'ǜ' => 'u',
Persian.
'‌' => '-', added
'‍' => '-',
added
`
to replace these chars to space

Attachments (2)

50924.1.diff (648.1 KB) - added by Mista-Flo 2 weeks ago.
Added some unit tests + farsi pack for tests
50924.2.diff (2.7 KB) - added by Mista-Flo 2 weeks ago.
Fix huge po/mo file

Download all attachments as: .zip

Change History (11)

This ticket was mentioned in PR #475 on WordPress/wordpress-develop by behzadian.


2 months ago

  • Keywords has-patch added

I do not know if it is right place or wrong, but I added two lines to remove ZWNJ and ZWJ from url.
Right? because it works
Wrong? because Persian a language not an accent!
I'm not Wordpress developer, so I don't know where to add codes
I created a ticket and explained what I done here (https://core.trac.wordpress.org/ticket/50924)

Trac ticket:

#2 @johnbillion
2 months ago

  • Component changed from I18N to Charset
  • Milestone changed from Awaiting Review to Future Release
  • Type changed from feature request to enhancement
  • Version 5.5 deleted

Welcome @behzadian. Thanks for the ticket and the PR!

This would be better done by using the hexadecimal representation of the Unicode codepoint instead of the actual character in each array key, so it's visible to readers of the code.

I think these are correct but I've not done any testing at all:

  • "\xe2\x80\x8c"
  • "\xe2\x80\x8d"

#3 @SergeyBiryukov
2 months ago

  • Component changed from Charset to Permalinks
  • Milestone changed from Future Release to 5.6
  • Owner set to SergeyBiryukov
  • Status changed from new to accepted

Related: #44958, #47594, #47912.

This ticket was mentioned in Slack in #core by metalandcoffee. View the logs.


2 weeks ago

#5 @Mista-Flo
2 weeks ago

  • Keywords needs-refresh needs-unit-tests added

#6 @hellofromTonya
2 weeks ago

Per Sergey during today's scrub:

Will try to address before beta, or punt otherwise.

@Mista-Flo
2 weeks ago

Added some unit tests + farsi pack for tests

#7 @Mista-Flo
2 weeks ago

  • Keywords has-unit-tests added; needs-refresh needs-unit-tests removed

Hi there,

I tried to drop a new patch for this issue.

The thing is, I had to add the persian translation in the test suite in order to properly switch to fa_IR locale. After that the RTL was ok and I could test.

That said, I tried to implement both @behzadian and @johnbillion suggestion, I'm not sure about the last one, but the unit tests work.

@Mista-Flo
2 weeks ago

Fix huge po/mo file

This ticket was mentioned in Slack in #core by helen. View the logs.


45 hours ago

#9 @SergeyBiryukov
44 hours ago

  • Milestone changed from 5.6 to 5.7

Didn't get a chance to review this in time for 5.6 Beta 1, moving to the next release for now.

Note: See TracTickets for help on using tickets.