Opened 2 years ago

Closed 13 months ago

#17738 closed defect (bug) (fixed)

remove_accents() can't handle Vietnamese vowels

Reported by: tgeorge
Owned by: nacin
Priority: normal
Milestone: 3.4
Component: Formatting
Version: 3.1.3
Severity: normal
Keywords: has-patch
Cc: info@…, johnbillion@…, tgeorge

Description

replace_accents() can't handle many of the vowels present in Vietnamese. For the complete list of vowels:

http://en.wikipedia.org/wiki/Vietnamese_alphabet#Tone_marks

Here are the precise vowels that replace_accents() can't handle currently:

ẰằẦầỀềỒồỜờỪừỲỳẢảẲẳẨẩẺẻỂểỈỉỎỏỔổỞởỦủỬửỶỷẴẵẪẫẼẽỄễỖỗỠỡỮữỸỹẮắẤấẾếỐốỚớỨứẠạẶặẬậẸẹỆệỊịỌọỘộỢợỤụỰựỴỵ

And here are those same vowels without accents:

AaAaEeOoOoUuYyAaAaAaEeEeIiOoOoOoUuUuYyAaAaEeEeOoOoUuYyAaAaEeOoOoUuAaAaAaEeEeIiOoOoOoUuUuYy
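
The two strings above line up character for character, so they can be read as a lookup table. The following is a minimal Python sketch of that idea, not the PHP code in formatting.php; the names `ACCENTED`, `PLAIN`, and `strip_vietnamese_accents` are hypothetical and exist only for illustration.

```python
# Illustrative only: the real remove_accents() is a PHP function in formatting.php.
# Both strings are copied from the ticket description and pair up by position.
ACCENTED = (
    "ẰằẦầỀềỒồỜờỪừỲỳ"
    "ẢảẲẳẨẩẺẻỂểỈỉỎỏỔổỞởỦủỬửỶỷ"
    "ẴẵẪẫẼẽỄễỖỗỠỡỮữỸỹ"
    "ẮắẤấẾếỐốỚớỨứ"
    "ẠạẶặẬậẸẹỆệỊịỌọỘộỢợỤụỰựỴỵ"
)
PLAIN = (
    "AaAaEeOoOoUuYy"
    "AaAaAaEeEeIiOoOoOoUuUuYy"
    "AaAaEeEeOoOoUuYy"
    "AaAaEeOoOoUu"
    "AaAaAaEeEeIiOoOoOoUuUuYy"
)

# str.maketrans raises ValueError unless both strings have the same length,
# which doubles as a check that the two lists really do pair up.
VIETNAMESE_TABLE = str.maketrans(ACCENTED, PLAIN)


def strip_vietnamese_accents(text: str) -> str:
    """Replace each listed precomposed vowel with its unaccented letter."""
    return text.translate(VIETNAMESE_TABLE)
```

A table like this only matches precomposed characters; input that stores the tone marks as separate combining characters would pass through unchanged, and so would any character not in the list.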

Attachments (4)

17738.patch (3.8 KB) - added by SergeyBiryukov 21 months ago.
17738.tests.patch (1.5 KB) - added by ampt 20 months ago.
17738.tests.2.patch (1.6 KB) - added by ampt 18 months ago.
Updated tests
17738.tests.3.patch (1.2 KB) - added by SergeyBiryukov 14 months ago.

Change History (21)

  • Summary changed from replace_accents() can't handle Vietnamese vowels to remove_accents() can't handle Vietnamese vowels

I meant "remove_accents()", not "replace_accents()". Sorry! The "remove_accents()" function is defined in formatting.php.

  • Cc info@… added
  • Cc johnbillion@… added
  • Cc tgeorge added

There are four additional vowels that remove_accents() can't handle. I forgot them in my original message:

ƠơƯư

And here are those same vowels without accents:

OoUu
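
These four letters are code points of their own (the horned base vowels without any tone mark), so a table that covers only the tone-marked forms leaves them untouched. A small Python sketch of the extra mapping, again purely illustrative and using the hypothetical names `HORN_TABLE` and `strip_horn`:

```python
# Illustrative only; this is not the WordPress implementation.
# Maps the four horned base vowels from this comment to plain letters.
HORN_TABLE = str.maketrans("ƠơƯư", "OoUu")


def strip_horn(text: str) -> str:
    """Map the horned vowels (U+01A0, U+01A1, U+01AF, U+01B0) to O, o, U, u."""
    return text.translate(HORN_TABLE)
```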

comment:5 follow-up: ↓ 6   SergeyBiryukov 21 months ago

  • Keywords has-patch added

I've made a patch, but it's a huge chunk of characters, and I wonder whether this should instead be included in the Vietnamese package as a filter.

Perhaps remove_accents() needs a filter for this, so that replacements in sanitize_title() could only occur with save context.

comment:6 in reply to: ↑ 5   nacin 21 months ago

  • Milestone changed from Awaiting Review to 3.3

Replying to SergeyBiryukov:

Perhaps remove_accents() needs a filter for this, so that replacements in sanitize_title() could only occur with save context.

remove_accents() is already only called there on save context, so this should be good.

I meant hooking into sanitize_title() from wp-content/languages/vi.php.
I missed that the context is passed to the sanitize_title filter, so that's currently possible too.

  • Keywords needs-unit-tests added

ampt 20 months ago

Add unit tests. This patch works on its own, but should probably be incorporated into the tests in #9591.

ampt 18 months ago

Updated tests

Updated tests to apply to [UT 471]

Before patch: Tests: 12, Assertions: 13, Failures: 6.

With attachment:17738.patch: OK (12 tests, 13 assertions)

  • Milestone changed from 3.3 to Future Release

With the version at 3.1.3 and the ticket still marked needs-unit-tests, this is not going to make 3.3. Punting.

  • Keywords has-unit-tests added; needs-unit-tests removed
  • Keywords needs-unit-tests added; has-unit-tests removed

Per IRC chat, it's better to keep the keyword until the tests are reviewed and committed.

  • Milestone changed from Future Release to 3.4
  • Keywords needs-unit-tests removed
  • Owner set to nacin
  • Resolution set to fixed
  • Status changed from new to closed

In [20687]:

Add Vietnamese vowels to remove_accents(). props SergeyBiryukov. fixes #17738.
