Opened 4 years ago

Closed 3 years ago

#34677 closed enhancement (fixed)

Inline comments for remove_accents()

Reported by: John_Schlick
Owned by: DrewAPicture
Milestone: 4.6
Priority: normal
Severity: normal
Version: 4.4
Component: Formatting
Keywords: has-patch has-screenshots
Focuses: docs
Cc:
PR Number:

Description (last modified by DrewAPicture)

I was given a copy of a single function from this file to use, and I have added comments to it.
See: https://core.trac.wordpress.org/browser/trunk/src/wp-includes/formatting.php#L1128

In the remove_accents() function, you can use this for the $chars array.
It has exactly the same contents, just in an ever so slightly different order, and with super awesome comments added to most entries. (Feel free to email me or call me at [removed] when you go to add this; I may have a more fully commented version by then.)
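
For readers checking the comments against the keys: each two-byte key is the UTF-8 encoding of the code point named in its comment. The following is a minimal sketch of that relationship, not code from formatting.php. The helper name two_byte_key() is hypothetical, the three-entry array is only an excerpt for illustration, and the use of strtr() is an assumption about one way such a map can be applied.

```php
<?php
// Hypothetical helper (not part of WordPress): encode a code point in the
// range U+0080..U+07FF as its two UTF-8 bytes, written the same way the
// array keys are written.
function two_byte_key( $code_point ) {
	$lead  = 0xC0 | ( $code_point >> 6 );   // 110xxxxx: upper bits of the code point
	$trail = 0x80 | ( $code_point & 0x3F ); // 10xxxxxx: lower six bits
	return chr( $lead ) . chr( $trail );
}

// A three-entry excerpt of the map, for illustration only.
$chars = array(
	chr(195).chr(128) => 'A',  // U+00C0
	chr(195).chr(169) => 'e',  // U+00E9
	chr(197).chr(146) => 'OE', // U+0152
);

// The computed bytes for U+00C0 match the hand-written key.
$same_key = ( two_byte_key( 0x00C0 ) === chr(195).chr(128) );

// strtr() with an array replaces every key it finds in the input bytes.
// "\xC3\xA9" is the UTF-8 byte sequence for U+00E9.
$result = strtr( "caf\xC3\xA9", $chars );
```

Here $same_key is true and $result is the plain ASCII string 'cafe'. Three-byte keys such as chr(225).chr(186).chr(160) follow the same idea with one more continuation byte.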

Attachments (3)

34677.diff (30.3 KB) - added by DrewAPicture 3 years ago.
34677.2.diff (30.1 KB) - added by DrewAPicture 3 years ago.
34677_output.2.png (1.6 MB) - added by DrewAPicture 3 years ago.


Change History (16)

#1 @SergeyBiryukov
4 years ago

  • Component changed from General to Formatting
  • Focuses docs added
  • Summary changed from "enhancement to /src/wp-includes/formatting.php" to "Inline comments for remove_accents()"

#2 @John_Schlick
4 years ago

Here is the completed version with ALL the comments, and the characters in Unicode order with full Unicode names:

<?php
                        $chars = array(
                                // U+00A3 | £ | POUND SIGN <- why is this eliminated?
                                chr(194).chr(163) => '',
                        // Decompositions for Latin-1 Supplement
                                // U+00AA | ª | FEMININE ORDINAL INDICATOR
                                chr(194).chr(170) => 'a',
                                // U+00BA | º | MASCULINE ORDINAL INDICATOR
                                chr(194).chr(186) => 'o',
                                // U+00C0 | À | LATIN CAPITAL LETTER A WITH GRAVE
                                chr(195).chr(128) => 'A',
                                // U+00C1 | Á | LATIN CAPITAL LETTER A WITH ACUTE
                                chr(195).chr(129) => 'A',
                                // U+00C2 | Â | LATIN CAPITAL LETTER A WITH CIRCUMFLEX
                                chr(195).chr(130) => 'A',
                                // U+00C3 | Ã | LATIN CAPITAL LETTER A WITH TILDE
                                chr(195).chr(131) => 'A',
                                // U+00C4 | Ä | LATIN CAPITAL LETTER A WITH DIAERESIS
                                chr(195).chr(132) => 'A',
                                // U+00C5 | Å | LATIN CAPITAL LETTER A WITH RING ABOVE
                                chr(195).chr(133) => 'A',
                                // U+00C6 | Æ | LATIN CAPITAL LETTER AE
                                chr(195).chr(134) => 'AE',
                                // U+00C7 | Ç | LATIN CAPITAL LETTER C WITH CEDILLA
                                chr(195).chr(135) => 'C',
                                // U+00C8 | È | LATIN CAPITAL LETTER E WITH GRAVE
                                chr(195).chr(136) => 'E',
                                // U+00C9 | É | LATIN CAPITAL LETTER E WITH ACUTE
                                chr(195).chr(137) => 'E',
                                // U+00CA | Ê | LATIN CAPITAL LETTER E WITH CIRCUMFLEX
                                chr(195).chr(138) => 'E',
                                // U+00CB | Ë | LATIN CAPITAL LETTER E WITH DIAERESIS
                                chr(195).chr(139) => 'E',
                                // U+00CC | Ì | LATIN CAPITAL LETTER I WITH GRAVE
                                chr(195).chr(140) => 'I',
                                // U+00CD | Í | LATIN CAPITAL LETTER I WITH ACUTE
                                chr(195).chr(141) => 'I',
                                // U+00CE | Î | LATIN CAPITAL LETTER I WITH CIRCUMFLEX
                                chr(195).chr(142) => 'I',
                                // U+00CF | Ï | LATIN CAPITAL LETTER I WITH DIAERESIS
                                chr(195).chr(143) => 'I',
                                // U+00D0 | Ð | LATIN CAPITAL LETTER ETH
                                chr(195).chr(144) => 'D',
                                // U+00D1 | Ñ | LATIN CAPITAL LETTER N WITH TILDE
                                chr(195).chr(145) => 'N',
                                // U+00D2 | Ò | LATIN CAPITAL LETTER O WITH GRAVE
                                chr(195).chr(146) => 'O',
                                // U+00D3 | Ó | LATIN CAPITAL LETTER O WITH ACUTE
                                chr(195).chr(147) => 'O',
                                // U+00D4 | Ô | LATIN CAPITAL LETTER O WITH CIRCUMFLEX
                                chr(195).chr(148) => 'O',
                                // U+00D5 | Õ | LATIN CAPITAL LETTER O WITH TILDE
                                chr(195).chr(149) => 'O',
                                // U+00D6 | Ö | LATIN CAPITAL LETTER O WITH DIAERESIS
                                chr(195).chr(150) => 'O',
                                // U+00D8 | Ø | LATIN CAPITAL LETTER O WITH STROKE
                                chr(195).chr(152) => 'O',
                                // U+00D9 | Ù | LATIN CAPITAL LETTER U WITH GRAVE
                                chr(195).chr(153) => 'U',
                                // U+00DA | Ú | LATIN CAPITAL LETTER U WITH ACUTE
                                chr(195).chr(154) => 'U',
                                // U+00DB | Û | LATIN CAPITAL LETTER U WITH CIRCUMFLEX
                                chr(195).chr(155) => 'U',
                                // U+00DC | Ü | LATIN CAPITAL LETTER U WITH DIAERESIS
                                chr(195).chr(156) => 'U',
                                // U+00DD | Ý | LATIN CAPITAL LETTER Y WITH ACUTE
                                chr(195).chr(157) => 'Y',
                                // U+00DE | Þ | LATIN CAPITAL LETTER THORN
                                chr(195).chr(158) => 'TH',
                                // U+00DF | ß | LATIN SMALL LETTER SHARP S
                                chr(195).chr(159) => 's',
                                // U+00E0 | à | LATIN SMALL LETTER A WITH GRAVE
                                chr(195).chr(160) => 'a',
                                // U+00E1 | á | LATIN SMALL LETTER A WITH ACUTE
                                chr(195).chr(161) => 'a',
                                // U+00E2 | â | LATIN SMALL LETTER A WITH CIRCUMFLEX
                                chr(195).chr(162) => 'a',
                                // U+00E3 | ã | LATIN SMALL LETTER A WITH TILDE
                                chr(195).chr(163) => 'a',
                                // U+00E4 | ä | LATIN SMALL LETTER A WITH DIAERESIS
                                chr(195).chr(164) => 'a',
                                // U+00E5 | å | LATIN SMALL LETTER A WITH RING ABOVE
                                chr(195).chr(165) => 'a',
                                // U+00E6 | æ | LATIN SMALL LETTER AE
                                chr(195).chr(166) => 'ae',
                                // U+00E7 | ç | LATIN SMALL LETTER C WITH CEDILLA
                                chr(195).chr(167) => 'c',
                                // U+00E8 | è | LATIN SMALL LETTER E WITH GRAVE
                                chr(195).chr(168) => 'e',
                                // U+00E9 | é | LATIN SMALL LETTER E WITH ACUTE
                                chr(195).chr(169) => 'e',
                                // U+00EA | ê | LATIN SMALL LETTER E WITH CIRCUMFLEX
                                chr(195).chr(170) => 'e',
                                // U+00EB | ë | LATIN SMALL LETTER E WITH DIAERESIS
                                chr(195).chr(171) => 'e',
                                // U+00EC | ì | LATIN SMALL LETTER I WITH GRAVE
                                chr(195).chr(172) => 'i',
                                // U+00ED | í | LATIN SMALL LETTER I WITH ACUTE
                                chr(195).chr(173) => 'i',
                                // U+00EE | î | LATIN SMALL LETTER I WITH CIRCUMFLEX
                                chr(195).chr(174) => 'i',
                                // U+00EF | ï | LATIN SMALL LETTER I WITH DIAERESIS
                                chr(195).chr(175) => 'i',
                                // U+00F0 | ð | LATIN SMALL LETTER ETH
                                chr(195).chr(176) => 'd',
                                // U+00F1 | ñ | LATIN SMALL LETTER N WITH TILDE
                                chr(195).chr(177) => 'n',
                                // U+00F2 | ò | LATIN SMALL LETTER O WITH GRAVE
                                chr(195).chr(178) => 'o',
                                // U+00F3 | ó | LATIN SMALL LETTER O WITH ACUTE
                                chr(195).chr(179) => 'o',
                                // U+00F4 | ô | LATIN SMALL LETTER O WITH CIRCUMFLEX
                                chr(195).chr(180) => 'o',
                                // U+00F5 | õ | LATIN SMALL LETTER O WITH TILDE
                                chr(195).chr(181) => 'o',
                                // U+00F6 | ö | LATIN SMALL LETTER O WITH DIAERESIS
                                chr(195).chr(182) => 'o',
                                // U+00F8 | ø | LATIN SMALL LETTER O WITH STROKE
                                chr(195).chr(184) => 'o',
                                // U+00F9 | ù | LATIN SMALL LETTER U WITH GRAVE
                                chr(195).chr(185) => 'u',
                                // U+00FA | ú | LATIN SMALL LETTER U WITH ACUTE
                                chr(195).chr(186) => 'u',
                                // U+00FB | û | LATIN SMALL LETTER U WITH CIRCUMFLEX
                                chr(195).chr(187) => 'u',
                                // U+00FC | ü | LATIN SMALL LETTER U WITH DIAERESIS
                                chr(195).chr(188) => 'u',
                                // U+00FD | ý | LATIN SMALL LETTER Y WITH ACUTE
                                chr(195).chr(189) => 'y',
                                // U+00FE | þ | LATIN SMALL LETTER THORN
                                chr(195).chr(190) => 'th',
                                // U+00FF | ÿ | LATIN SMALL LETTER Y WITH DIAERESIS
                                chr(195).chr(191) => 'y',
                        // Decompositions for Latin Extended-A
                                // U+0100 | Ā | LATIN CAPITAL LETTER A WITH MACRON
                                chr(196).chr(128) => 'A',
                                // U+0101 | ā | LATIN SMALL LETTER A WITH MACRON
                                chr(196).chr(129) => 'a',
                                // U+0102 | Ă | LATIN CAPITAL LETTER A WITH BREVE
                                chr(196).chr(130) => 'A',
                                // U+0103 | ă | LATIN SMALL LETTER A WITH BREVE
                                chr(196).chr(131) => 'a',
                                // U+0104 | Ą | LATIN CAPITAL LETTER A WITH OGONEK
                                chr(196).chr(132) => 'A',
                                // U+0105 | ą | LATIN SMALL LETTER A WITH OGONEK
                                chr(196).chr(133) => 'a',
                                // U+0106 | Ć | LATIN CAPITAL LETTER C WITH ACUTE
                                chr(196).chr(134) => 'C',
                                // U+0107 | ć | LATIN SMALL LETTER C WITH ACUTE
                                chr(196).chr(135) => 'c',
                                // U+0108 | Ĉ | LATIN CAPITAL LETTER C WITH CIRCUMFLEX
                                chr(196).chr(136) => 'C',
                                // U+0109 | ĉ | LATIN SMALL LETTER C WITH CIRCUMFLEX
                                chr(196).chr(137) => 'c',
                                // U+010A | Ċ | LATIN CAPITAL LETTER C WITH DOT ABOVE
                                chr(196).chr(138) => 'C',
                                // U+010B | ċ | LATIN SMALL LETTER C WITH DOT ABOVE
                                chr(196).chr(139) => 'c',
                                // U+010C | Č | LATIN CAPITAL LETTER C WITH CARON
                                chr(196).chr(140) => 'C',
                                // U+010D | č | LATIN SMALL LETTER C WITH CARON
                                chr(196).chr(141) => 'c',
                                // U+010E | Ď | LATIN CAPITAL LETTER D WITH CARON
                                chr(196).chr(142) => 'D',
                                // U+010F | ď | LATIN SMALL LETTER D WITH CARON
                                chr(196).chr(143) => 'd',
                                // U+0110 | Đ | LATIN CAPITAL LETTER D WITH STROKE
                                chr(196).chr(144) => 'D',
                                // U+0111 | đ | LATIN SMALL LETTER D WITH STROKE
                                chr(196).chr(145) => 'd',
                                // U+0112 | Ē | LATIN CAPITAL LETTER E WITH MACRON
                                chr(196).chr(146) => 'E',
                                // U+0113 | ē | LATIN SMALL LETTER E WITH MACRON
                                chr(196).chr(147) => 'e',
                                // U+0114 | Ĕ | LATIN CAPITAL LETTER E WITH BREVE
                                chr(196).chr(148) => 'E',
                                // U+0115 | ĕ | LATIN SMALL LETTER E WITH BREVE
                                chr(196).chr(149) => 'e',
                                // U+0116 | Ė | LATIN CAPITAL LETTER E WITH DOT ABOVE
                                chr(196).chr(150) => 'E',
                                // U+0117 | ė | LATIN SMALL LETTER E WITH DOT ABOVE
                                chr(196).chr(151) => 'e',
                                // U+0118 | Ę | LATIN CAPITAL LETTER E WITH OGONEK
                                chr(196).chr(152) => 'E',
                                // U+0119 | ę | LATIN SMALL LETTER E WITH OGONEK
                                chr(196).chr(153) => 'e',
                                // U+011A | Ě | LATIN CAPITAL LETTER E WITH CARON
                                chr(196).chr(154) => 'E',
                                // U+011B | ě | LATIN SMALL LETTER E WITH CARON
                                chr(196).chr(155) => 'e',
                                // U+011C | Ĝ | LATIN CAPITAL LETTER G WITH CIRCUMFLEX
                                chr(196).chr(156) => 'G',
                                // U+011D | ĝ | LATIN SMALL LETTER G WITH CIRCUMFLEX
                                chr(196).chr(157) => 'g',
                                // U+011E | Ğ | LATIN CAPITAL LETTER G WITH BREVE
                                chr(196).chr(158) => 'G',
                                // U+011F | ğ | LATIN SMALL LETTER G WITH BREVE
                                chr(196).chr(159) => 'g',
                                // U+0120 | Ġ | LATIN CAPITAL LETTER G WITH DOT ABOVE
                                chr(196).chr(160) => 'G',
                                // U+0121 | ġ | LATIN SMALL LETTER G WITH DOT ABOVE
                                chr(196).chr(161) => 'g',
                                // U+0122 | Ģ | LATIN CAPITAL LETTER G WITH CEDILLA
                                chr(196).chr(162) => 'G',
                                // U+0123 | ģ | LATIN SMALL LETTER G WITH CEDILLA
                                chr(196).chr(163) => 'g',
                                // U+0124 | Ĥ | LATIN CAPITAL LETTER H WITH CIRCUMFLEX
                                chr(196).chr(164) => 'H',
                                // U+0125 | ĥ | LATIN SMALL LETTER H WITH CIRCUMFLEX
                                chr(196).chr(165) => 'h',
                                // U+0126 | Ħ | LATIN CAPITAL LETTER H WITH STROKE
                                chr(196).chr(166) => 'H',
                                // U+0127 | ħ | LATIN SMALL LETTER H WITH STROKE
                                chr(196).chr(167) => 'h',
                                // U+0128 | Ĩ | LATIN CAPITAL LETTER I WITH TILDE
                                chr(196).chr(168) => 'I',
                                // U+0129 | ĩ | LATIN SMALL LETTER I WITH TILDE
                                chr(196).chr(169) => 'i',
                                // U+012A | Ī | LATIN CAPITAL LETTER I WITH MACRON
                                chr(196).chr(170) => 'I',
                                // U+012B | ī | LATIN SMALL LETTER I WITH MACRON
                                chr(196).chr(171) => 'i',
                                // U+012C | Ĭ | LATIN CAPITAL LETTER I WITH BREVE
                                chr(196).chr(172) => 'I',
                                // U+012D | ĭ | LATIN SMALL LETTER I WITH BREVE
                                chr(196).chr(173) => 'i',
                                // U+012E | Į | LATIN CAPITAL LETTER I WITH OGONEK
                                chr(196).chr(174) => 'I',
                                // U+012F | į | LATIN SMALL LETTER I WITH OGONEK
                                chr(196).chr(175) => 'i',
                                // U+0130 | İ | LATIN CAPITAL LETTER I WITH DOT ABOVE
                                chr(196).chr(176) => 'I',
                                // U+0131 | ı | LATIN SMALL LETTER DOTLESS I
                                chr(196).chr(177) => 'i',
                                // U+0132 | Ĳ | LATIN CAPITAL LIGATURE IJ
                                chr(196).chr(178) => 'IJ',
                                // U+0133 | ĳ | LATIN SMALL LIGATURE IJ
                                chr(196).chr(179) => 'ij',
                                // U+0134 | Ĵ | LATIN CAPITAL LETTER J WITH CIRCUMFLEX
                                chr(196).chr(180) => 'J',
                                // U+0135 | ĵ | LATIN SMALL LETTER J WITH CIRCUMFLEX
                                chr(196).chr(181) => 'j',
                                // U+0136 | Ķ | LATIN CAPITAL LETTER K WITH CEDILLA
                                chr(196).chr(182) => 'K',
                                // U+0137 | ķ | LATIN SMALL LETTER K WITH CEDILLA
                                chr(196).chr(183) => 'k',
                                // U+0138 | ĸ | LATIN SMALL LETTER KRA
                                chr(196).chr(184) => 'k',
                                // U+0139 | Ĺ | LATIN CAPITAL LETTER L WITH ACUTE
                                chr(196).chr(185) => 'L',
                                // U+013A | ĺ | LATIN SMALL LETTER L WITH ACUTE
                                chr(196).chr(186) => 'l',
                                // U+013B | Ļ | LATIN CAPITAL LETTER L WITH CEDILLA
                                chr(196).chr(187) => 'L',
                                // U+013C | ļ | LATIN SMALL LETTER L WITH CEDILLA
                                chr(196).chr(188) => 'l',
                                // U+013D | Ľ | LATIN CAPITAL LETTER L WITH CARON
                                chr(196).chr(189) => 'L',
                                // U+013E | ľ | LATIN SMALL LETTER L WITH CARON
                                chr(196).chr(190) => 'l',
                                // U+013F | Ŀ | LATIN CAPITAL LETTER L WITH MIDDLE DOT
                                chr(196).chr(191) => 'L',
                                // U+0140 | ŀ | LATIN SMALL LETTER L WITH MIDDLE DOT
                                chr(197).chr(128) => 'l',
                                // U+0141 | Ł | LATIN CAPITAL LETTER L WITH STROKE
                                chr(197).chr(129) => 'L',
                                // U+0142 | ł | LATIN SMALL LETTER L WITH STROKE
                                chr(197).chr(130) => 'l',
                                // U+0143 | Ń | LATIN CAPITAL LETTER N WITH ACUTE
                                chr(197).chr(131) => 'N',
                                // U+0144 | ń | LATIN SMALL LETTER N WITH ACUTE
                                chr(197).chr(132) => 'n',
                                // U+0145 | Ņ | LATIN CAPITAL LETTER N WITH CEDILLA
                                chr(197).chr(133) => 'N',
                                // U+0146 | ņ | LATIN SMALL LETTER N WITH CEDILLA
                                chr(197).chr(134) => 'n',
                                // U+0147 | Ň | LATIN CAPITAL LETTER N WITH CARON
                                chr(197).chr(135) => 'N',
                                // U+0148 | ň | LATIN SMALL LETTER N WITH CARON
                                chr(197).chr(136) => 'n',
                                // U+0149 | ŉ | LATIN SMALL LETTER N PRECEDED BY APOSTROPHE
                                chr(197).chr(137) => 'n',
                                // U+014A | Ŋ | LATIN CAPITAL LETTER ENG
                                chr(197).chr(138) => 'N',
                                // U+014B | ŋ | LATIN SMALL LETTER ENG
                                chr(197).chr(139) => 'n',
                                // U+014C | Ō | LATIN CAPITAL LETTER O WITH MACRON
                                chr(197).chr(140) => 'O',
                                // U+014D | ō | LATIN SMALL LETTER O WITH MACRON
                                chr(197).chr(141) => 'o',
                                // U+014E | Ŏ | LATIN CAPITAL LETTER O WITH BREVE
                                chr(197).chr(142) => 'O',
                                // U+014F | ŏ | LATIN SMALL LETTER O WITH BREVE
                                chr(197).chr(143) => 'o',
                                // U+0150 | Ő | LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
                                chr(197).chr(144) => 'O',
                                // U+0151 | ő | LATIN SMALL LETTER O WITH DOUBLE ACUTE
                                chr(197).chr(145) => 'o',
                                // U+0152 | Œ | LATIN CAPITAL LIGATURE OE
                                chr(197).chr(146) => 'OE',
                                // U+0153 | œ | LATIN SMALL LIGATURE OE
                                chr(197).chr(147) => 'oe',
                                // U+0154 | Ŕ | LATIN CAPITAL LETTER R WITH ACUTE
                                chr(197).chr(148) => 'R',
                                // U+0155 | ŕ | LATIN SMALL LETTER R WITH ACUTE
                                chr(197).chr(149) => 'r',
                                // U+0156 | Ŗ | LATIN CAPITAL LETTER R WITH CEDILLA
                                chr(197).chr(150) => 'R',
                                // U+0157 | ŗ | LATIN SMALL LETTER R WITH CEDILLA
                                chr(197).chr(151) => 'r',
                                // U+0158 | Ř | LATIN CAPITAL LETTER R WITH CARON
                                chr(197).chr(152) => 'R',
                                // U+0159 | ř | LATIN SMALL LETTER R WITH CARON
                                chr(197).chr(153) => 'r',
                                // U+015A | Ś | LATIN CAPITAL LETTER S WITH ACUTE
                                chr(197).chr(154) => 'S',
                                // U+015B | ś | LATIN SMALL LETTER S WITH ACUTE
                                chr(197).chr(155) => 's',
                                // U+015C | Ŝ | LATIN CAPITAL LETTER S WITH CIRCUMFLEX
                                chr(197).chr(156) => 'S',
                                // U+015D | ŝ | LATIN SMALL LETTER S WITH CIRCUMFLEX
                                chr(197).chr(157) => 's',
                                // U+015E | Ş | LATIN CAPITAL LETTER S WITH CEDILLA
                                chr(197).chr(158) => 'S',
                                // U+015F | ş | LATIN SMALL LETTER S WITH CEDILLA
                                chr(197).chr(159) => 's',
                                // U+0160 | Š | LATIN CAPITAL LETTER S WITH CARON
                                chr(197).chr(160) => 'S',
                                // U+0161 | š | LATIN SMALL LETTER S WITH CARON
                                chr(197).chr(161) => 's',
                                // U+0162 | Ţ | LATIN CAPITAL LETTER T WITH CEDILLA
                                chr(197).chr(162) => 'T',
                                // U+0163 | ţ | LATIN SMALL LETTER T WITH CEDILLA
                                chr(197).chr(163) => 't',
                                // U+0164 | Ť | LATIN CAPITAL LETTER T WITH CARON
                                chr(197).chr(164) => 'T',
                                // U+0165 | ť | LATIN SMALL LETTER T WITH CARON
                                chr(197).chr(165) => 't',
                                // U+0166 | Ŧ | LATIN CAPITAL LETTER T WITH STROKE
                                chr(197).chr(166) => 'T',
                                // U+0167 | ŧ | LATIN SMALL LETTER T WITH STROKE
                                chr(197).chr(167) => 't',
                                // U+0168 | Ũ | LATIN CAPITAL LETTER U WITH TILDE
                                chr(197).chr(168) => 'U',
                                // U+0169 | ũ | LATIN SMALL LETTER U WITH TILDE
                                chr(197).chr(169) => 'u',
                                // U+016A | Ū | LATIN CAPITAL LETTER U WITH MACRON
                                chr(197).chr(170) => 'U',
                                // U+016B | ū | LATIN SMALL LETTER U WITH MACRON
                                chr(197).chr(171) => 'u',
                                // U+016C | Ŭ | LATIN CAPITAL LETTER U WITH BREVE
                                chr(197).chr(172) => 'U',
                                // U+016D | ŭ | LATIN SMALL LETTER U WITH BREVE
                                chr(197).chr(173) => 'u',
                                // U+016E | Ů | LATIN CAPITAL LETTER U WITH RING ABOVE
                                chr(197).chr(174) => 'U',
                                // U+016F | ů | LATIN SMALL LETTER U WITH RING ABOVE
                                chr(197).chr(175) => 'u',
                                // U+0170 | Ű | LATIN CAPITAL LETTER U WITH DOUBLE ACUTE
                                chr(197).chr(176) => 'U',
                                // U+0171 | ű | LATIN SMALL LETTER U WITH DOUBLE ACUTE
                                chr(197).chr(177) => 'u',
                                // U+0172 | Ų | LATIN CAPITAL LETTER U WITH OGONEK
                                chr(197).chr(178) => 'U',
                                // U+0173 | ų | LATIN SMALL LETTER U WITH OGONEK
                                chr(197).chr(179) => 'u',
                                // U+0174 | Ŵ | LATIN CAPITAL LETTER W WITH CIRCUMFLEX
                                chr(197).chr(180) => 'W',
                                // U+0175 | ŵ | LATIN SMALL LETTER W WITH CIRCUMFLEX
                                chr(197).chr(181) => 'w',
                                // U+0176 | Ŷ | LATIN CAPITAL LETTER Y WITH CIRCUMFLEX
                                chr(197).chr(182) => 'Y',
                                // U+0177 | ŷ | LATIN SMALL LETTER Y WITH CIRCUMFLEX
                                chr(197).chr(183) => 'y',
                                // U+0178 | Ÿ | LATIN CAPITAL LETTER Y WITH DIAERESIS
                                chr(197).chr(184) => 'Y',
                                // U+0179 | Ź | LATIN CAPITAL LETTER Z WITH ACUTE
                                chr(197).chr(185) => 'Z',
                                // U+017A | ź | LATIN SMALL LETTER Z WITH ACUTE
                                chr(197).chr(186) => 'z',
                                // U+017B | Ż | LATIN CAPITAL LETTER Z WITH DOT ABOVE
                                chr(197).chr(187) => 'Z',
                                // U+017C | ż | LATIN SMALL LETTER Z WITH DOT ABOVE
                                chr(197).chr(188) => 'z',
                                // U+017D | Ž | LATIN CAPITAL LETTER Z WITH CARON
                                chr(197).chr(189) => 'Z',
                                // U+017E | ž | LATIN SMALL LETTER Z WITH CARON
                                chr(197).chr(190) => 'z',
                                // U+017F | ſ | LATIN SMALL LETTER LONG S
                                chr(197).chr(191) => 's',
// XXX Add remainder of 198-128 (U+0180) thru 199-191 (U+01FF)
                                // U+01A0 | Ơ | LATIN CAPITAL LETTER O WITH HORN
                                chr(198).chr(160) => 'O',
                                // U+01A1 | ơ | LATIN SMALL LETTER O WITH HORN
                                chr(198).chr(161) => 'o',
                                // U+01AF | Ư | LATIN CAPITAL LETTER U WITH HORN
                                chr(198).chr(175) => 'U',
                                // U+01B0 | ư | LATIN SMALL LETTER U WITH HORN
                                chr(198).chr(176) => 'u',
                                // U+01CD | Ǎ | LATIN CAPITAL LETTER A WITH CARON
                                chr(199).chr(141) => 'A',
                                // U+01CE | ǎ | LATIN SMALL LETTER A WITH CARON
                                chr(199).chr(142) => 'a',
                                // U+01CF | Ǐ | LATIN CAPITAL LETTER I WITH CARON
                                chr(199).chr(143) => 'I',
                                // U+01D0 | ǐ | LATIN SMALL LETTER I WITH CARON
                                chr(199).chr(144) => 'i',
                                // U+01D1 | Ǒ | LATIN CAPITAL LETTER O WITH CARON
                                chr(199).chr(145) => 'O',
                                // U+01D2 | ǒ | LATIN SMALL LETTER O WITH CARON
                                chr(199).chr(146) => 'o',
                                // U+01D3 | Ǔ | LATIN CAPITAL LETTER U WITH CARON
                                chr(199).chr(147) => 'U',
                                // U+01D4 | ǔ | LATIN SMALL LETTER U WITH CARON
                                chr(199).chr(148) => 'u',
                                // U+01D5 | Ǖ | LATIN CAPITAL LETTER U WITH DIAERESIS AND MACRON
                                chr(199).chr(149) => 'U',
                                // U+01D6 | ǖ | LATIN SMALL LETTER U WITH DIAERESIS AND MACRON
                                chr(199).chr(150) => 'u',
                                // U+01D7 | Ǘ | LATIN CAPITAL LETTER U WITH DIAERESIS AND ACUTE
                                chr(199).chr(151) => 'U',
                                // U+01D8 | ǘ | LATIN SMALL LETTER U WITH DIAERESIS AND ACUTE
                                chr(199).chr(152) => 'u',
                                // U+01D9 | Ǚ | LATIN CAPITAL LETTER U WITH DIAERESIS AND CARON
                                chr(199).chr(153) => 'U',
                                // U+01DA | ǚ | LATIN SMALL LETTER U WITH DIAERESIS AND CARON
                                chr(199).chr(154) => 'u',
                                // U+01DB | Ǜ | LATIN CAPITAL LETTER U WITH DIAERESIS AND GRAVE
                                chr(199).chr(155) => 'U',
                                // U+01DC | ǜ | LATIN SMALL LETTER U WITH DIAERESIS AND GRAVE
                                chr(199).chr(156) => 'u',
// XXX Review of unimplemented codes below this point is necessary.
                                // U+0218 | Ș | LATIN CAPITAL LETTER S WITH COMMA BELOW
                                chr(200).chr(152) => 'S',
                                // U+0219 | ș | LATIN SMALL LETTER S WITH COMMA BELOW
                                chr(200).chr(153) => 's',
                                // U+021A | Ț | LATIN CAPITAL LETTER T WITH COMMA BELOW
                                chr(200).chr(154) => 'T',
                                // U+021B | ț | LATIN SMALL LETTER T WITH COMMA BELOW
                                chr(200).chr(155) => 't',
                        // Vowels with diacritic (Chinese, Hanyu Pinyin)
                                // U+0251 | ɑ | LATIN SMALL LETTER ALPHA
                                chr(201).chr(145) => 'a',
                                // U+1EA0 | Ạ | LATIN CAPITAL LETTER A WITH DOT BELOW
                                chr(225).chr(186).chr(160) => 'A',
                                // U+1EA1 | ạ | LATIN SMALL LETTER A WITH DOT BELOW
                                chr(225).chr(186).chr(161) => 'a',
                                // U+1EA2 | Ả | LATIN CAPITAL LETTER A WITH HOOK ABOVE
                                chr(225).chr(186).chr(162) => 'A',
                                // U+1EA3 | ả | LATIN SMALL LETTER A WITH HOOK ABOVE
                                chr(225).chr(186).chr(163) => 'a',
                                // U+1EA4 | Ấ | LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND ACUTE
                                chr(225).chr(186).chr(164) => 'A',
                                // U+1EA5 | ấ | LATIN SMALL LETTER A WITH CIRCUMFLEX AND ACUTE
                                chr(225).chr(186).chr(165) => 'a',
                                // U+1EA6 | Ầ | LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND GRAVE
                                chr(225).chr(186).chr(166) => 'A',
                                // U+1EA7 | ầ | LATIN SMALL LETTER A WITH CIRCUMFLEX AND GRAVE
                                chr(225).chr(186).chr(167) => 'a',
                                // U+1EA8 | Ẩ | LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND HOOK ABOVE
                                chr(225).chr(186).chr(168) => 'A',
                                // U+1EA9 | ẩ | LATIN SMALL LETTER A WITH CIRCUMFLEX AND HOOK ABOVE
                                chr(225).chr(186).chr(169) => 'a',
                                // U+1EAA | Ẫ | LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND TILDE
                                chr(225).chr(186).chr(170) => 'A',
                                // U+1EAB | ẫ | LATIN SMALL LETTER A WITH CIRCUMFLEX AND TILDE
                                chr(225).chr(186).chr(171) => 'a',
                                // U+1EAC | Ậ | LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND DOT BELOW
                                chr(225).chr(186).chr(172) => 'A',
                                // U+1EAD | ậ | LATIN SMALL LETTER A WITH CIRCUMFLEX AND DOT BELOW
                                chr(225).chr(186).chr(173) => 'a',
                                // U+1EAE | Ắ | LATIN CAPITAL LETTER A WITH BREVE AND ACUTE
                                chr(225).chr(186).chr(174) => 'A',
                                // U+1EAF | ắ | LATIN SMALL LETTER A WITH BREVE AND ACUTE
                                chr(225).chr(186).chr(175) => 'a',
                                // U+1EB0 | Ằ | LATIN CAPITAL LETTER A WITH BREVE AND GRAVE
                                chr(225).chr(186).chr(176) => 'A',
                                // U+1EB1 | ằ | LATIN SMALL LETTER A WITH BREVE AND GRAVE
                                chr(225).chr(186).chr(177) => 'a',
                                // U+1EB2 | Ẳ | LATIN CAPITAL LETTER A WITH BREVE AND HOOK ABOVE
                                chr(225).chr(186).chr(178) => 'A',
                                // U+1EB3 | ẳ | LATIN SMALL LETTER A WITH BREVE AND HOOK ABOVE
                                chr(225).chr(186).chr(179) => 'a',
                                // U+1EB4 | Ẵ | LATIN CAPITAL LETTER A WITH BREVE AND TILDE
                                chr(225).chr(186).chr(180) => 'A',
                                // U+1EB5 | ẵ | LATIN SMALL LETTER A WITH BREVE AND TILDE
                                chr(225).chr(186).chr(181) => 'a',
                                // U+1EB6 | Ặ | LATIN CAPITAL LETTER A WITH BREVE AND DOT BELOW
                                chr(225).chr(186).chr(182) => 'A',
                                // U+1EB7 | ặ | LATIN SMALL LETTER A WITH BREVE AND DOT BELOW
                                chr(225).chr(186).chr(183) => 'a',
                                // U+1EB8 | Ẹ | LATIN CAPITAL LETTER E WITH DOT BELOW
                                chr(225).chr(186).chr(184) => 'E',
                                // U+1EB9 | ẹ | LATIN SMALL LETTER E WITH DOT BELOW
                                chr(225).chr(186).chr(185) => 'e',
                                // U+1EBA | Ẻ | LATIN CAPITAL LETTER E WITH HOOK ABOVE
                                chr(225).chr(186).chr(186) => 'E',
                                // U+1EBB | ẻ | LATIN SMALL LETTER E WITH HOOK ABOVE
                                chr(225).chr(186).chr(187) => 'e',
                                // U+1EBC | Ẽ | LATIN CAPITAL LETTER E WITH TILDE
                                chr(225).chr(186).chr(188) => 'E',
                                // U+1EBD | ẽ | LATIN SMALL LETTER E WITH TILDE
                                chr(225).chr(186).chr(189) => 'e',
                                // U+1EBE | Ế | LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND ACUTE
                                chr(225).chr(186).chr(190) => 'E',
                                // U+1EBF | ế | LATIN SMALL LETTER E WITH CIRCUMFLEX AND ACUTE
                                chr(225).chr(186).chr(191) => 'e',
                                // U+1EC0 | Ề | LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND GRAVE
                                chr(225).chr(187).chr(128) => 'E',
                                // U+1EC1 | ề | LATIN SMALL LETTER E WITH CIRCUMFLEX AND GRAVE
                                chr(225).chr(187).chr(129) => 'e',
                                // U+1EC2 | Ể | LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND HOOK ABOVE
                                chr(225).chr(187).chr(130) => 'E',
                                // U+1EC3 | ể | LATIN SMALL LETTER E WITH CIRCUMFLEX AND HOOK ABOVE
                                chr(225).chr(187).chr(131) => 'e',
                                // U+1EC4 | Ễ | LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND TILDE
                                chr(225).chr(187).chr(132) => 'E',
                                // U+1EC5 | ễ | LATIN SMALL LETTER E WITH CIRCUMFLEX AND TILDE
                                chr(225).chr(187).chr(133) => 'e',
                                // U+1EC6 | Ệ | LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND DOT BELOW
                                chr(225).chr(187).chr(134) => 'E',
                                // U+1EC7 | ệ | LATIN SMALL LETTER E WITH CIRCUMFLEX AND DOT BELOW
                                chr(225).chr(187).chr(135) => 'e',
                                // U+1EC8 | Ỉ | LATIN CAPITAL LETTER I WITH HOOK ABOVE
                                chr(225).chr(187).chr(136) => 'I',
                                // U+1EC9 | ỉ | LATIN SMALL LETTER I WITH HOOK ABOVE
                                chr(225).chr(187).chr(137) => 'i',
                                // U+1ECA | Ị | LATIN CAPITAL LETTER I WITH DOT BELOW
                                chr(225).chr(187).chr(138) => 'I',
                                // U+1ECB | ị | LATIN SMALL LETTER I WITH DOT BELOW
                                chr(225).chr(187).chr(139) => 'i',
                                // U+1ECC | Ọ | LATIN CAPITAL LETTER O WITH DOT BELOW
                                chr(225).chr(187).chr(140) => 'O',
                                // U+1ECD | ọ | LATIN SMALL LETTER O WITH DOT BELOW
                                chr(225).chr(187).chr(141) => 'o',
                                // U+1ECE | Ỏ | LATIN CAPITAL LETTER O WITH HOOK ABOVE
                                chr(225).chr(187).chr(142) => 'O',
                                // U+1ECF | ỏ | LATIN SMALL LETTER O WITH HOOK ABOVE
                                chr(225).chr(187).chr(143) => 'o',
                                // U+1ED0 | Ố | LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND ACUTE
                                chr(225).chr(187).chr(144) => 'O',
                                // U+1ED1 | ố | LATIN SMALL LETTER O WITH CIRCUMFLEX AND ACUTE
                                chr(225).chr(187).chr(145) => 'o',
                                // U+1ED2 | Ồ | LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND GRAVE
                                chr(225).chr(187).chr(146) => 'O',
                                // U+1ED3 | ồ | LATIN SMALL LETTER O WITH CIRCUMFLEX AND GRAVE
                                chr(225).chr(187).chr(147) => 'o',
                                // U+1ED4 | Ổ | LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND HOOK ABOVE
                                chr(225).chr(187).chr(148) => 'O',
                                // U+1ED5 | ổ | LATIN SMALL LETTER O WITH CIRCUMFLEX AND HOOK ABOVE
                                chr(225).chr(187).chr(149) => 'o',
                                // U+1ED6 | Ỗ | LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND TILDE
                                chr(225).chr(187).chr(150) => 'O',
                                // U+1ED7 | ỗ | LATIN SMALL LETTER O WITH CIRCUMFLEX AND TILDE
                                chr(225).chr(187).chr(151) => 'o',
                                // U+1ED8 | Ộ | LATIN CAPITAL LETTER O WITH CIRCUMFLEX AND DOT BELOW
                                chr(225).chr(187).chr(152) => 'O',
                                // U+1ED9 | ộ | LATIN SMALL LETTER O WITH CIRCUMFLEX AND DOT BELOW
                                chr(225).chr(187).chr(153) => 'o',
                                // U+1EDA | Ớ | LATIN CAPITAL LETTER O WITH HORN AND ACUTE
                                chr(225).chr(187).chr(154) => 'O',
                                // U+1EDB | ớ | LATIN SMALL LETTER O WITH HORN AND ACUTE
                                chr(225).chr(187).chr(155) => 'o',
                                // U+1EDC | Ờ | LATIN CAPITAL LETTER O WITH HORN AND GRAVE
                                chr(225).chr(187).chr(156) => 'O',
                                // U+1EDD | ờ | LATIN SMALL LETTER O WITH HORN AND GRAVE
                                chr(225).chr(187).chr(157) => 'o',
                                // U+1EDE | Ở | LATIN CAPITAL LETTER O WITH HORN AND HOOK ABOVE
                                chr(225).chr(187).chr(158) => 'O',
                                // U+1EDF | ở | LATIN SMALL LETTER O WITH HORN AND HOOK ABOVE
                                chr(225).chr(187).chr(159) => 'o',
                                // U+1EE0 | Ỡ | LATIN CAPITAL LETTER O WITH HORN AND TILDE
                                chr(225).chr(187).chr(160) => 'O',
                                // U+1EE1 | ỡ | LATIN SMALL LETTER O WITH HORN AND TILDE
                                chr(225).chr(187).chr(161) => 'o',
                                // U+1EE2 | Ợ | LATIN CAPITAL LETTER O WITH HORN AND DOT BELOW
                                chr(225).chr(187).chr(162) => 'O',
                                // U+1EE3 | ợ | LATIN SMALL LETTER O WITH HORN AND DOT BELOW
                                chr(225).chr(187).chr(163) => 'o',
                                // U+1EE4 | Ụ | LATIN CAPITAL LETTER U WITH DOT BELOW
                                chr(225).chr(187).chr(164) => 'U',
                                // U+1EE5 | ụ | LATIN SMALL LETTER U WITH DOT BELOW
                                chr(225).chr(187).chr(165) => 'u',
                                // U+1EE6 | Ủ | LATIN CAPITAL LETTER U WITH HOOK ABOVE
                                chr(225).chr(187).chr(166) => 'U',
                                // U+1EE7 | ủ | LATIN SMALL LETTER U WITH HOOK ABOVE
                                chr(225).chr(187).chr(167) => 'u',
                                // U+1EE8 | Ứ | LATIN CAPITAL LETTER U WITH HORN AND ACUTE
                                chr(225).chr(187).chr(168) => 'U',
                                // U+1EE9 | ứ | LATIN SMALL LETTER U WITH HORN AND ACUTE
                                chr(225).chr(187).chr(169) => 'u',
                                // U+1EEA | Ừ | LATIN CAPITAL LETTER U WITH HORN AND GRAVE
                                chr(225).chr(187).chr(170) => 'U',
                                // U+1EEB | ừ | LATIN SMALL LETTER U WITH HORN AND GRAVE
                                chr(225).chr(187).chr(171) => 'u',
                                // U+1EEC | Ử | LATIN CAPITAL LETTER U WITH HORN AND HOOK ABOVE
                                chr(225).chr(187).chr(172) => 'U',
                                // U+1EED | ử | LATIN SMALL LETTER U WITH HORN AND HOOK ABOVE
                                chr(225).chr(187).chr(173) => 'u',
                                // U+1EEE | Ữ | LATIN CAPITAL LETTER U WITH HORN AND TILDE
                                chr(225).chr(187).chr(174) => 'U',
                                // U+1EEF | ữ | LATIN SMALL LETTER U WITH HORN AND TILDE
                                chr(225).chr(187).chr(175) => 'u',
                                // U+1EF0 | Ự | LATIN CAPITAL LETTER U WITH HORN AND DOT BELOW
                                chr(225).chr(187).chr(176) => 'U',
                                // U+1EF1 | ự | LATIN SMALL LETTER U WITH HORN AND DOT BELOW
                                chr(225).chr(187).chr(177) => 'u',
                                // U+1EF2 | Ỳ | LATIN CAPITAL LETTER Y WITH GRAVE
                                chr(225).chr(187).chr(178) => 'Y',
                                // U+1EF3 | ỳ | LATIN SMALL LETTER Y WITH GRAVE
                                chr(225).chr(187).chr(179) => 'y',
                                // U+1EF4 | Ỵ | LATIN CAPITAL LETTER Y WITH DOT BELOW
                                chr(225).chr(187).chr(180) => 'Y',
                                // U+1EF5 | ỵ | LATIN SMALL LETTER Y WITH DOT BELOW
                                chr(225).chr(187).chr(181) => 'y',
                                // U+1EF6 | Ỷ | LATIN CAPITAL LETTER Y WITH HOOK ABOVE
                                chr(225).chr(187).chr(182) => 'Y',
                                // U+1EF7 | ỷ | LATIN SMALL LETTER Y WITH HOOK ABOVE
                                chr(225).chr(187).chr(183) => 'y',
                                // U+1EF8 | Ỹ | LATIN CAPITAL LETTER Y WITH TILDE
                                chr(225).chr(187).chr(184) => 'Y',
                                // U+1EF9 | ỹ | LATIN SMALL LETTER Y WITH TILDE
                                chr(225).chr(187).chr(185) => 'y',
                                // U+20AC | € | EURO SIGN
                                chr(226).chr(130).chr(172) => 'E',
                        );
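
One way to cross-check the `U+XXXX` labels in a listing like this against the `chr()` byte sequences is to decode each key as UTF-8 and print the resulting code point. The following is a minimal sketch, assuming every key is a single well-formed UTF-8 character; `sketch_utf8_codepoint_label()` is a hypothetical helper written for illustration and is not part of WordPress core.

```php
<?php
/**
 * Decode one UTF-8 encoded character into a "U+XXXX" label.
 * Hypothetical helper for illustration only; assumes well-formed input.
 */
function sketch_utf8_codepoint_label( $bytes ) {
	$length = strlen( $bytes );
	$first  = ord( $bytes[0] );

	// Strip the length-marker bits from the lead byte.
	if ( 1 === $length ) {
		$code = $first;        // 0xxxxxxx
	} elseif ( 2 === $length ) {
		$code = $first & 0x1F; // 110xxxxx
	} elseif ( 3 === $length ) {
		$code = $first & 0x0F; // 1110xxxx
	} else {
		$code = $first & 0x07; // 11110xxx
	}

	// Each continuation byte (10xxxxxx) contributes six more bits.
	for ( $i = 1; $i < $length; $i++ ) {
		$code = ( $code << 6 ) | ( ord( $bytes[ $i ] ) & 0x3F );
	}

	return sprintf( 'U+%04X', $code );
}

echo sketch_utf8_codepoint_label( chr(199) . chr(146) ), "\n";            // U+01D2
echo sketch_utf8_codepoint_label( chr(225) . chr(187) . chr(185) ), "\n"; // U+1EF9
echo sketch_utf8_codepoint_label( chr(226) . chr(130) . chr(172) ), "\n"; // U+20AC
```

Looping over the array keys and comparing each result with the code point written in the preceding comment would flag any label that does not match its bytes.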

#3 @DrewAPicture
4 years ago

  • Keywords needs-patch added

@SergeyBiryukov Besides not documenting in all caps, I can see documenting the character codes as useful. What do you think about the rest of it, i.e. describing the obvious characteristics?

Also, who or what would be the best source for confirming this information so we can move forward here?

cc @ocean90 @petya

#4 @SergeyBiryukov
4 years ago

  • Milestone changed from Awaiting Review to 4.6
  • Owner set to SergeyBiryukov
  • Status changed from new to reviewing

#5 @DrewAPicture
3 years ago

@SergeyBiryukov Happy to generate a patch for this, but I'd need some guidance on my questions in comment:3.

This ticket was mentioned in Slack in #core by ocean90.

3 years ago

#7 @ocean90
3 years ago

  • Keywords good-first-bug added
  • Milestone changed from 4.6 to Future Release

@DrewAPicture Which information do you need to be confirmed? Happy to take a look at something if you point me to it.

TinyMCE has a similar char list: https://github.com/tinymce/tinymce/blob/master/js/tinymce/plugins/charmap/plugin.js#L16

#8 @DrewAPicture
3 years ago

  • Owner changed from SergeyBiryukov to DrewAPicture

@DrewAPicture
3 years ago

#9 @DrewAPicture
3 years ago

  • Keywords has-patch added; needs-patch good-first-bug removed
  • Milestone changed from Future Release to 4.6

Whew.

6 hours of effort yields 34677.2.diff, which takes the suggested inline comments and formats them into markdown tables in the DocBlock rather than placing them inline in the code. A major downside of documenting them as inline comments is that the information is only really available to somebody reading the source code.

Formatting into markdown tables has the benefit of being parseable for the Code Reference and also easily searchable from within that page. See 34677_output.2.png for what that looks like post-parsing.
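
As a rough illustration of the table approach, rows for the DocBlock could be generated from the commented data rather than typed by hand. This is only a sketch: `sketch_docblock_table()` is a hypothetical helper, and the column layout (Code, Glyph, Replacement, Description) is an assumption, not necessarily the layout used in 34677.2.diff. Descriptions are converted to sentence case, following the "not documenting in all caps" feedback in comment:3.

```php
<?php
/**
 * Build Markdown table lines for a DocBlock from character data.
 * Hypothetical helper; the column layout is an assumption.
 *
 * @param array $rows Each row: array( code, glyph, replacement, unicode name ).
 * @return string DocBlock lines joined with newlines.
 */
function sketch_docblock_table( array $rows ) {
	$lines = array(
		' * | Code | Glyph | Replacement | Description |',
		' * | ---- | ----- | ----------- | ----------- |',
	);

	foreach ( $rows as $row ) {
		list( $code, $glyph, $replacement, $name ) = $row;

		// Sentence case instead of the all-caps Unicode name.
		$description = ucfirst( strtolower( $name ) );

		$lines[] = sprintf( ' * | %s | %s | %s | %s |', $code, $glyph, $replacement, $description );
	}

	return implode( "\n", $lines );
}

echo sketch_docblock_table(
	array(
		array( 'U+01D2', 'ǒ', 'o', 'LATIN SMALL LETTER O WITH CARON' ),
		array( 'U+20AC', '€', 'E', 'EURO SIGN' ),
	)
), "\n";
```

Because the output is plain Markdown inside the comment block, a documentation parser that understands Markdown tables can render it, which is the benefit described above.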

Last edited 3 years ago by DrewAPicture

@DrewAPicture
3 years ago

#10 @DrewAPicture
3 years ago

  • Description modified (diff)

#11 @DrewAPicture
3 years ago

  • Keywords has-screenshots added

#12 @DrewAPicture
3 years ago

In 37669:

Docs: Add extensive documentation to the remove_accents() DocBlock outlining the accented characters core replaces.

Covers:

  • Currency signs
  • Decompositions for Latin-1 Supplement
  • Decompositions for Latin Extended-A
  • Decompositions for Latin Extended-B
  • Vowels with diacritic (Chinese, Hanyu Pinyin)
  • Characters replaced for the de_DE, de_DE_formal, and da_DK locales

Props john_schlick for the initial work.
Props DrewAPicture, ocean90.

See #34677.

#13 @DrewAPicture
3 years ago

  • Resolution set to fixed
  • Status changed from reviewing to closed

Closing this as fixed. If there are any issues, let's open new tickets to address them.
