Make WordPress Core

Opened 10 years ago

Closed 9 years ago

#34677 closed enhancement (fixed)

Inline comments for remove_accents()

Reported by: John_Schlick Owned by: DrewAPicture
Milestone: 4.6 Priority: normal
Severity: normal Version: 4.4
Component: Formatting Keywords: has-patch has-screenshots
Focuses: docs Cc:

Description (last modified by DrewAPicture)

I was given a copy of a single function from this file to use, and I have added comments to it:
https://core.trac.wordpress.org/browser/trunk/src/wp-includes/formatting.php#L1128

In the remove_accents() function, you can use this for the $chars array. (It has exactly the same contents, just in a slightly different order, and with super awesome comments mostly added. Feel free to email me or call me at [removed] when you go to add this; I may have a more fully commented version by then.)

Attachments (3)

34677.diff (30.3 KB) - added by DrewAPicture 9 years ago.
34677.2.diff (30.1 KB) - added by DrewAPicture 9 years ago.
34677_output.2.png (1.6 MB) - added by DrewAPicture 9 years ago.

Change History (16)

#1 @SergeyBiryukov
10 years ago

  • Component changed from General to Formatting
  • Focuses docs added
  • Summary changed from enhancement to /src/wp-includes/formatting.php to Inline comments for remove_accents()

#2 @John_Schlick
10 years ago

Here is the completed version with ALL the comments, and the characters in Unicode order with full Unicode names:

<?php
                        $chars = array(
                                // U+00A3 | £ | POUND SIGN <- why is this eliminated?
                                chr(194).chr(163) => '',
                        // Decompositions for Latin-1 Supplement
                                // U+00AA | ª | FEMININE ORDINAL INDICATOR
                                chr(194).chr(170) => 'a',
                                // U+00BA | º | MASCULINE ORDINAL INDICATOR
                                chr(194).chr(186) => 'o',
                                // U+00C0 | À | LATIN CAPITAL LETTER A WITH GRAVE
                                chr(195).chr(128) => 'A',
                                // U+00C1 | Á | LATIN CAPITAL LETTER A WITH ACUTE
                                chr(195).chr(129) => 'A',
                                // U+00C2 | Â | LATIN CAPITAL LETTER A WITH CIRCUMFLEX
                                chr(195).chr(130) => 'A',
                                // U+00C3 | Ã | LATIN CAPITAL LETTER A WITH TILDE
                                chr(195).chr(131) => 'A',
                                // U+00C4 | Ä | LATIN CAPITAL LETTER A WITH DIAERESIS
                                chr(195).chr(132) => 'A',
                                // U+00C5 | Å | LATIN CAPITAL LETTER A WITH RING ABOVE
                                chr(195).chr(133) => 'A',
                                // U+00C6 | Æ | LATIN CAPITAL LETTER AE
                                chr(195).chr(134) => 'AE',
                                // U+00C7 | Ç | LATIN CAPITAL LETTER C WITH CEDILLA
                                chr(195).chr(135) => 'C',
                                // U+00C8 | È | LATIN CAPITAL LETTER E WITH GRAVE
                                chr(195).chr(136) => 'E',
                                // U+00C9 | É | LATIN CAPITAL LETTER E WITH ACUTE
                                chr(195).chr(137) => 'E',
                                // U+00CA | Ê | LATIN CAPITAL LETTER E WITH CIRCUMFLEX
                                chr(195).chr(138) => 'E',
                                // U+00CB | Ë | LATIN CAPITAL LETTER E WITH DIAERESIS
                                chr(195).chr(139) => 'E',
                                // U+00CC | Ì | LATIN CAPITAL LETTER I WITH GRAVE
                                chr(195).chr(140) => 'I',
                                // U+00CD | Í | LATIN CAPITAL LETTER I WITH ACUTE
                                chr(195).chr(141) => 'I',
                                // U+00CE | Î | LATIN CAPITAL LETTER I WITH CIRCUMFLEX
                                chr(195).chr(142) => 'I',
                                // U+00CF | Ï | LATIN CAPITAL LETTER I WITH DIAERESIS
                                chr(195).chr(143) => 'I',
                                // U+00D0 | Ð | LATIN CAPITAL LETTER ETH
                                chr(195).chr(144) => 'D',
                                // U+00D1 | Ñ | LATIN CAPITAL LETTER N WITH TILDE
                                chr(195).chr(145) => 'N',
                                // U+00D2 | Ò | LATIN CAPITAL LETTER O WITH GRAVE
                                chr(195).chr(146) => 'O',
                                // U+00D3 | Ó | LATIN CAPITAL LETTER O WITH ACUTE
                                chr(195).chr(147) => 'O',
                                // U+00D4 | Ô | LATIN CAPITAL LETTER O WITH CIRCUMFLEX
                                chr(195).chr(148) => 'O',
                                // U+00D5 | Õ | LATIN CAPITAL LETTER O WITH TILDE
                                chr(195).chr(149) => 'O',
                                // U+00D6 | Ö | LATIN CAPITAL LETTER O WITH DIAERESIS
                                chr(195).chr(150) => 'O',
                                // U+00D8 | Ø | LATIN CAPITAL LETTER O WITH STROKE
                                chr(195).chr(152) => 'O',
                                // U+00D9 | Ù | LATIN CAPITAL LETTER U WITH GRAVE
                                chr(195).chr(153) => 'U',
                                // U+00DA | Ú | LATIN CAPITAL LETTER U WITH ACUTE
                                chr(195).chr(154) => 'U',
                                // U+00DB | Û | LATIN CAPITAL LETTER U WITH CIRCUMFLEX
                                chr(195).chr(155) => 'U',
                                // U+00DC | Ü | LATIN CAPITAL LETTER U WITH DIAERESIS
                                chr(195).chr(156) => 'U',
                                // U+00DD | Ý | LATIN CAPITAL LETTER Y WITH ACUTE
                                chr(195).chr(157) => 'Y',
                                // U+00DE | Þ | LATIN CAPITAL LETTER THORN
                                chr(195).chr(158) => 'TH',
                                // U+00DF | ß | LATIN SMALL LETTER SHARP S
                                chr(195).chr(159) => 's',
                                // U+00E0 | à | LATIN SMALL LETTER A WITH GRAVE
                                chr(195).chr(160) => 'a',
                                // U+00E1 | á | LATIN SMALL LETTER A WITH ACUTE
                                chr(195).chr(161) => 'a',
                                // U+00E2 | â | LATIN SMALL LETTER A WITH CIRCUMFLEX
                                chr(195).chr(162) => 'a',
                                // U+00E3 | ã | LATIN SMALL LETTER A WITH TILDE
                                chr(195).chr(163) => 'a',
                                // U+00E4 | ä | LATIN SMALL LETTER A WITH DIAERESIS
                                chr(195).chr(164) => 'a',
                                // U+00E5 | å | LATIN SMALL LETTER A WITH RING ABOVE
                                chr(195).chr(165) => 'a',
                                // U+00E6 | æ | LATIN SMALL LETTER AE
                                chr(195).chr(166) => 'ae',
                                // U+00E7 | ç | LATIN SMALL LETTER C WITH CEDILLA
                                chr(195).chr(167) => 'c',
                                // U+00E8 | è | LATIN SMALL LETTER E WITH GRAVE
                                chr(195).chr(168) => 'e',
                                // U+00E9 | é | LATIN SMALL LETTER E WITH ACUTE
                                chr(195).chr(169) => 'e',
                                // U+00EA | ê | LATIN SMALL LETTER E WITH CIRCUMFLEX
                                chr(195).chr(170) => 'e',
                                // U+00EB | ë | LATIN SMALL LETTER E WITH DIAERESIS
                                chr(195).chr(171) => 'e',
                                // U+00EC | ì | LATIN SMALL LETTER I WITH GRAVE
                                chr(195).chr(172) => 'i',
                                // U+00ED | í | LATIN SMALL LETTER I WITH ACUTE
                                chr(195).chr(173) => 'i',
                                // U+00EE | î | LATIN SMALL LETTER I WITH CIRCUMFLEX
                                chr(195).chr(174) => 'i',
                                // U+00EF | ï | LATIN SMALL LETTER I WITH DIAERESIS
                                chr(195).chr(175) => 'i',
                                // U+00F0 | ð | LATIN SMALL LETTER ETH
                                chr(195).chr(176) => 'd',
                                // U+00F1 | ñ | LATIN SMALL LETTER N WITH TILDE
                                chr(195).chr(177) => 'n',
                                // U+00F2 | ò | LATIN SMALL LETTER O WITH GRAVE
                                chr(195).chr(178) => 'o',
                                // U+00F3 | ó | LATIN SMALL LETTER O WITH ACUTE
                                chr(195).chr(179) => 'o',
                                // U+00F4 | ô | LATIN SMALL LETTER O WITH CIRCUMFLEX
                                chr(195).chr(180) => 'o',
                                // U+00F5 | õ | LATIN SMALL LETTER O WITH TILDE
                                chr(195).chr(181) => 'o',
                                // U+00F6 | ö | LATIN SMALL LETTER O WITH DIAERESIS
                                chr(195).chr(182) => 'o',
                                // U+00F8 | ø | LATIN SMALL LETTER O WITH STROKE
                                chr(195).chr(184) => 'o',
                                // U+00F9 | ù | LATIN SMALL LETTER U WITH GRAVE
                                chr(195).chr(185) => 'u',
                                // U+00FA | ú | LATIN SMALL LETTER U WITH ACUTE
                                chr(195).chr(186) => 'u',
                                // U+00FB | û | LATIN SMALL LETTER U WITH CIRCUMFLEX
                                chr(195).chr(187) => 'u',
                                // U+00FC | ü | LATIN SMALL LETTER U WITH DIAERESIS
                                chr(195).chr(188) => 'u',
                                // U+00FD | ý | LATIN SMALL LETTER Y WITH ACUTE
                                chr(195).chr(189) => 'y',
                                // U+00FE | þ | LATIN SMALL LETTER THORN
                                chr(195).chr(190) => 'th',
                                // U+00FF | ÿ | LATIN SMALL LETTER Y WITH DIAERESIS
                                chr(195).chr(191) => 'y',
                        // Decompositions for Latin Extended-A
                                // U+0100 | Ā | LATIN CAPITAL LETTER A WITH MACRON
                                chr(196).chr(128) => 'A',
                                // U+0101 | ā | LATIN SMALL LETTER A WITH MACRON
                                chr(196).chr(129) => 'a',
                                // U+0102 | Ă | LATIN CAPITAL LETTER A WITH BREVE
                                chr(196).chr(130) => 'A',
                                // U+0103 | ă | LATIN SMALL LETTER A WITH BREVE
                                chr(196).chr(131) => 'a',
                                // U+0104 | Ą | LATIN CAPITAL LETTER A WITH OGONEK
                                chr(196).chr(132) => 'A',
                                // U+0105 | ą | LATIN SMALL LETTER A WITH OGONEK
                                chr(196).chr(133) => 'a',
                                // U+0106 | Ć | LATIN CAPITAL LETTER C WITH ACUTE
                                chr(196).chr(134) => 'C',
                                // U+0107 | ć | LATIN SMALL LETTER C WITH ACUTE
                                chr(196).chr(135) => 'c',
                                // U+0108 | Ĉ | LATIN CAPITAL LETTER C WITH CIRCUMFLEX
                                chr(196).chr(136) => 'C',
                                // U+0109 | ĉ | LATIN SMALL LETTER C WITH CIRCUMFLEX
                                chr(196).chr(137) => 'c',
                                // U+010A | Ċ | LATIN CAPITAL LETTER C WITH DOT ABOVE
                                chr(196).chr(138) => 'C',
                                // U+010B | ċ | LATIN SMALL LETTER C WITH DOT ABOVE
                                chr(196).chr(139) => 'c',
                                // U+010C | Č | LATIN CAPITAL LETTER C WITH CARON
                                chr(196).chr(140) => 'C',
                                // U+010D | č | LATIN SMALL LETTER C WITH CARON
                                chr(196).chr(141) => 'c',
                                // U+010E | Ď | LATIN CAPITAL LETTER D WITH CARON
                                chr(196).chr(142) => 'D',
                                // U+010F | ď | LATIN SMALL LETTER D WITH CARON
                                chr(196).chr(143) => 'd',
                                // U+0110 | Đ | LATIN CAPITAL LETTER D WITH STROKE
                                chr(196).chr(144) => 'D',
                                // U+0111 | đ | LATIN SMALL LETTER D WITH STROKE
                                chr(196).chr(145) => 'd',
                                // U+0112 | Ē | LATIN CAPITAL LETTER E WITH MACRON
                                chr(196).chr(146) => 'E',
                                // U+0113 | ē | LATIN SMALL LETTER E WITH MACRON
                                chr(196).chr(147) => 'e',
                                // U+0114 | Ĕ | LATIN CAPITAL LETTER E WITH BREVE
                                chr(196).chr(148) => 'E',
                                // U+0115 | ĕ | LATIN SMALL LETTER E WITH BREVE
                                chr(196).chr(149) => 'e',
                                // U+0116 | Ė | LATIN CAPITAL LETTER E WITH DOT ABOVE
                                chr(196).chr(150) => 'E',
                                // U+0117 | ė | LATIN SMALL LETTER E WITH DOT ABOVE
                                chr(196).chr(151) => 'e',
                                // U+0118 | Ę | LATIN CAPITAL LETTER E WITH OGONEK
                                chr(196).chr(152) => 'E',
                                // U+0119 | ę | LATIN SMALL LETTER E WITH OGONEK
                                chr(196).chr(153) => 'e',
                                // U+011A | Ě | LATIN CAPITAL LETTER E WITH CARON
                                chr(196).chr(154) => 'E',
                                // U+011B | ě | LATIN SMALL LETTER E WITH CARON
                                chr(196).chr(155) => 'e',
                                // U+011C | Ĝ | LATIN CAPITAL LETTER G WITH CIRCUMFLEX
                                chr(196).chr(156) => 'G',
                                // U+011D | ĝ | LATIN SMALL LETTER G WITH CIRCUMFLEX
                                chr(196).chr(157) => 'g',
                                // U+011E | Ğ | LATIN CAPITAL LETTER G WITH BREVE
                                chr(196).chr(158) => 'G',
                                // U+011F | ğ | LATIN SMALL LETTER G WITH BREVE
                                chr(196).chr(159) => 'g',
                                // U+0120 | Ġ | LATIN CAPITAL LETTER G WITH DOT ABOVE
                                chr(196).chr(160) => 'G',
                                // U+0121 | ġ | LATIN SMALL LETTER G WITH DOT ABOVE
                                chr(196).chr(161) => 'g',
                                // U+0122 | Ģ | LATIN CAPITAL LETTER G WITH CEDILLA
                                chr(196).chr(162) => 'G',
                                // U+0123 | ģ | LATIN SMALL LETTER G WITH CEDILLA
                                chr(196).chr(163) => 'g',
                                // U+0124 | Ĥ | LATIN CAPITAL LETTER H WITH CIRCUMFLEX
                                chr(196).chr(164) => 'H',
                                // U+0125 | ĥ | LATIN SMALL LETTER H WITH CIRCUMFLEX
                                chr(196).chr(165) => 'h',
                                // U+0126 | Ħ | LATIN CAPITAL LETTER H WITH STROKE
                                chr(196).chr(166) => 'H',
                                // U+0127 | ħ | LATIN SMALL LETTER H WITH STROKE
                                chr(196).chr(167) => 'h',
                                // U+0128 | Ĩ | LATIN CAPITAL LETTER I WITH TILDE
                                chr(196).chr(168) => 'I',
                                // U+0129 | ĩ | LATIN SMALL LETTER I WITH TILDE
                                chr(196).chr(169) => 'i',
                                // U+012A | Ī | LATIN CAPITAL LETTER I WITH MACRON
                                chr(196).chr(170) => 'I',
                                // U+012B | ī | LATIN SMALL LETTER I WITH MACRON
                                chr(196).chr(171) => 'i',
                                // U+012C | Ĭ | LATIN CAPITAL LETTER I WITH BREVE
                                chr(196).chr(172) => 'I',
                                // U+012D | ĭ | LATIN SMALL LETTER I WITH BREVE
                                chr(196).chr(173) => 'i',
                                // U+012E | Į | LATIN CAPITAL LETTER I WITH OGONEK
                                chr(196).chr(174) => 'I',
                                // U+012F | į | LATIN SMALL LETTER I WITH OGONEK
                                chr(196).chr(175) => 'i',
                                // U+0130 | İ | LATIN CAPITAL LETTER I WITH DOT ABOVE
                                chr(196).chr(176) => 'I',
                                // U+0131 | ı | LATIN SMALL LETTER DOTLESS I
                                chr(19 

WordPress.org: Please note that this content has been truncated for display.
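
For readers wondering where the chr() pairs in the array above come from: each key is the two-byte UTF-8 encoding of the code point named in its comment. The following is a minimal sketch of that relationship, not code from the ticket. The helper example_utf8_pair() is hypothetical, and applying the array with strtr() is an assumption here, since the ticket does not show how the array is used.

```php
<?php
// Hypothetical helper (not part of WordPress): encode a code point in the
// two-byte UTF-8 range U+0080..U+07FF as the chr() pair used for the array keys.
function example_utf8_pair( int $code_point ): string {
	if ( $code_point < 0x80 || $code_point > 0x7FF ) {
		throw new InvalidArgumentException( 'This sketch only handles two-byte code points.' );
	}
	$lead  = 0xC0 | ( $code_point >> 6 );   // Lead byte, bit pattern 110xxxxx.
	$trail = 0x80 | ( $code_point & 0x3F ); // Continuation byte, bit pattern 10xxxxxx.
	return chr( $lead ) . chr( $trail );
}

// Three entries copied from the array above.
$chars = array(
	chr(195).chr(128) => 'A', // U+00C0
	chr(195).chr(169) => 'e', // U+00E9
	chr(196).chr(134) => 'C', // U+0106
);

// Assumption: the map is applied with strtr(), which replaces every key
// found in the string with its value.
$plain = strtr( "\u{00C0} caf\u{00E9} \u{0106}", $chars );
echo $plain, "\n";
```

Working the arithmetic for U+0106 gives 196 and 134, which matches the key on the LATIN CAPITAL LETTER C WITH ACUTE entry.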

#3 @DrewAPicture
9 years ago

  • Keywords needs-patch added

@SergeyBiryukov Besides not documenting in all caps, I can see documenting the character codes as useful. What do you think about the rest of it, i.e. describing the obvious characteristics?

Also, what or who would be the best source for confirming this information so we can move forward here?

cc @ocean90 @petya

#4 @SergeyBiryukov
9 years ago

  • Milestone changed from Awaiting Review to 4.6
  • Owner set to SergeyBiryukov
  • Status changed from new to reviewing

#5 @DrewAPicture
9 years ago

@SergeyBiryukov Happy to generate a patch for this, but I'd need some guidance on my questions in comment:3.

This ticket was mentioned in Slack in #core by ocean90.
9 years ago

#7 @ocean90
9 years ago

  • Keywords good-first-bug added
  • Milestone changed from 4.6 to Future Release

@DrewAPicture Which information do you need to be confirmed? Happy to take a look at something if you point me to it.

TinyMCE has a similar char list: https://github.com/tinymce/tinymce/blob/master/js/tinymce/plugins/charmap/plugin.js#L16

#8 @DrewAPicture
9 years ago

  • Owner changed from SergeyBiryukov to DrewAPicture

#9 @DrewAPicture
9 years ago

  • Keywords has-patch added; needs-patch good-first-bug removed
  • Milestone changed from Future Release to 4.6

Whew.

6 hours of effort yields 34677.2.diff, which takes the suggested inline comments and formats them into markdown tables in the DocBlock rather than placing them inline in the code. A major downside of documenting them as inline comments is that the information is only really available to somebody reading the source code.

Formatting into markdown tables has the benefit of being parseable for the Code Reference and also easily searchable from within that page. See 34677_output.2.png for what that looks like post-parsing.
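
As a rough illustration of the DocBlock table idea, the sketch below turns entries like the ones in comment:2 into markdown table rows, lowercasing the all-caps Unicode names along the way. The function example_docblock_row() and the column layout are assumptions made for this example. The format actually committed is in 34677.2.diff and may differ.

```php
<?php
// Hypothetical helper: format one entry as a markdown table row inside a DocBlock.
// The column order (code, glyph, replacement, description) is an assumption.
function example_docblock_row( int $code_point, string $glyph, string $replacement, string $unicode_name ): string {
	// Avoid all-caps descriptions, as suggested in comment:3.
	$description = ucfirst( strtolower( $unicode_name ) );
	return sprintf( ' * | U+%04X | %s | %s | %s |', $code_point, $glyph, $replacement, $description );
}

// Two entries taken from the array in comment:2.
$rows = array(
	array( 0x00AA, "\u{00AA}", 'a', 'FEMININE ORDINAL INDICATOR' ),
	array( 0x00C0, "\u{00C0}", 'A', 'LATIN CAPITAL LETTER A WITH GRAVE' ),
);

// Header and separator rows, followed by one row per entry.
$table = array(
	' * | Code | Glyph | Replacement | Description |',
	' * | ---- | ----- | ----------- | ----------- |',
);
foreach ( $rows as $row ) {
	$table[] = example_docblock_row( ...$row );
}

echo implode( "\n", $table ), "\n";
```

Because each row is plain text, the same table stays greppable in the source file and can be rendered by a documentation parser.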

Last edited 9 years ago by DrewAPicture

#10 @DrewAPicture
9 years ago

  • Description modified

#11 @DrewAPicture
9 years ago

  • Keywords has-screenshots added

#12 @DrewAPicture
9 years ago

In 37669:

Docs: Add extensive documentation to the remove_accents() DocBlock outlining the accented characters core replaces.

Covers:

  • Currency signs
  • Decompositions for Latin-1 Supplement
  • Decompositions for Latin Extended-A
  • Decompositions for Latin Extended-B
  • Vowels with diacritic (Chinese, Hanyu Pinyin)
  • Characters replaced for the de_DE, de_DE_formal, and da_DK locales

Props john_schlick for the initial work.
Props DrewAPicture, ocean90.

See #34677.
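
The last item in the list above, locale-specific replacements, can be pictured as a small override map layered on top of the default one. This is a self-contained sketch that does not load WordPress. The function names are hypothetical, and the replacement values are illustrative assumptions rather than values quoted from the ticket.

```php
<?php
// Hypothetical stand-in for a locale-specific override table.
// The values below are illustrative assumptions, not taken from the ticket.
function example_locale_overrides( string $locale ): array {
	if ( in_array( $locale, array( 'de_DE', 'de_DE_formal' ), true ) ) {
		return array(
			"\u{00C4}" => 'Ae', "\u{00E4}" => 'ae',
			"\u{00D6}" => 'Oe', "\u{00F6}" => 'oe',
			"\u{00DC}" => 'Ue', "\u{00FC}" => 'ue',
			"\u{00DF}" => 'ss',
		);
	}
	if ( 'da_DK' === $locale ) {
		return array(
			"\u{00C6}" => 'Ae', "\u{00E6}" => 'ae',
			"\u{00D8}" => 'Oe', "\u{00F8}" => 'oe',
			"\u{00C5}" => 'Aa', "\u{00E5}" => 'aa',
		);
	}
	return array();
}

// Hypothetical, heavily reduced version of the replacement step.
function example_remove_accents( string $text, string $locale ): string {
	// Default single-letter replacements, matching the array in comment:2.
	$chars = array(
		"\u{00C4}" => 'A', "\u{00E4}" => 'a',
		"\u{00D6}" => 'O', "\u{00F6}" => 'o',
		"\u{00DC}" => 'U', "\u{00FC}" => 'u',
		"\u{00DF}" => 's',
		"\u{00D8}" => 'O', "\u{00F8}" => 'o',
	);
	// With string keys, array_merge() lets later entries replace earlier ones.
	$chars = array_merge( $chars, example_locale_overrides( $locale ) );
	return strtr( $text, $chars );
}

echo example_remove_accents( "Gr\u{00FC}\u{00DF}e", 'en_US' ), "\n";
echo example_remove_accents( "Gr\u{00FC}\u{00DF}e", 'de_DE' ), "\n";
```

Keeping the overrides in a separate map means the default table stays identical for every locale, and only the differing characters need to be documented per locale.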

#13 @DrewAPicture
9 years ago

  • Resolution set to fixed
  • Status changed from reviewing to closed

Closing this as fixed. If there are any issues, let's open new tickets to address them.
