Opened 9 years ago
Closed 9 years ago
#40817 closed defect (bug) (invalid)
WordCounter removeRegExp maybe broken
| Reported by: | | Owned by: | |
|---|---|---|---|
| Milestone: | | Priority: | normal |
| Severity: | normal | Version: | 4.6.4 |
| Component: | Editor | Keywords: | |
| Focuses: | javascript, administration | Cc: | |
Description
In the file \wp-admin\js\word-count.js, at around line 27 in my WP 4.6.6 install, WordCounter.prototype.settings defines removeRegExp as follows:
removeRegExp: new RegExp( [
'[',
// Basic Latin (extract)
'\u0021-\u0040\u005B-\u0060\u007B-\u007E',
// Latin-1 Supplement (extract)
'\u0080-\u00BF\u00D7\u00F7',
// General Punctuation
// Superscripts and Subscripts
// Currency Symbols
// Combining Diacritical Marks for Symbols
// Letterlike Symbols
// Number Forms
// Arrows
// Mathematical Operators
// Miscellaneous Technical
// Control Pictures
// Optical Character Recognition
// Enclosed Alphanumerics
// Box Drawing
// Block Elements
// Geometric Shapes
// Miscellaneous Symbols
// Dingbats
// Miscellaneous Mathematical Symbols-A
// Supplemental Arrows-A
// Braille Patterns
// Supplemental Arrows-B
// Miscellaneous Mathematical Symbols-B
// Supplemental Mathematical Operators
// Miscellaneous Symbols and Arrows
'\u2000-\u2BFF',
// Supplemental Punctuation
'\u2E00-\u2E7F',
']'
].join( '' ), 'g' ),
But according to the JavaScript docs at https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp, backslashes should be escaped when using string notation:
When using the constructor function, the normal string escape rules (preceding special characters with \ when included in a string) are necessary. For example, the following are equivalent:
var re = /\w+/;
var re = new RegExp('\\w+');
So, since this code uses the second form (the string), shouldn't the correct way to build that regexp be with \\u in place of \u?
removeRegExp: new RegExp( [
'[',
// Basic Latin (extract)
'\\u0021-\\u0040\\u005B-\\u0060\\u007B-\\u007E',
// Latin-1 Supplement (extract)
'\\u0080-\\u00BF\\u00D7\\u00F7',
// General Punctuation
// Superscripts and Subscripts
// Currency Symbols
// Combining Diacritical Marks for Symbols
// Letterlike Symbols
// Number Forms
// Arrows
// Mathematical Operators
// Miscellaneous Technical
// Control Pictures
// Optical Character Recognition
// Enclosed Alphanumerics
// Box Drawing
// Block Elements
// Geometric Shapes
// Miscellaneous Symbols
// Dingbats
// Miscellaneous Mathematical Symbols-A
// Supplemental Arrows-A
// Braille Patterns
// Supplemental Arrows-B
// Miscellaneous Mathematical Symbols-B
// Supplemental Mathematical Operators
// Miscellaneous Symbols and Arrows
'\\u2000-\\u2BFF',
// Supplemental Punctuation
'\\u2E00-\\u2E7F',
']'
].join( '' ), 'g' ),
Change History (1)
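I think you're mixing up the character class shorthands (\w, \d, \s, etc.) with the Unicode escape sequences \u#### (where #### is four hexadecimal digits). Unlike \w, \u#### is also a valid escape inside a JavaScript string literal, so the string already contains the actual characters by the time it reaches the RegExp constructor. Also, note that the escapes are in an array of strings that is joined before being used as the regex pattern.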
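The difference can be checked directly. The following is a minimal sketch, not the code from word-count.js: it shortens the character class to two of the ranges from the ticket, and the names `unescaped`, `escaped`, `sample`, `a`, and `b` are made up for illustration.

```javascript
// Shorthand classes: '\w' is not a string escape, so the backslash is dropped
// and the RegExp constructor only ever sees 'w+'.
var lostBackslash = new RegExp( '\w+' ).source;   // 'w+'
var keptBackslash = new RegExp( '\\w+' ).source;  // '\w+' (backslash preserved)

// Unicode escapes, single backslash: the string literal itself turns \u2000
// into the character U+2000, so the pattern contains the raw characters.
var unescaped = new RegExp( [ '[', '\u0021-\u0040', '\u2000-\u2BFF', ']' ].join( '' ), 'g' );

// Unicode escapes, doubled backslash: the pattern contains the text "\u2000",
// which the regex engine then decodes to the same character.
var escaped = new RegExp( [ '[', '\\u0021-\\u0040', '\\u2000-\\u2BFF', ']' ].join( '' ), 'g' );

// Hypothetical sample text with a comma, an ellipsis (U+2026), digits and '!'.
var sample = 'Hello, world\u2026 42!';

var a = sample.replace( unescaped, '' );
var b = sample.replace( escaped, '' );

console.log( a === b ); // both patterns strip the same characters
```

For these ranges both spellings strip the same characters, which is consistent with the ticket being closed as invalid. The doubled form only becomes necessary for escapes that string literals do not understand, such as \w.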