Opened 2 years ago

Closed 2 years ago

#40817 closed defect (bug) (invalid)

WordCounter removeRegExp maybe broken

Reported by: DrLightman
Owned by:
Milestone:
Priority: normal
Severity: normal
Version: 4.6.4
Component: Editor
Keywords:
Focuses: javascript, administration
Cc:
PR Number:

Description

In the file \wp-admin\js\word-count.js, at around line 27 in my WP 4.6.6, WordCounter.prototype.settings defines removeRegExp like this:

removeRegExp: new RegExp( [
    '[',
        // Basic Latin (extract)
        '\u0021-\u0040\u005B-\u0060\u007B-\u007E',
        // Latin-1 Supplement (extract)
        '\u0080-\u00BF\u00D7\u00F7',
        // General Punctuation
        // Superscripts and Subscripts
        // Currency Symbols
        // Combining Diacritical Marks for Symbols
        // Letterlike Symbols
        // Number Forms
        // Arrows
        // Mathematical Operators
        // Miscellaneous Technical
        // Control Pictures
        // Optical Character Recognition
        // Enclosed Alphanumerics
        // Box Drawing
        // Block Elements
        // Geometric Shapes
        // Miscellaneous Symbols
        // Dingbats
        // Miscellaneous Mathematical Symbols-A
        // Supplemental Arrows-A
        // Braille Patterns
        // Supplemental Arrows-B
        // Miscellaneous Mathematical Symbols-B
        // Supplemental Mathematical Operators
        // Miscellaneous Symbols and Arrows
        '\u2000-\u2BFF',
        // Supplemental Punctuation
        '\u2E00-\u2E7F',
    ']'
].join( '' ), 'g' ),

But according to the JavaScript docs at https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp, backslashes should be escaped when using string notation:

When using the constructor function, the normal string escape rules (preceding special characters with \ when included in a string) are necessary. For example, the following are equivalent:
var re = /\w+/;
var re = new RegExp('\\w+');

So shouldn't this be the correct way to build that regexp, since it uses the second form (the string), with \\u in place of \u?

removeRegExp: new RegExp( [
    '[',
        // Basic Latin (extract)
        '\\u0021-\\u0040\\u005B-\\u0060\\u007B-\\u007E',
        // Latin-1 Supplement (extract)
        '\\u0080-\\u00BF\\u00D7\\u00F7',
        // General Punctuation
        // Superscripts and Subscripts
        // Currency Symbols
        // Combining Diacritical Marks for Symbols
        // Letterlike Symbols
        // Number Forms
        // Arrows
        // Mathematical Operators
        // Miscellaneous Technical
        // Control Pictures
        // Optical Character Recognition
        // Enclosed Alphanumerics
        // Box Drawing
        // Block Elements
        // Geometric Shapes
        // Miscellaneous Symbols
        // Dingbats
        // Miscellaneous Mathematical Symbols-A
        // Supplemental Arrows-A
        // Braille Patterns
        // Supplemental Arrows-B
        // Miscellaneous Mathematical Symbols-B
        // Supplemental Mathematical Operators
        // Miscellaneous Symbols and Arrows
        '\\u2000-\\u2BFF',
        // Supplemental Punctuation
        '\\u2E00-\\u2E7F',
    ']'
].join( '' ), 'g' ),

Change History (1)

#1 @azaozz
2 years ago

  • Milestone Awaiting Review deleted
  • Resolution set to invalid
  • Status changed from new to closed

I think you're mixing up the character class shortcuts \w, \d, \s, etc. with the Unicode character escape sequences \u#### (where #### are four hexadecimal digits). Also, note that the Unicode chars are in an array of strings that is joined before being used as the regex pattern.
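
For anyone landing here later, the difference can be checked in a console. This is only a minimal sketch, not code from word-count.js: the shortened character class, the sample string, and all variable names are made up for illustration. It assumes nothing beyond standard string-literal and RegExp behavior.

```javascript
// '\u0021' is resolved by the string literal itself, so RegExp receives '[!-@]'.
const single = new RegExp( [ '[', '\u0021-\u0040', ']' ].join( '' ), 'g' );
// '\\u0021' reaches RegExp as the text \u0021, which the regex engine resolves.
const double = new RegExp( [ '[', '\\u0021-\\u0040', ']' ].join( '' ), 'g' );

// Hypothetical sample text; both patterns strip the same characters from it.
const sample = 'Hello, world! 123';
const strippedSingle = sample.replace( single, '' );
const strippedDouble = sample.replace( double, '' );
console.log( strippedSingle === strippedDouble ); // true

// '\w' is not a string escape, so the backslash is dropped and RegExp receives 'w+'.
const unescaped = new RegExp( '\w+' );
const escaped = new RegExp( '\\w+' );
console.log( unescaped.source ); // w+
console.log( escaped.source );   // \w+
```

So the doubled backslash matters for \w, \d, \s and similar shortcuts, which only the regex engine understands, while \u#### works either way for the ranges used here because the string literal already produces the intended characters.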
