Make WordPress Core

Opened 3 years ago

Last modified 19 months ago

#53019 new defect (bug)

The _sanitize_text_fields function removes octets, which breaks percentages written in Arabic and other RTL languages.

Reported by: wppunk
Owned by:
Milestone: Awaiting Review
Priority: normal
Severity: normal
Version:
Component: Security
Keywords:
Focuses: rtl
Cc:

Description (last modified by SergeyBiryukov)

%10 through %99 are valid percentages in Arabic languages; this is the standard Arabic percentage usage.

As I can see here (https://core.trac.wordpress.org/browser/tags/5.7/src/wp-includes/formatting.php#L5409), the function removes all octets, but I'm not sure that this is really done for security reasons. Could anyone confirm that this code is really important here?

Change History (4)

#1 @SergeyBiryukov
3 years ago

  • Description modified (diff)

#2 follow-up: @peterwilsoncc
3 years ago

I ran some strings through various sanitizing and escaping functions in wp-cli:

wp> sanitize_text_field( 'cats of %90 by recommended' );
string(22) "cats of by recommended"

wp> sanitize_text_field( 'recommend by 90% of cats' );
string(24) "recommend by 90% of cats"

wp> sanitize_text_field( 'cats of %900 by recommended' );
string(24) "cats of 0 by recommended"

wp> sanitize_text_field( 'cats of 💯 by recommended' );
string(27) "cats of 💯 by recommended"

wp> sanitize_text_field( 'cats of %90 < by recommended' );
string(27) "cats of &lt; by recommended"

wp> esc_attr( 'cats of %90 by recommended' );
string(26) "cats of %90 by recommended"

wp> esc_url( 'http://example.com/?s=20%' )
string(25) "http://example.com/?s=20%"

wp> esc_url( 'http://example.com/?s=20%25' )
string(27) "http://example.com/?s=20%25"

wp> esc_url( 'http://example.com/?s=%20' )
string(25) "http://example.com/?s=%20"

wp> global $wpdb
wp> $wpdb->prepare( 'post_type=%s', '%20' )
string(80) "post_type='{ad6df8669b87f3e7ce3f7b30446aeb270ddef911039b7c96abdd4e90e383dfe5}20'"

As a general rule, the sanitize_* functions are intended to run on data on the way in and the esc_* functions on display, so some difference is expected, but WP should certainly accommodate RTL languages.

It occurs to me that in faux-equations something like %aa + %bb = %cc could also be legitimate in some RTL languages.

This was added in [11929] for #10751 but the reasoning is unclear.
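
The stripping seen in the wp-cli output above can be approximated outside WordPress. The following is a minimal Python sketch, not the core implementation: it assumes the relevant step is a loop that removes every case-insensitive match of %[a-f0-9]{2} and then collapses runs of spaces, which is my reading of formatting.php. The name strip_octets is hypothetical, and tag stripping and UTF-8 checks are left out.

```python
import re

# Assumed octet pattern: a percent sign followed by two hex digits.
OCTET = re.compile(r'%[a-f0-9]{2}', re.IGNORECASE)


def strip_octets(text: str) -> str:
    """Hypothetical stand-in for the octet-removal step of sanitize_text_field()."""
    found = False
    # Loop until no octet is left, so sequences that only appear after a
    # previous removal (e.g. '%%9090') are removed as well.
    while True:
        match = OCTET.search(text)
        if match is None:
            break
        text = text.replace(match.group(0), '')
        found = True
    if found:
        # Collapse the spaces left behind by the removed octets.
        text = re.sub(r' +', ' ', text).strip()
    return text


print(strip_octets('cats of %90 by recommended'))
print(strip_octets('%aa + %bb = %cc'))
```

Under these assumptions the faux-equation loses all three operands, because aa, bb and cc are each two valid hex digits.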

--

WordPress ought to support RTL representations of percentages. For properly prepared SQL statements, WP uses the value of $wpdb->placeholder_escape() for percent symbols and later removes them while making the query.
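
To illustrate the placeholder idea, here is a rough Python sketch of swapping percent signs for a hard-to-guess token while a query is being built and swapping them back before it runs. It is a sketch under stated assumptions, not the wpdb code: the function names are hypothetical Python stand-ins, and the token format (a brace-wrapped 64-character hex digest) is inferred from the $wpdb->prepare() output in the wp-cli session above.

```python
import hashlib
import hmac
import secrets

# Assumption: one random token per process, shaped like '{<64 hex chars>}'.
_SALT = secrets.token_bytes(32)
_PLACEHOLDER = '{' + hmac.new(_SALT, secrets.token_bytes(16), hashlib.sha256).hexdigest() + '}'


def add_placeholder_escape(value: str) -> str:
    """Replace literal percent signs so they cannot be read as format placeholders."""
    return value.replace('%', _PLACEHOLDER)


def remove_placeholder_escape(query: str) -> str:
    """Restore literal percent signs just before the query is executed."""
    return query.replace(_PLACEHOLDER, '%')


prepared = "post_type='" + add_placeholder_escape('%20') + "'"
print(len(prepared))
print(remove_placeholder_escape(prepared))
```

The point of the round trip is that the percent sign survives intact, which is the behaviour the sanitizer does not offer for a value such as %90.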

#3 in reply to: ↑ 2 @SergeyBiryukov
2 years ago

Related: #31777

#4 @erengy
19 months ago

Bumped into this when a customer was unable to edit the discount label of a product. The text was in Turkish, which is written left to right. The issue may affect two other LTR languages (Basque and Kurdish) as well.

I used the following Python script on the data from https://github.com/unicode-org/cldr-json to list the locales whose percent format starts with %:

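import json
import os

# Per-locale number data from the cldr-numbers-full package; adjust to your checkout.
base_dir = './cldr-numbers-full/main'

for locale in sorted(os.listdir(base_dir)):
  # Open the file under the same directory that was listed.
  path = os.path.join(base_dir, locale, 'numbers.json')
  with open(path, encoding='utf-8') as file:
    data = json.load(file)
  numbers = data['main'][locale]['numbers']
  percent_format = numbers['percentFormats-numberSystem-latn']['standard']
  # Report locales whose percent pattern puts the sign before the number.
  if percent_format.startswith('%'):
    print(f'{locale:<10}{percent_format}')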

Output:

eu        % #,##0
ku        %#,##0
tr        %#,##0
tr-CY     %#,##0
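
As a rough cross-check of the patterns listed above, the sketch below renders a pattern with a sample value and tests it against the octet pattern that the sanitizer is assumed to use (%[a-f0-9]{2}, case-insensitive). The rendering is deliberately naive: it only substitutes an integer for #,##0 and ignores grouping, and both helper names are hypothetical.

```python
import re

# Assumed octet pattern: a percent sign followed by two hex digits.
OCTET = re.compile(r'%[a-f0-9]{2}', re.IGNORECASE)


def render_percent(pattern: str, value: int) -> str:
    """Naively render a CLDR-style percent pattern for a small integer."""
    return pattern.replace('#,##0', str(value))


def looks_like_octet(pattern: str, value: int) -> bool:
    """True if the rendered percentage contains something the octet pattern matches."""
    return OCTET.search(render_percent(pattern, value)) is not None


for pattern in ('%#,##0', '% #,##0', '#,##0%'):
    print(repr(pattern), looks_like_octet(pattern, 90))
```

With these assumptions, only patterns that put the digits directly after the percent sign are matched, and only for values of two or more digits; a pattern with a separator between the sign and the number is not.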