Opened 2 years ago

Closed 2 years ago

#24001 closed defect (bug) (fixed)

\s in the regexp destroys some UTF-8 characters in pingback_ping()

Reported by: tenpura
Owned by: SergeyBiryukov
Milestone: 3.6
Priority: normal
Severity: normal
Version: 1.5.2
Component: XML-RPC
Keywords: has-patch commit
Focuses:
Cc:

Description

\s in the regexp destroys some UTF-8 characters in pingback_ping(). Same issue as in #21625.

Steps to reproduce:

  1. Pingback with the post title "САПР".
  2. It will create a pingback comment with no comment author (Anonymous).

Solution:

Use [\r\n\t ] rather than [\s\r\n\t].
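
The failure can be illustrated with a small Python simulation (this is not the PHP code in pingback_ping()). It assumes the regex engine works byte by byte and classifies byte 0xA0 as whitespace, which is the suspected cause here; the `collapse` helper is hypothetical. Decoding as Latin-1 maps every byte to one code point, so Python's `\s` then stands in for such a byte-oriented `\s`.

```python
import re


def collapse(raw: bytes, pattern: str) -> bytes:
    """Hypothetical helper: apply `pattern` to raw bytes, one byte per character.

    Assumption: the engine treats byte 0xA0 as whitespace. Latin-1 decoding
    gives one code point per byte, and Python's \\s matches U+00A0, so this
    mimics that byte-level behaviour.
    """
    text = raw.decode("latin-1")
    return re.sub(pattern, " ", text).encode("latin-1")


title = "САПР".encode("utf-8")  # b'\xd0\xa1\xd0\x90\xd0\x9f\xd0\xa0'

# The pattern with \s replaces the second byte of "Р" (0xD0 0xA0) with a space.
broken = collapse(title, r"[\s\r\n\t]+")

# The proposed character class lists only ASCII whitespace and leaves the bytes alone.
fixed = collapse(title, r"[\r\n\t ]+")

print(broken)                  # no longer valid UTF-8
print(fixed.decode("utf-8"))   # САПР
```

Only the last letter of the title is affected in this sketch, because "Р" is the only character in "САПР" whose UTF-8 encoding contains the byte 0xA0.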

Attachments (1)

24001.diff (704 bytes) - added by tenpura 2 years ago.

Change History (6)

@tenpura 2 years ago

comment:1 @SergeyBiryukov 2 years ago

  • Keywords commit added
  • Milestone changed from Awaiting Review to 3.6
  • Version changed from trunk to 1.5.2

comment:2 @SergeyBiryukov 2 years ago

Introduced in [2619].

comment:3 follow-up: ↓ 4 @azaozz 2 years ago

Yeah, we shouldn't be using \s in regex that filters user submitted or translatable text as it matches bytes that are part of some multibyte (UTF-8, others) chars. The [\r\n\t ] replacement seems to work properly. In theory best would be to use the u modifier, however that leaves installs with charset other than UTF-8 out in a "grey area".
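
The trade-off in this comment can be sketched in Python as an analogy. Matching on decoded code points plays the role of the u modifier here, and `collapse_utf8` is a hypothetical helper. The Windows-1251 input is an assumed example of a non-UTF-8 install, and the Python exception only stands in for whatever failure a UTF-8-mode regex would report on such input.

```python
import re

WHITESPACE = re.compile(r"\s+")


def collapse_utf8(raw: bytes) -> bytes:
    """Hypothetical helper: collapse whitespace on code points, not bytes."""
    text = raw.decode("utf-8")  # raises UnicodeDecodeError for non-UTF-8 input
    return WHITESPACE.sub(" ", text).encode("utf-8")


utf8_title = "САПР \t title".encode("utf-8")
cp1251_title = "САПР".encode("cp1251")  # assumed legacy charset

# UTF-8 input: whitespace is collapsed and "Р" stays intact.
ok = collapse_utf8(utf8_title)

# Non-UTF-8 input: the code-point approach cannot process the bytes at all.
try:
    collapse_utf8(cp1251_title)
    legacy_ok = True
except UnicodeDecodeError:
    legacy_ok = False

print(ok.decode("utf-8"))
print(legacy_ok)
```

An explicit ASCII-only class such as [\r\n\t ] sidesteps both cases, since it never has to interpret bytes above 0x7F.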

comment:4 in reply to: ↑ 3 @tenpura 2 years ago

Replying to azaozz:

In theory best would be to use the u modifier, however that leaves installs with charset other than UTF-8 out in a "grey area".

It seems that PCRE is not always compiled with UTF-8 support. That is another drawback of using the u modifier.

comment:5 @SergeyBiryukov 2 years ago

  • Owner set to SergeyBiryukov
  • Resolution set to fixed
  • Status changed from new to closed

In [23952]:

Remove \s from regex in pingback_ping() to avoid UTF-8 issues. props tenpura. fixes #24001.
