Make WordPress Core

Opened 3 years ago

Last modified 4 weeks ago

#18549 reopened defect (bug)

wp_texturize incorrectly curls closing quotes after inline HTML end tags

Reported by: justincwatt Owned by: wonderboymusic
Milestone: Future Release Priority: normal
Severity: normal Version: 3.2.1
Component: Formatting Keywords: wptexturize needs-patch needs-unit-tests
Focuses: Cc:


The following source HTML:

The word is "<a href="http://example.com/">quoted</a>".
The word is '<a href="http://example.com/">quoted</a>'
The word is '<a href="http://example.com/">quoted.</a>'
The word is '<a href="http://example.com/">quoted</a>'.
The word is '<a href="http://example.com/">quot</a>'d

Gets incorrectly transformed by wp_texturize() as:

The word is &#8220;<a href="http://example.com/">quoted</a>&#8220;.
The word is &#8216;<a href="http://example.com/">quoted</a>&#8216;
The word is &#8216;<a href="http://example.com/">quoted.</a>&#8216;
The word is &#8216;<a href="http://example.com/">quoted</a>&#8216;.
The word is &#8216;<a href="http://example.com/">quot</a>&#8216;d

Note: all the double/single quotes in the above examples that should be closing are instead opening)

This renders in the browser like this:

The word is “quoted“.
The word is ‘quoted‘
The word is ‘quoted.‘
The word is ‘quoted‘.
The word is ‘quot‘d

The problem here is that wp_texturize splits the text on all start/end tags, which makes sense for block-level tags, but not inline-tags like <em> and <a href="">.

formatting.php line 67:

$textarr = preg_split('/(<.*>|\[.*\])/Us', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

However if you change it to only split the content on block-level tags, you'll need something more sophisticated/complex than a regular expression to avoid curling quotes within html.

Attachments (2)

18549.diff (6.3 KB) - added by wonderboymusic 4 weeks ago.
18549.2.diff (7.7 KB) - added by miqrogroove 4 weeks ago.
Tweak the unit tests.

Download all attachments as: .zip

Change History (12)

comment:2 Ammaletu2 years ago

  • Cc Ammaletu added
  • Version changed from 3.2.1 to 3.3.1

I can confirm this issue. Haven't had time yet to write a test case or patch, but I took a look at the plugin "InTypo" (which I have been using for years to change the English quote characters to German ones). This plugin completely replaces wptexturize and does not show this bug except in one minor case. What InTypo does: It first replaces the closing quotes and then the opening ones. This way, when you get to replacing the opening quotes, the closing quotes are already safely replaced.

The code looks like this:

$dynamic_character = '/"([\.,;\!\?\s\)\]])/'
$dynamic_replacement = $closing_quote . '$1'

Basically, it replaces every quote character with a closing quote that is followed by a space, a closing bracket or a number of punctuation marks. That is not perfect yet, for example wptexturize handles { and < for opening quotes, so these should be handled here as well. Also probably any punctuation marks from different languages (are there any missing?). And finally this fails if the closing quote is the very last character of the post, a case that could be handled by adding \Z.

Could this be the basis for a patch?

comment:3 SergeyBiryukov2 years ago

  • Version changed from 3.3.1 to 3.2.1

Version number indicates when the bug was initially introduced/reported.

comment:4 SergeyBiryukov16 months ago

#23827 was marked as a duplicate.

comment:5 nacin6 months ago

  • Keywords wptexturize added

comment:6 nacin6 months ago

  • Keywords needs-patch needs-unit-tests added
  • Milestone changed from Awaiting Review to Future Release

Unit tests in particular would move this along.

comment:7 miqrogroove4 months ago

All of this seems to fall within the scope of #4539, which already has unit tests. Can we close this one as a duplicate?

comment:8 ircbot4 weeks ago

This ticket was mentioned in IRC in #wordpress-dev by miqrogroove. View the logs.

wonderboymusic4 weeks ago

miqrogroove4 weeks ago

Tweak the unit tests.

comment:9 wonderboymusic4 weeks ago

  • Owner set to wonderboymusic
  • Resolution set to fixed
  • Status changed from new to closed

In 28764:

In wptexturize(), adjust for the treatment of abbreviated years at the end of quotations.
Silence some unit tests that have never passed and may no longer be applicable.

Props miqrogroove.
Fixes #18549.

comment:10 miqrogroove4 weeks ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

Ticket still valid. Just didn't need the broken tests.

Note: See TracTickets for help on using tickets.