Context Navigation

← Previous Ticket
Next Ticket →

#29256 closed defect (bug) (fixed)

Strings ending with a number and quotation mark get wrong smart quotes

Reported by:	coreygilmore	Owned by:	johnbillion
Milestone:	4.3	Priority:	normal
Severity:	normal	Version:	2.8
Component:	Formatting	Keywords:	wptexturize has-patch
Focuses:		Cc:

Description

When a string ends in a number followed by a single or double quote, such as "Lorem ipsum dolor sit amet 1234" or 'Etiam eu egestas dui 1234' wptexturize incorrectly converts the trailing single or double quote into a single or double prime, respectively.

Related to #8775, but the scope of that ticket seems to be limited to numbers inside of quotes, not a number preceded by other text (crudely described as\d+ vs .*\d+).

This is still an issue in trunk on .org and on wordpress.com (Example: http://coreygilmore.wordpress.com/2014/08/18/expendables-3-illegally-downloaded-5-million-times-but-still-isnt-top-hit-for-pirates/)

Attachments (6)

quote-prime.png (62.1 KB) - added by coreygilmore 11 years ago.
miqro-29256.patch (13.2 KB) - added by miqrogroove 11 years ago.
miqro-29256.2.patch (13.2 KB) - added by miqrogroove 11 years ago.: Tweaks for readability.
miqro-29256.3.patch (13.3 KB) - added by miqrogroove 10 years ago.: Refreshed
miqro-29256.4.patch (18.3 KB) - added by miqrogroove 10 years ago.: Handles situations where open and close quotes are identical after translation.
miqro-29256.5.patch (18.0 KB) - added by miqrogroove 10 years ago.: Refresh previous patch.

Download all attachments as: .zip

Change History (33)

@coreygilmore
11 years ago

Attachment quote-prime.png added

#1 @miqrogroove
11 years ago

Keywords needs-patch needs-unit-tests wptexturize added

Ignoring #18549 it may be possible to fix this for plain text only. The logic is not "regular" and will require potentially slow code.

#2 @miqrogroove
11 years ago

Algorithm brainstorm:

Split the subject string at every opening quote.
If each value contains only one closing quote, then no primes exist.
If a value contains multiple closing quote candidates then any candidate that is not preceded by a digit should be replaced with a closing quote before replacing the primes.
Make sure primes are followed by a space, whereas trailing punctuation indicates a quote.
If all closing quotes are preceded by a digit and not followed by punctuation then the last quote is assumed to be the only closing quote character.
Implode the subject string and resume the inner loop of wptexturize.

#3 @miqrogroove
11 years ago

On further reflection, primes do not have to be followed by whitespace. For example, the length of the table is 4'.

@miqrogroove
11 years ago

Attachment miqro-29256.patch added

#4 @miqrogroove
11 years ago

Keywords has-patch added; needs-patch needs-unit-tests removed

A complete solution with unit tests is now attached.

Some benchmark testing wouldn't hurt, just to find out the impact.

I will advocate for the 4.1 milestone when it becomes available.

@miqrogroove
11 years ago

Attachment miqro-29256.2.patch added

Tweaks for readability.

#5 @miqrogroove
11 years ago

Note the patch and tests do not check for mix-and-match usage of UTF-8 quote chars. If that's desired, we would have to normalize those chars as entity references at the top of the new function.

#6 @miqrogroove
10 years ago

Keywords 4.1-early added

#7 @SergeyBiryukov
10 years ago

Milestone changed from Awaiting Review to 4.1

#8 @miqrogroove
10 years ago

A few more thoughts:

Patch will need to be refreshed after 4.0.1.

Patch did not add any new tests for cases like Fred answered the problem with '99' and then turned in the test. It would be helpful to look into whether the complexity would be increased or decreased by bringing in logic for quoted 2-digit numbers. This is the situation where we have to distinguish between an abbreviated year at the end of a quotation vs. a standalone quoted 2-digit number. If we can remove those extra regex patterns from the main function and incorporate them here, it might give better and faster results.

@miqrogroove
10 years ago

Attachment miqro-29256.3.patch added

Refreshed

#9 @miqrogroove
10 years ago

Refreshed patch and looked at remaining issues. I think this patch will stand on its own because it solves a specific problem without being too complex.

The remaining problems we still have with a phrase like '99' is that it could have three different meanings depending on its context. Examples:

Then he said, '99' of lumber'.

Then he said, "we need '99' feet of lumber".

Then he said, 'that was the class of '99'.

We could theoretically determine that context, but it would require a huge amount of new code that seems to be beyond the scope of this ticket.

#10 @miqrogroove
10 years ago

#29912 was marked as a duplicate.

@miqrogroove
10 years ago

Attachment miqro-29256.4.patch added

Handles situations where open and close quotes are identical after translation.

#11 @miqrogroove
10 years ago

Expanded the patch so that if opening and closing quotes are both the same character after translation, then the new function won't get confused or need to bail.

#12 follow-up: ↓ 13 @paulschreiber
10 years ago

Does this patch handle the case of "30 for 30" ?

With 4.0, the second " becomes an inch mark.

#13 in reply to: ↑ 12 @miqrogroove
10 years ago

Replying to paulschreiber:

Does this patch handle the case of "30 for 30" ?

Yes.

#14 @johnbillion
10 years ago

Keywords 4.2-early added; 4.1-early removed
Milestone changed from 4.1 to Future Release

This could do with some performance figures. How does this impact page generation time?

#15 @paulschreiber
10 years ago

We'd really like to see this fixed in 4.1 — it is causing problems for our editorial staff.

I noticed this has been patched for two months. Please don't kick it to the next release and suddenly worry about page generation time. (I've never seen a wptexturize fix require "performance figures.") If this is a genuine concern, give @miqrogroove some time to do so before deferring it.

#16 @johnbillion
10 years ago

This is a large change to texturize and we're on the verge of a RC. Sorry it wasn't addressed sooner.

#17 @miqrogroove
10 years ago

Future Release is correct. I have been focused on critical bugs in 4.1 and expect that work to continue for a week or two.

#18 @iseulde
10 years ago

Milestone changed from Future Release to 4.2

has-patch 4.2-early so moving to 4.2.

#19 @DrewAPicture
10 years ago

@miqrogroove: What's the status here? Did you still want to try to tackle this for 4.2?

Patch still applies.

#20 @DrewAPicture
10 years ago

Keywords 4.3-early added; 4.2-early removed
Milestone changed from 4.2 to Future Release

No meaningful activity in several months, and as cited by @johnbillion previously, this needs time to soak. Maybe 4.3-early is the ticket.

#21 @miqrogroove
10 years ago

Needs performance testing at the least. Didn't have time to work on it this cycle. :)

#22 @obenland
10 years ago

Owner set to johnbillion
Status changed from new to assigned

#23 @obenland
10 years ago

Keywords 4.3-early removed
Milestone changed from Future Release to 4.3

#24 @miqrogroove
10 years ago

Focuses performance added
Keywords changed from wptexturize, has-patch to wptexturize has-patch

Initial performance test showed wptexturize() runs 20% faster after the patch. :o Wow. It may be ugly, but it's fast! I can only guess the improvement is tied to the overall reduction of PREG usage. I will test more to confirm this.

#25 @miqrogroove
10 years ago

Focuses performance removed

Performance difference was < 1% after correcting a script error. Still, I was expecting awful results so this is promising.

@miqrogroove
10 years ago

Attachment miqro-29256.5.patch added

Refresh previous patch.

#26 @wonderboymusic
10 years ago

Resolution set to fixed
Status changed from assigned to closed

In 32863:

wptexturize() improvements:

Make sure that strings ending with a number and quotation mark get the proper smart quotes
Introduce wptexturize_primes(), a logic tree to determine whether or not "7'." represents seven feet, then converts the special char into either a prime char or a closing quote char.

Adds unit tests.

Props miqrogroove.
Fixes #29256.

#27 @xibe
10 years ago

Great to see this finally land in Core! It solves part of the unending issue between wptexturize and smart quotes :)

Note: See TracTickets for help on using tickets.

Trac UI Preferences

Download in other formats:

Make WordPress Core

Context Navigation

#29256 closed defect (bug) (fixed)

Strings ending with a number and quotation mark get wrong smart quotes

Description

Attachments (6)

Change History (33)

@coreygilmore 11 years ago

#1 @miqrogroove 11 years ago

#2 @miqrogroove 11 years ago

#3 @miqrogroove 11 years ago

@miqrogroove 11 years ago

#4 @miqrogroove 11 years ago

@miqrogroove 11 years ago

#5 @miqrogroove 11 years ago

#6 @miqrogroove 10 years ago

#7 @SergeyBiryukov 10 years ago

#8 @miqrogroove 10 years ago

@miqrogroove 10 years ago

#9 @miqrogroove 10 years ago

#10 @miqrogroove 10 years ago

@miqrogroove 10 years ago

#11 @miqrogroove 10 years ago

#12 follow-up: ↓ 13 @paulschreiber 10 years ago

#13 in reply to: ↑ 12 @miqrogroove 10 years ago

#14 @johnbillion 10 years ago

#15 @paulschreiber 10 years ago

#16 @johnbillion 10 years ago

#17 @miqrogroove 10 years ago

#18 @iseulde 10 years ago

#19 @DrewAPicture 10 years ago

#20 @DrewAPicture 10 years ago

#21 @miqrogroove 10 years ago

#22 @obenland 10 years ago

#23 @obenland 10 years ago

#24 @miqrogroove 10 years ago

#25 @miqrogroove 10 years ago

@miqrogroove 10 years ago

#26 @wonderboymusic 10 years ago

#27 @xibe 10 years ago

Download in other formats:

@coreygilmore
11 years ago

#1 @miqrogroove
11 years ago

#2 @miqrogroove
11 years ago

#3 @miqrogroove
11 years ago

@miqrogroove
11 years ago

#4 @miqrogroove
11 years ago

@miqrogroove
11 years ago

#5 @miqrogroove
11 years ago

#6 @miqrogroove
10 years ago

#7 @SergeyBiryukov
10 years ago

#8 @miqrogroove
10 years ago

@miqrogroove
10 years ago

#9 @miqrogroove
10 years ago

#10 @miqrogroove
10 years ago

@miqrogroove
10 years ago

#11 @miqrogroove
10 years ago

#12 follow-up: ↓ 13 @paulschreiber
10 years ago

#13 in reply to: ↑ 12 @miqrogroove
10 years ago

#14 @johnbillion
10 years ago

#15 @paulschreiber
10 years ago

#16 @johnbillion
10 years ago

#17 @miqrogroove
10 years ago

#18 @iseulde
10 years ago

#19 @DrewAPicture
10 years ago

#20 @DrewAPicture
10 years ago

#21 @miqrogroove
10 years ago

#22 @obenland
10 years ago

#23 @obenland
10 years ago

#24 @miqrogroove
10 years ago

#25 @miqrogroove
10 years ago

@miqrogroove
10 years ago

#26 @wonderboymusic
10 years ago

#27 @xibe
10 years ago