#29256 closed defect (bug) (fixed)
Strings ending with a number and quotation mark get wrong smart quotes
Reported by: |
|
Owned by: |
|
---|---|---|---|
Milestone: | 4.3 | Priority: | normal |
Severity: | normal | Version: | 2.8 |
Component: | Formatting | Keywords: | wptexturize has-patch |
Focuses: | Cc: |
Description
When a string ends in a number followed by a single or double quote, such as "Lorem ipsum dolor sit amet 1234"
or 'Etiam eu egestas dui 1234'
wptexturize incorrectly converts the trailing single or double quote into a single or double prime, respectively.
Related to #8775, but the scope of that ticket seems to be limited to numbers inside of quotes, not a number preceded by other text (crudely described as\d+
vs .*\d+
).
This is still an issue in trunk on .org and on wordpress.com (Example: http://coreygilmore.wordpress.com/2014/08/18/expendables-3-illegally-downloaded-5-million-times-but-still-isnt-top-hit-for-pirates/)
Attachments (6)
Change History (33)
#2
@
11 years ago
Algorithm brainstorm:
- Split the subject string at every opening quote.
- If each value contains only one closing quote, then no primes exist.
- If a value contains multiple closing quote candidates then any candidate that is not preceded by a digit should be replaced with a closing quote before replacing the primes.
- Make sure primes are followed by a space, whereas trailing punctuation indicates a quote.
- If all closing quotes are preceded by a digit and not followed by punctuation then the last quote is assumed to be the only closing quote character.
- Implode the subject string and resume the inner loop of wptexturize.
#3
@
11 years ago
On further reflection, primes do not have to be followed by whitespace. For example, the length of the table is 4'.
#4
@
11 years ago
- Keywords has-patch added; needs-patch needs-unit-tests removed
A complete solution with unit tests is now attached.
Some benchmark testing wouldn't hurt, just to find out the impact.
I will advocate for the 4.1 milestone when it becomes available.
#5
@
11 years ago
Note the patch and tests do not check for mix-and-match usage of UTF-8 quote chars. If that's desired, we would have to normalize those chars as entity references at the top of the new function.
#8
@
10 years ago
A few more thoughts:
Patch will need to be refreshed after 4.0.1.
Patch did not add any new tests for cases like Fred answered the problem with '99' and then turned in the test.
It would be helpful to look into whether the complexity would be increased or decreased by bringing in logic for quoted 2-digit numbers. This is the situation where we have to distinguish between an abbreviated year at the end of a quotation vs. a standalone quoted 2-digit number. If we can remove those extra regex patterns from the main function and incorporate them here, it might give better and faster results.
#9
@
10 years ago
Refreshed patch and looked at remaining issues. I think this patch will stand on its own because it solves a specific problem without being too complex.
The remaining problems we still have with a phrase like '99'
is that it could have three different meanings depending on its context. Examples:
Then he said, '99' of lumber'.
Then he said, "we need '99' feet of lumber".
Then he said, 'that was the class of '99'.
We could theoretically determine that context, but it would require a huge amount of new code that seems to be beyond the scope of this ticket.
#11
@
10 years ago
Expanded the patch so that if opening and closing quotes are both the same character after translation, then the new function won't get confused or need to bail.
#12
follow-up:
↓ 13
@
10 years ago
Does this patch handle the case of "30 for 30"
?
With 4.0, the second " becomes an inch mark.
#14
@
10 years ago
- Keywords 4.2-early added; 4.1-early removed
- Milestone changed from 4.1 to Future Release
This could do with some performance figures. How does this impact page generation time?
#15
@
10 years ago
We'd really like to see this fixed in 4.1 — it is causing problems for our editorial staff.
I noticed this has been patched for two months. Please don't kick it to the next release and suddenly worry about page generation time. (I've never seen a wptexturize fix require "performance figures.") If this is a genuine concern, give @miqrogroove some time to do so before deferring it.
#16
@
10 years ago
This is a large change to texturize and we're on the verge of a RC. Sorry it wasn't addressed sooner.
#17
@
10 years ago
Future Release is correct. I have been focused on critical bugs in 4.1 and expect that work to continue for a week or two.
#18
@
10 years ago
- Milestone changed from Future Release to 4.2
has-patch 4.2-early so moving to 4.2.
#19
@
10 years ago
@miqrogroove: What's the status here? Did you still want to try to tackle this for 4.2?
Patch still applies.
#20
@
10 years ago
- Keywords 4.3-early added; 4.2-early removed
- Milestone changed from 4.2 to Future Release
No meaningful activity in several months, and as cited by @johnbillion previously, this needs time to soak. Maybe 4.3-early is the ticket.
#21
@
10 years ago
Needs performance testing at the least. Didn't have time to work on it this cycle. :)
#24
@
10 years ago
- Focuses performance added
- Keywords changed from wptexturize, has-patch to wptexturize has-patch
Initial performance test showed wptexturize() runs 20% faster after the patch. :o Wow. It may be ugly, but it's fast! I can only guess the improvement is tied to the overall reduction of PREG usage. I will test more to confirm this.
#25
@
10 years ago
- Focuses performance removed
Performance difference was < 1% after correcting a script error. Still, I was expecting awful results so this is promising.
#27
@
10 years ago
Great to see this finally land in Core! It solves part of the unending issue between wptexturize and smart quotes :)
Ignoring #18549 it may be possible to fix this for plain text only. The logic is not "regular" and will require potentially slow code.