Make WordPress Core

Opened 11 years ago

Closed 10 years ago

Last modified 10 years ago

#26850 closed defect (bug) (fixed)

Single quotes show up as apostrophes when appearing before numbers

Reported by: yurivictor Owned by: wonderboymusic
Milestone: 4.0 Priority: normal
Severity: minor Version: 3.9
Component: Formatting Keywords: wptexturize has-patch
Focuses: Cc:

Description

Here's an example:

http://www.washingtonpost.com/blogs/fact-checker/wp/2013/09/24/has-obama-cut-the-budget-deficit-in-half/

Also happening on .org trunk and wordpress.com:

http://yurivictor.wordpress.com/2014/01/16/testings-4-through-quotes/

Headlines were manually typed into the WordPress admin, not copy and pasted.

Attachments (9)

26850.diff (600 bytes) - added by yurivictor 11 years ago.
26850.2.diff (610 bytes) - added by yurivictor 11 years ago.
Update fixes numbers greater than 99
26850.3.diff (600 bytes) - added by yurivictor 11 years ago.
per nacin
26850.4.diff (639 bytes) - added by miqrogroove 10 years ago.
Refreshed
26850.5.diff (641 bytes) - added by miqrogroove 10 years ago.
Alternation within lookahead is capturing? Who knew.
26850.6.diff (1.1 KB) - added by miqrogroove 10 years ago.
Exclude decimals. Increase priority to favor possessive abbr. vs. quoted numbers.
miqro-26850.patch (3.4 KB) - added by miqrogroove 10 years ago.
Fix '9 and '999. Fix '99' to avoid primes. Add unit tests.
miqro-26850-part2.patch (2.5 KB) - added by miqrogroove 10 years ago.
Fix '99% and adjust unit tests.
miqro-26850-part2.2.patch (2.7 KB) - added by miqrogroove 10 years ago.


Change History (34)

#1 @Kenshino
11 years ago

  • Keywords needs-patch added

Hi,

Confirmed happening on fresh install of WP 3.8

This seems to be the behaviour of the_title() as the back end input box prints fine.

I've also tested it on a WP 3.5.1 installation.

Similar behaviour, except that with 3.5.1 the first quote gets converted to an apostrophe and the second does not. The back-end input box is also affected in the same way.

#2 @nacin
11 years ago

  • Keywords wptexturize added

#3 @nacin
11 years ago

  • Keywords needs-unit-tests added
  • Milestone changed from Awaiting Review to Future Release
  • Summary changed from Single quotes show up as apostrophes when appearing before numbers in the_title() to Single quotes show up as apostrophes when appearing before numbers

This isn't the_title() only; it's wptexturize().

The rule is designed for years. So, '99 or '99's gets rendered with an apostrophe preceding the 99. In this case, it isn't a year; it's only one digit. So a fix should handle at least 0-9 and 100+. What about 10-99? In this case, it's the start of a quote. If we can manage to detect that there is a closing quote as well, versus just a standalone number, it's probably a better bet to assume that we're dealing with a quote rather than a year, which is much rarer.

This needs some unit tests. It is also possibly a duplicate of another wptexturize() ticket (most of which also need unit tests).

#4 @helen
11 years ago

'90's is poor form anyway, if referring to years; should be '90s. So, yes, assuming that the existence of a closing quote means it's a quote as opposed to a truncated number is a good idea.

@yurivictor
11 years ago

#5 @yurivictor
11 years ago

Made a dirty patch 26850.diff.

Forces regex to search for two digits in a row, rather than any digit.

e.g. Finds '99 as a year, but not '9.

The rest will go through the normal wptexturize flow, which appropriately styles the apostrophes. Doesn't solve all use cases, but it's a start.
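
A minimal sketch of the two-digit idea, written in Python rather than the PHP that wptexturize() lives in. Both patterns and the `to_apostrophe` helper are assumptions made for illustration; the actual contents of 26850.diff are not reproduced in this ticket text.

```python
import re

APOS = "\u2019"  # right single quotation mark, used here as the apostrophe

# Hypothetical patterns for illustration only; not copied from WordPress core.
ANY_DIGIT = re.compile(r"'(\d)")     # assumed old behaviour: any digit after the quote
TWO_DIGITS = re.compile(r"'(\d\d)")  # assumed patched behaviour: two digits in a row


def to_apostrophe(pattern, text):
    """Replace the straight quote matched by `pattern` with an apostrophe."""
    return pattern.sub(lambda m: APOS + m.group(1), text)


for sample in ("'9", "'99", "'999"):
    print(sample, "->", to_apostrophe(ANY_DIGIT, sample), "|", to_apostrophe(TWO_DIGITS, sample))
```

Under these assumptions '9 is left alone by the two-digit pattern, while '99 and '999 are both still converted, because '999 also begins with two digits.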

#6 @nacin
11 years ago

Good first step, I agree. We'll want some unit tests for this, to verify that it no longer messes with '9 or '999. (It looks like '999 will still fail here.)

I tend to agree with Helen, '99's is poor form. But there are some possessive considerations. "1999's introduction of the Euro" becomes "'99's introduction of the Euro". Not that a year possessing something is good form.

As an aside, I enjoyed searching around for some writings about the direction and treatment of these apostrophes, and I was happy that the first three results I clicked were WordPress blogs that, in the text, had the examples correct. Meanwhile, one post was a 500-word missive on how to get Microsoft Word to do this without screwing it up.

@yurivictor
11 years ago

Update fixes numbers greater than 99

#7 @yurivictor
11 years ago

  • Keywords has-patch added; needs-patch removed

Latest patch 26850.2.diff fixes apostrophes for all numbers 0-9 and greater than 99.

So '9 and '999 will both show up correctly, while '99 will still convert to year.

Uses space or end of line to stop after two digits.

#8 @nacin
11 years ago

A negative lookahead might be better than a positive one: /\'(\d\d)(?!\d)/. Otherwise '99. won't be caught. Or, in lieu of \z and \s, using a word boundary should be sufficient here.
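
The difference between the three options can be sketched as follows, in Python for illustration (the lookahead syntax is the same as in PCRE). `NEGATIVE` is the pattern from the comment; `POSITIVE` is only an assumed shape for the space-or-end-of-line check, and the `matches` helper is hypothetical.

```python
import re

# Assumed shape of the space-or-end-of-line check described in comment 7.
POSITIVE = re.compile(r"'(\d\d)(?=\s|\Z)")
# The negative lookahead suggested in this comment.
NEGATIVE = re.compile(r"'(\d\d)(?!\d)")
# The word-boundary alternative.
BOUNDARY = re.compile(r"'(\d\d)\b")


def matches(pattern, text):
    """Return True when `pattern` finds a quote followed by a two-digit number."""
    return pattern.search(text) is not None


for sample in ("'99", "'99.", "'999", "'9", "'99s"):
    print(sample, matches(POSITIVE, sample), matches(NEGATIVE, sample), matches(BOUNDARY, sample))
```

In this sketch the word-boundary variant does catch '99. but skips '99s, because digits and letters are both word characters and no boundary falls between them. The negative lookahead matches both.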

@yurivictor
11 years ago

per nacin

#9 @yurivictor
11 years ago

Good point. Updated to use a negative lookahead in 26850.3.diff.

#10 @miqrogroove
11 years ago

I'm really confused by this ticket. We have an ambiguous case where the '\d pattern could mean different things.

The solution seems to be a hack for the phrase '4 years, 3 months,' which would fail predictably if the phrase were '40 years, 3 months,'

How do we reconcile those patterns? What exactly is the expected output?

#11 @miqrogroove
10 years ago

  • Keywords close added

I would like to close this as invalid or wontfix. We have an ambiguous case where a decision has already been made to favor abbreviated year numbers. There is nothing to fix here.

#12 @yurivictor
10 years ago

@miqrogroove, I'll see if I can explain better.

Quote marks show up in the wrong direction when used around numbers that aren't years.

Here's an example post where this happens:

https://yurivictor.wordpress.com/2014/01/16/testings-4-through-quotes/

Notice the headline. 4 is not a year, but it is treated as one, which is why the quote mark points in the wrong direction. The patch would prevent that from happening in 99.9% of cases. It's not a complete solution, but it would definitely fix the above use case and the problem that The Washington Post was having.

If you have a better solution, I would love to hear it, but this is definitely a problem that needs to be solved. The answer isn't obvious, but should at least be discussed.

#13 @miqrogroove
10 years ago

Yeah I got that part. And as I pointed out above, a single apostrophe before a number is syntactically identical and ambiguous between the year abbreviation and the beginning of a quoted number. How do you propose to distinguish between dates and quotes?

#14 @yurivictor
10 years ago

Right. It's a tough bug.

The code in core currently assumes all apostrophes followed by a number are dates and changes the apostrophe accordingly.

Case 1: `4 is styled like a year, but it's never a year
Case 2: `44 is styled like a year, and it might be a year or the start of a quote
Case 3: `444 is styled like a year, but it's never a year

The patch solves case 1 and case 3. I have no idea how to solve case 2.

If someone comes up with a solution for case 2, I'd love to hear it because smart quotes are kind of a nightmare on almost every platform.

Smart quotes make for strange code.

#15 @miqrogroove
10 years ago

Is the apos-before-digit pattern supposed to never match unless there are exactly 2 digits? The pattern isn't written that way, and it would be a trivial adjustment. If that's all we're talking about here, that can be fixed.

If we want to distinguish between quotes and apostrophes for the 2 digits, that's going to be a whole other can of worms.

@miqrogroove
10 years ago

Refreshed

@miqrogroove
10 years ago

Alternation within lookahead is capturing? Who knew.

#16 @miqrogroove
10 years ago

Regarding comments 3 & 4 above, here is why a closing quote is not algorithmically helpful:

Then she said, 'I went to school in '99 but dropped out.'

vs.

Back in '99 she said, '40 years ago I went to school but dropped out.'

Both sentences have closing quotes, but usage of abbreviations remains ambiguous in simple patterns.
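
To make the point concrete, here is a rough Python sketch of a naive "is there a closing quote later?" check run over both sentences. The heuristic and the `closing_quote_follows` helper are hypothetical and were not proposed as a patch in this ticket.

```python
import re

# A straight quote followed by exactly two digits: the ambiguous candidates.
QUOTE_THEN_TWO_DIGITS = re.compile(r"'\d\d(?!\d)")


def closing_quote_follows(text):
    """For each two-digit candidate, report whether another single quote appears later."""
    return [
        (m.group(0), "'" in text[m.end():])
        for m in QUOTE_THEN_TWO_DIGITS.finditer(text)
    ]


year_inside_quote = "Then she said, 'I went to school in '99 but dropped out.'"
quote_starts_with_number = "Back in '99 she said, '40 years ago I went to school but dropped out.'"

print(closing_quote_follows(year_inside_quote))
print(closing_quote_follows(quote_starts_with_number))
```

The check answers True for every candidate, both abbreviated years and the quoted '40, so it gives no basis for telling them apart.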

@miqrogroove
10 years ago

Exclude decimals. Increase priority to favor possessive abbr. vs. quoted numbers.

#17 @miqrogroove
10 years ago

  • Keywords close removed

#18 @miqrogroove
10 years ago

26850.6.diff would also fix a concern from ticket #8775. If we want this pattern to work, it needs to be at the top of the pattern list again.

@miqrogroove
10 years ago

Fix '9 and '999. Fix '99' to avoid primes. Add unit tests.

#19 @miqrogroove
10 years ago

  • Keywords needs-unit-tests removed

In miqro-26850.patch:

  • Only place an apostrophe before a number when it has exactly two digits.
  • Never match '99' with the single prime pattern.
  • Always assume '99' is an abbreviated year at the end of a quotation.
  • Both test cases in the ticket description are resolved.
  • Appropriate unit tests added.
  • Resolves the unit test broken in [28721] for #8775.
  • Does not fix any part of #27426.
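
The first and third rules in this list can be sketched roughly as below. This is a Python illustration under stated assumptions, not the patch itself: the regexes, the `texturize_sketch` helper, and the visible `[apos]` and `[close]` markers are all hypothetical, and the single prime pattern is not modelled.

```python
import re

# Hypothetical visible markers so the two replacements can be told apart.
APOS = "[apos]"
CLOSING_QUOTE = "[close]"


def texturize_sketch(text):
    """Apply two of the listed rules; a sketch, not the patch itself."""
    # '99' followed by end of text, whitespace, or punctuation:
    # assume an abbreviated year at the end of a quotation.
    text = re.sub(r"'(\d\d)'(?=\Z|[\s.,)\]}-])", APOS + r"\1" + CLOSING_QUOTE, text)
    # A quote before exactly two digits becomes an apostrophe.
    text = re.sub(r"'(?=\d\d(?!\d))", APOS, text)
    return text


for sample in ("'9", "'99", "'999", "'99's", "'back in '99'"):
    print(sample, "->", texturize_sketch(sample))
```

The '99' rule runs first in this sketch so that the closing quote is consumed before the two-digit rule sees the opening one.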

#20 @SergeyBiryukov
10 years ago

  • Milestone changed from Future Release to 4.0

#21 @wonderboymusic
10 years ago

In 28761:

wptexturize() adjustments:

  • Only place an apostrophe before a number when it has exactly two digits.
  • Never match '99' with the single prime pattern.
  • Always assume '99' is an abbreviated year at the end of a quotation.
  • Add unit tests.
  • Resolves the unit test broken in [28721] for #8775.

See #26850.

#22 @miqrogroove
10 years ago

Needs another tweak to handle '99% of people' which is not an abbreviation.
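
One way such a tweak could look, sketched in Python: extend the negative lookahead so that a percent sign or a decimal part after the two digits blocks the match. The pattern and the `apostrophize` helper are assumptions for illustration, not the contents of the attached patch; the decimal exclusion follows the note on 26850.6.diff.

```python
import re

APOS = "\u2019"  # right single quotation mark, used here as the apostrophe

# Hypothetical pattern: a quote before two digits that are not followed by
# another digit, a percent sign, or a decimal part such as ".5".
ABBREVIATED_YEAR = re.compile(r"'(?=\d\d(?![%\d]|[.,]\d))")


def apostrophize(text):
    """Turn the quote before an abbreviated year into an apostrophe."""
    return ABBREVIATED_YEAR.sub(APOS, text)


for sample in ("'99% of people'", "class of '99", "'99.5", "'99."):
    print(sample, "->", apostrophize(sample))
```

With this pattern '99% and '99.5 are left for the ordinary quote rules, while '99 and '99. still get an apostrophe.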

@miqrogroove
10 years ago

Fix '99% and adjust unit tests.

#23 @wonderboymusic
10 years ago

  • Owner set to wonderboymusic
  • Resolution set to fixed
  • Status changed from new to closed

In 28765:

Fix abbreviations mixed with quotes, example: '99% of people'.
Add/alter unit tests.

Props miqrogroove.
Fixes #26850.

#24 @miqrogroove
10 years ago

Single-quoted phrases beginning with exactly two digits will not be fixed under this ticket for version 4.0.

'33 people went there', she said.

This has been discussed above. Any further discussion, please open a new ticket. Thanks.

This ticket was mentioned in IRC in #wordpress-dev by miqrogroove.

10 years ago
