Make WordPress Core

Opened 17 years ago

Closed 12 months ago

Last modified 7 weeks ago

#4539 closed defect (bug) (worksforme)

Abbreviated year followed by punctuation or markup doesn't texturize

Reported by: pah2's profile pah2 Owned by: jmstacey's profile jmstacey
Milestone: Priority: low
Severity: normal Version: 2.5
Component: Formatting Keywords: has-patch westi-likes wptexturize needs-unit-tests
Focuses: Cc:

Description

An abbreviated year followed by punctuation or markup doesn't texturize properly.

e.g. (Bruce Sterling, '97) is texturized as (Bruce Sterling, ‘97) when the apostrophe should be texturized as ’

e.g. <li>Casino Royale '06</li> is texturized as <li>Casino Royale, &#8216;06</li> when the apostrophe should be texturized as &#8217;

Attachments (15)

abbreviated_year_patch.diff (1.7 KB) - added by jmstacey 16 years ago.
fomatting.diff (1.7 KB) - added by mrmist 15 years ago.
wp-include formatting.php
20100412-0355UTC_abbr_year.diff (1.8 KB) - added by jmstacey 14 years ago.
Diffed against r14075.
4539_2_test_cases_fixed_1_left.diff (2.4 KB) - added by norbertm 13 years ago.
formatting_latest_fixes_all.diff (8.9 KB) - added by norbertm 13 years ago.
some_more_progress.php.diff (3.7 KB) - added by norbertm 13 years ago.
dev-snapshot-2010-11-20.php.diff (11.0 KB) - added by norbertm 13 years ago.
dev-snapshot-2010-11-21.php.diff (15.6 KB) - added by norbertm 13 years ago.
dev-snapshot-2010-11-22.php.diff (18.0 KB) - added by norbertm 13 years ago.
dev-snapshot-2010-11-27.php.diff (19.0 KB) - added by norbertm 13 years ago.
dev-snapshot-2010-11-27-2.php.diff (21.1 KB) - added by norbertm 13 years ago.
dev-snapshot-2010-12-09.php.diff (21.6 KB) - added by norbertm 13 years ago.
dev-snapshot-2010-12-11.php.diff (22.4 KB) - added by norbertm 13 years ago.
formatting-01.patch (733 bytes) - added by xibe 12 years ago.
"like" 42" patch.
4539.diff (2.9 KB) - added by SergeyBiryukov 10 years ago.

Download all attachments as: .zip

Change History (91)

#1 @jmstacey
16 years ago

  • Keywords has-patch added
  • Owner changed from anonymous to jmstacey
  • Status changed from new to assigned
  • Version changed from 2.2.1 to 2.5

Note that the patch I attached also includes the patch I created for Ticket #1258.

#2 @mrmist
15 years ago

  • Keywords needs-patch needs-testing added; has-patch removed

I tried applying this against trunk 10289 but it didn't seem to fix the issue for me.

#3 @mrmist
15 years ago

  • Keywords has-patch tested added; needs-patch needs-testing removed

my mistake, I was testing against the wrong site. D'oh.

The attached fixes the curly quotes and s and the issue above.

@mrmist
15 years ago

wp-include formatting.php

#4 @mrmist
15 years ago

Refreshed patch

#5 @Denis-de-Bernardy
15 years ago

  • Component changed from General to Formatting

#6 @Denis-de-Bernardy
15 years ago

  • Keywords needs-patch added; has-patch tested removed

broken patch

#7 @westi
15 years ago

  • Milestone changed from 2.9 to Future Release

Added Test cases for this to wordpress-tests.

Moving to Future Release until it has a working patch

#8 @nacin
14 years ago

#1258 and #11275 closed as duplicates.

@jmstacey
14 years ago

Diffed against r14075.

#9 follow-up: @jmstacey
14 years ago

  • Cc jon@… added
  • Keywords has-patch needs-testing added; needs-patch removed

Here's another go ready for testing. This patch was created against r14075.

#10 in reply to: ↑ 9 @westi
14 years ago

Replying to jmstacey:

Here's another go ready for testing. This patch was created against r14075.

Thank You!

#11 @westi
14 years ago

  • Milestone changed from Future Release to 3.0

#12 @westi
14 years ago

I ran the unit tests against the patch (http://svn.automattic.com/wordpress-tests/wp-testcase/test_includes_formatting.php) with the following results:

  • TestWPTexturize::test_quotes_before_s - these tests now pass
  • TestWPTexturize::test_quotes - still fail
  • TestWPTexturize::test_quotes_before_numbers - simple case passes quoted case doesn't.

So we are heading in the right direction but still need some more work.

I'll commit the current patch but leave the ticket open for improvements.

#13 @westi
14 years ago

(In [14095]) Improve behaviour of wptexturize with respect to single quotes. See #4539 props jmstacey.

#14 @scribu
14 years ago

  • Priority changed from normal to low

#15 @nacin
14 years ago

  • Keywords needs-patch added; has-patch needs-testing removed
  • Milestone changed from 3.0 to 3.1

Going to punt this along for continued work.

#16 @norbertm
13 years ago

Working on a patch that makes all failing test cases pass.

#17 @norbertm
13 years ago

I was able to make 2 of the failing test cases pass by reorganizing the order of patterns in r16195.

Needs another patch for test_quotes() with HTML tags and tag attributes in it.

#18 @norbertm
13 years ago

  • Cc norbertm added

#19 @norbertm
13 years ago

Technically related to #15241.

#20 @norbertm
13 years ago

Almost done with the simplified patch that resolves all issues.

#21 @norbertm
13 years ago

Submitted a patch above with regexps rewritten from scratch that makes all existing test cases pass, enables disabled test cases to run, including skipped tests and I also added several new cases that all pass.

The only edge case it does not handle is when you want to combine units of degrees, specifically minutes and seconds with the same class of quotes, e.g. ' with ' and " with ". I don't think that can correctly be fixed.

The patch also adds support for quotes spanning multiple html tags and/or shortcodes.

It also fixes #15241.

#22 @scribu
13 years ago

  • Keywords has-patch added; needs-patch removed

#23 @westi
13 years ago

([UT317]) Extra tests for wptexturize from #WP4539 props norbertm.

#24 @westi
13 years ago

I switched the escaped slashes back to using a double quoted string for the string with single quotes in in test_other_html so as to follow the WP Coding Standards.

#25 @westi
13 years ago

  • Resolution set to fixed
  • Status changed from accepted to closed

(In [16280]) Improved RegEx for quote matching in wptexturize. Fixes #4539 and #15241 props norbertm.

#26 @PeteMall
13 years ago

  • Keywords has-patch removed
  • Milestone changed from Awaiting Triage to 3.1

#27 @nacin
13 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

See my comment in #15241.

#28 @nacin
13 years ago

(In [16409]) Revert changes to wptexturize() until test cases pass. Reverts [16280], [16378]. see #4539 and #15241.

#29 @norbertm
13 years ago

Added snapshot of progress. Everything passes including #15241 except #15444. Working on that and test coverage.

#30 @norbertm
13 years ago

Added new snapshot.

Includes fix for #15444.
Follows WP coding standards.
Improved test coverage.

#31 @nacin
13 years ago

Cross-referencing #10606 as another one of those texturize bugs.

#32 follow-up: @norbertm
13 years ago

Updated patch and included further tests. Fixes #10606 as well.

Aiming to improve performance by getting rid of the two preg_replaces() within the while()s with no success so far (we need the count parameter that str_replace() does not have). Is this a performance sensitive function or cached?

#33 in reply to: ↑ 32 @westi
13 years ago

Replying to norbertm:

Is this a performance sensitive function or cached?

The output of this isn't cached

#34 follow-up: @norbertm
13 years ago

Let me see if I can further optimize it.

Any other tickets/concerns related to this?

#35 in reply to: ↑ 34 @nacin
13 years ago

Replying to norbertm:

Let me see if I can further optimize it.

Any other tickets/concerns related to this?

A quick search identified the following open texturize-related tickets: #4539 #4116 #6969 #8912 #10033 #15241 #15444.

#36 @nacin
13 years ago

Another: #14491.

#37 @norbertm
13 years ago

Thanks for collecting related tickets. Here is another patch.

Waiting for feedback on #14491.

Optimized code for performance by replacing two preg_replace()s to str_replace()s (and while()s to if()s).

WPTexturize test results: OK (21 tests, 130 assertions)

I think it's getting mature. Will post a followup once I'm done.

Any other thoughts?

#38 @norbertm
13 years ago

Working in #14491 which needs some rethinking.

#39 @norbertm
13 years ago

Next dev snapshot. Adds test cases and fix for #14491.

There is one issue left. How do you convert

'Class of '99'

into

&8216;Class of &8217;99&8217;

and not

&8216;Class of &8216;99&8217;

if you still want to convert '99' into &8216;99&8217;?

Obviously it depends on the starting quote but that's a leftover quote to be evaluated later to allow quote state to persist across tags and shortcodes. Still thinking about this.

#40 @norbertm
13 years ago

Another snapshot.

Fixes #4116, #4539, #8912, #10606, #14491, #15241, #15444.

Partially covers #6969 and #10033.

Includes a bunch of new unit tests all of which pass now (23 tests, 162 assertions in total).

I'm not totally confident with the current patterns and test coverage. Still reviewing those.

#41 @norbertm
13 years ago

Completed the review. I think it can't become much better than this. It's pretty well covered with tests too. If you have suggestions please post it, otherwise I think the patch could be applied and the fully covered tickets mentioned in the second post above can be closed together with this.

#42 @nacin
13 years ago

How's the performance?

#43 @nacin
13 years ago

  • Keywords has-patch added

Closed #15241, #10606, #15444 all as duplicates. Should be confirmed as fixed before this ticket is ultimately closed.

#44 @xibe
13 years ago

Then I'll paste my test-post from #10606, simply to present that some mundane punctuation uses still just don't work (only for closing quotes):

  • text "[link]".
  • text "[link]", text.
  • text "[link]"; text.
  • text "[link]"- text.
  • text "text [link]".
  • text "text [link]"...
  • text "text [link]"text.

All these have the opening quote applied instead of the closing quote (and then the locale is applied, the closing quote in fr-FR being "« ").

  • text "text number".
  • text "text number" text.

These last two have the correct closing quote applied, but the locale is not taken into account.

I did provide test cases for all of these, a year ago now. If they're wrong, let me know about it, I'll update them.

Test blog is running WordPress 3.1-beta1-16732.

#45 @norbertm
13 years ago

@xibe - It hasn't been committed to SVN yet. The patch is available above. It makes your test cases submitted in #10606 pass. To make sure I've included new tests for the simplified strings that you just submitted. All pass now.

#46 @norbertm
13 years ago

@nacin - As for performance, it's hard to tell because it highly depends on the strings that are passed to it. The original test suite did not pass and many tests were skipped so there's nothing to compare against. What would you suggest for benchmarking this? Please keep in mind that the original function returns incorrect results so we may be comparing code that generates incorrect vs correct output.

As for optimization, I was able to replace a preg_replace() to an str_replace() and a while() loop to a single if() by realizing the same could be accomplished with these better performing functions/constructs.

#47 @westi
13 years ago

  • Keywords 3.2-early added
  • Milestone changed from 3.1 to Future Release

@norbertm: Thanks for all your hard work on this.

I think it is best left for implementation in 3.2 so we get a good deal of time for the changes to soak in trunk.

#48 @westi
13 years ago

#9306 should be reviewed when the changes here are reviewed and committed.

#49 @azaozz
13 years ago

See #8912 re: commented HTML in the post content.

#50 @westi
12 years ago

  • Keywords westi-likes 3.4-early added; 3.2-early removed

#51 @xibe
12 years ago

Hello!

May I ask for renewed interest in this ticket? Is that patch from norbertm 1 year ago still applicable (somehow I doubt it)? May I help in any way (not much of a coder, but I can add more tests to my batch from #11099)?

Thanks all!

@xibe
12 years ago

"like" 42" patch.

#52 @xibe
12 years ago

Tiny patch for those situation where a quote ends with digits.

#53 @bjo
11 years ago

  • Cc bjo added

I still have the issue in 3.4.2 with the wrong quotes in the case

text "[link]", text.
Last edited 11 years ago by bjo (previous) (diff)

#55 @nacin
10 years ago

#22971 was marked as a duplicate.

#56 @nacin
10 years ago

  • Keywords wptexturize added

#57 @miqrogroove
10 years ago

  • Resolution set to worksforme
  • Status changed from reopened to closed

Summary and closure:

(Bruce Sterling, '97) Could not reproduce this bug in 3.9 or trunk testing.

<li>Casino Royale '06</li> Could not reproduce this bug in 3.9 or trunk testing.

'Class of '99' This test case is syntactically ambiguous. Workaround: Don't type the same character three times in one sentence and expect it to mean three different things!

text "[link]", text. In this case, [link] is parsed as a shortcode. Same problem as in #18549.

Any other bugs, please make a new ticket and let this one rest.

@SergeyBiryukov
10 years ago

#58 @SergeyBiryukov
10 years ago

  • Milestone changed from Future Release to 4.0
  • Resolution worksforme deleted
  • Status changed from closed to reopened

'Class of '99' might be ambiguous, but "Class of 99" is not, and it currently fails.

4539.diff is an attempt to fix that, based on xibe's formatting-01.patch.

This ticket was mentioned in IRC in #wordpress-dev by SergeyBiryukov. View the logs.


10 years ago

#60 @miqrogroove
10 years ago

  • Keywords needs-unit-tests added; 3.4-early removed

I can understand the desire to brainstorm more of the edge cases but I'm not sure how much success we will have. For the suggested patch, we would need many more unit tests. For example:

array(
	"George's porch is 99' long.",
	"George&#8217;s porch is 99&#8242; long.",
),

#61 @miqrogroove
10 years ago

  • Resolution set to worksforme
  • Status changed from reopened to closed

Recent comments have gone off topic and do not match the ticket description.

This ticket was mentioned in IRC in #wordpress-dev by doc-bot. View the logs.


10 years ago

#63 @DrewAPicture
9 years ago

  • Milestone 4.0 deleted

#64 @zodiac1978
8 years ago

  • Resolution worksforme deleted
  • Status changed from closed to reopened

For me the first seven testcases from https://core.trac.wordpress.org/ticket/4539#comment:44 are not working. The last two are still okay:

This is my result:

text „[link]„.
text „[link]„, text.
text „[link]„; text.
text „[link]„- text.
text „text [link]„.
text „text [link]„…
text „text [link]„text.
text „text 1234“.
text „text 1234“ text.

See: http://testserver.torstenlandsiedel.de/2016/01/testcase-typography/

#65 @SergeyBiryukov
8 years ago

  • Milestone set to Awaiting Review

#66 follow-up: @xibe
8 years ago

@zodiac1978 Let's improve this, shall we? :) I'm glad I got feedback from my tests, 6 years after they were merged:)

The results on my test post have not changed much since the 4.3 improvement. I'd be more the willing to rework the tests I provided if I could get feedback on what I did wrong.

#67 in reply to: ↑ 66 @zodiac1978
8 years ago

Replying to xibe:

@zodiac1978 Let's improve this, shall we? :) I'm glad I got feedback from my tests, 6 years after they were merged:)

I'm happy to help if I can!

The results on my test post have not changed much since the 4.3 improvement. I'd be more the willing to rework the tests I provided if I could get feedback on what I did wrong.

I am wondering why these tests don't fail. In my and your test post we can see it doesn't work, so they should produce a failing test. I don't have a clue, why this doesn't happen. Maybe someone with more knowledge of unit tests can help here. @ocean90 perhaps?

#69 @xibe
8 years ago

The BuddyPress ticket reverts back to WordPress, so we're left to fix this still.

And once again, I've seen the texturize() issue with quotations in the wild, in a French post:

J’avais découvert Mike Monteiro via sa conférence « Fuck you, pay me« , puis via ses podcasts sur Mule Radio Syndicate, ou encore son travail via sa société Mule Design Studio sur AllThingsD ou Evening Edition, et en avril dernier son livre « Design Is a Job » (horriblement traduit en français en « Métier web designer« ).

You can clearly see three instances of quoted links, two of which fail to display the proper ending-quotation because that final quote is either followed by a comma or a parenthesis.

This has been plaguing French site since at least 2009. Can anything be done for 4.6? How can I help?

Version 0, edited 8 years ago by xibe (next)

#70 @gitlost
8 years ago

Note that the quotations around links issue got moved to #18549. Re the tests, the failing ones were commented out. They're tested in the latest patch on #18549.

This ticket was mentioned in Slack in #core-editor by joen. View the logs.


6 years ago

#72 @Mte90
12 months ago

  • Resolution set to worksforme
  • Status changed from reopened to closed

Tested right now and the problem is fixed with latest 6.2.

I get Bruce Sterling, &#8217;97.

#73 @zodiac1978
12 months ago

  • Resolution worksforme deleted
  • Status changed from closed to reopened

#74 @Mte90
12 months ago

I don't see in the page the same text of the ticket Bruce Sterling, '97.
I saw now your comment #64 https://core.trac.wordpress.org/ticket/4539#comment:64, but it is a complete different text.
Tried that in a new post, those symbols aren't replaced.
Maybe it's the case to open a new ticket about that text as this one can be easy to misunderstood as the title refer to something in the year and not about other symbols.

#75 @zodiac1978
12 months ago

  • Resolution set to worksforme
  • Status changed from reopened to closed

There are more issues mentioned in the comments than just what is in the original decsription, but I just saw the note from @gitlost

Note that the quotations around links issue got moved to #18549.

And all those broken variants are about quotations after links ...

#76 @swissspidy
7 weeks ago

  • Milestone Awaiting Review deleted
Note: See TracTickets for help on using tickets.