Opened 15 months ago

Last modified 15 months ago

#20124 new defect (bug)

Smilies Fail Combinations

Reported by: soulseekah Owned by:
Priority: normal Milestone: Awaiting Review
Component: Formatting Version: 3.3.1
Severity: normal Keywords: has-patch
Cc:

Description

I have found that certain combinations of smilies break the monster of a regular expression that parses them.

8-O :-(]
8-) 8-O
8-) 8O[
8-) :-(
8-) :twisted:
8O :twisted: :( 8-( :(

...are some examples where the regular expression appears to be failing to match. Any smilie after a smilie with an 8 in it appears to be skipped. Will try to pinpoint the flaw in the regexp soon.

Attachments (4)

wordpress_smilie_bug.png (18.9 KB) - added by soulseekah 15 months ago.
Missed smilies
functions.php.diff (347 bytes) - added by soulseekah 15 months ago.
not expecting trailing spaces at all
functions.php.2.diff (355 bytes) - added by soulseekah 15 months ago.
last group expecting trailing space, but not consuming it
functions.php.3.diff (850 bytes) - added by soulseekah 15 months ago.
all groups expect trailing space, but don't consume them

Download all attachments as: .zip

Change History (9)

Missed smilies

8-O :-(
8-) 8-O
8-) 8O
8-) :-(
8-) :twisted:
8O :twisted: :( 8-( :(

This is the correct test case above. I missed removing a couple of formatting braces. Attached is the result.

Version 0, edited 15 months ago by soulseekah (next)

This is the regular expression that is matched against the content:

/(?:\s|^);(?:\-\)|\))|(?:\s|^)\:(?:\||x|wink\:|twisted\:|smile\:|shock\:|sad\:|roll\:|razz\:|oops\:|o|neutral\:|mrgreen\:|mad\:|lol\:|idea\:|grin\:|evil\:|eek\:|cry\:|cool\:|arrow\:|P|D|\?\?\?\:|\?\:|\?|\-\||\-x|\-o|\-P|\-D|\-\?|\-\)|\-\(|\)|\(|\!\:)|(?:\s|^)8(?:O|\-O|\-\)|\))(?:\s|$)/m

This can be simplified visually to

(?:\s|^);(...variants...)
or
(?:\s|^):(...variants...)
or
(?:\s|^)8(...variants...)(?:\s|$)

The problem has been pinpointed to the last (?:\s|$) clause. This is attached to "8"-type smilies and consumes the single trailing whitespace following it, meaning that anything that comes afterwards matching any of the groups, does not match ^ and the whitespace would have already been consumed by the previous group.

One solution would be to remove the trailing-space-muncher.

(?:\s|^);(...variants...)
or
(?:\s|^):(...variants...)
or
(?:\s|^)8(...variants...)

The (?:\s|$) clause never helped any of the ":" and ";" type smilies to be parsed only when surrounded by whitespace ( :)text will result in a parsed smilie, not no space after bracket, should this be parsed or not?) Noticed that the 8) smilie is gone in the trunk due to it breaking regular text.

Another solution would be to use lookaheads to test for trailing spaces, these do not consume the spaces themselves.

((?:\s|^);(...variants...)(?=\s|$))
or
((?:\s|^):(...variants...)(?=\s|$))
or
((?:\s|^)8(...variants...)(?=\s|$))

This solves :)text , if it is a problem.

(?:\s|^);(...variants...)
or
(?:\s|^):(...variants...)
or
(?:\s|^)8(...variants...)(?=\s|$)

Appears to be valid as well, only "8"-type smilies will expect tailing whitespace, but will not affect expected front-whitespace, fixing the combinations bug.

I'm probably missing something but attached are very simple diffs then.

not expecting trailing spaces at all

last group expecting trailing space, but not consuming it

all groups expect trailing space, but don't consume them

  • Keywords has-patch added
  • Keywords changed from smilies, regexp, formatting, has-patch to smilies regexp formatting has-patch
  • Keywords smilies regexp formatting removed

Manual keywords generally aren't necessary. Explanation.

Last edited 15 months ago by kawauso (previous) (diff)
Note: See TracTickets for help on using tickets.