Opened 7 years ago
Last modified 4 years ago
#40759 new defect (bug)
Word Count Discrepancies
Reported by: | pento | Owned by: | |
---|---|---|---|
Milestone: | Awaiting Review | Priority: | normal |
Severity: | normal | Version: | |
Component: | Editor | Keywords: | has-patch needs-refresh needs-testing |
Focuses: | administration | Cc: |
Description
I've noticed several discrepancies between how WordPress, Pages, Google Docs, and Word count words. Given the following text, all four count things quite differently.
a 1 foo-bar e.g. jack & jill 5 @ $4.99 . fuzz@baz.blog
Token type | WordPress | Word | Pages | Docs |
---|---|---|---|---|
Individual Words (a, jack, jill) | 3 | 3 | 3 | 3 |
Individual Numbers (1, 5) | 0 | 2 | 2 | 2 |
Hyphenated Words (foo-bar) | 1 | 1 | 2 | 1 |
Abbreviations (e.g.) | 1 | 1 | 2 | 2 |
Punctuation that translates to a word (&) | 0 | 1 | 0 | 0 |
Punctuation that translates to a word in this usage (@) | 0 | 1 | 0 | 0 |
Punctuation that doesn't translate to a word (.) | 0 | 1 | 0 | 0 |
Compound number ($4.99) | 0 | 1 | 1 | 2 |
Email address (fuzz@baz.blog) | 1 | 1 | 3 | 3 |
I tend to fall in the camp of "what would a reasonable native speaker count as a word", which is probably closest to Word's definition, minus the punctuation that doesn't translate to a word.
Attachments (1)
Change History (9)
#4
in reply to:
↑ 1
@
7 years ago
Replying to pento:
Side note: the Word test was done with Word Online, which appears to just split on whitespace to determine words; Word for Android appears to do the same. I don't know if the desktop versions have a more complex word count algorithm.
Word 2011 for Mac also seems to just split on whitespace.
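For reference, a minimal sketch of what "just split on whitespace" means for the sample text in the description. The helper name `countByWhitespace` is hypothetical and this is not Word's actual code, only the behaviour the comments above infer:

```javascript
// Hypothetical helper: counts whitespace-separated tokens and nothing more.
function countByWhitespace(text) {
  const trimmed = text.trim();
  // An empty string would otherwise split into [''] and report 1.
  return trimmed === '' ? 0 : trimmed.split(/\s+/).length;
}

// Sample text from the ticket description.
const sampleText = 'a 1 foo-bar e.g. jack & jill 5 @ $4.99 . fuzz@baz.blog';
console.log(countByWhitespace(sampleText)); // 12
```

The result of 12 equals the sum of the Word column in the table above, which is consistent with the whitespace-splitting guess.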
#5
@
7 years ago
Yeah, the WordPress word counter doesn't count numbers as words (for Cyrillic, Greek and Latin alphabets). That was the outcome of the research back then.
I know there are some differences for different locales, but making the word count more precise would involve a lot more filtering/regex that has to be locale-specific. This is impractical if the word counting runs all the time. Even now it gets pretty slow for larger posts. The main requirement for it is to be "undetectable" and never slow typing in the editor. I mean, what's good about a word count that interferes with the user being able to type or edit the text? :)
Is there another app that shows "dynamic" count that is always visible and updated? Don't think any of the above apps do. Perhaps we should make the WP word counting similar to them?
As far as I remember the app version of Word had a big "stats" dialog showing quite a few statistics besides word count: char count, sentences, images/media, etc. but you had to open it from a submenu.
#6
@
7 years ago
Yah, #8068 mentions "Numbers are not considered words." :-P That seems to differ from the majority behaviour, though.
Pages and Word Online both show the word count at all times, but they both appear to wait until there's a break in typing to recount. Pages just waits until no keys are currently being pressed. Word Online waits about a second from the last key press. The actual count in both is very fast: pasting a 5000-word block gets a count in a fraction of a second.
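The "wait about a second from the last key press" behaviour can be sketched as a debounce. This is an illustration only: `debounce`, `recountLater`, and the 1000 ms delay are assumptions, not code taken from WordPress, Pages, or Word Online.

```javascript
// Returns a wrapper that runs fn only after waitMs have passed with no further calls.
function debounce(fn, waitMs) {
  let timer = null;
  return function (...args) {
    // Every new call cancels the pending run and restarts the wait.
    if (timer !== null) {
      clearTimeout(timer);
    }
    timer = setTimeout(() => {
      timer = null;
      fn.apply(this, args);
    }, waitMs);
  };
}

// Hypothetical usage: recount roughly one second after the last key press.
const recountLater = debounce((text) => {
  const trimmed = text.trim();
  const words = trimmed === '' ? 0 : trimmed.split(/\s+/).length;
  console.log('words: ' + words);
}, 1000);
```

In an editor, `recountLater` would be called from the key event handler, so the count itself never runs while the user is still typing.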
#7
@
7 years ago
#30966 was a big change in word count, but it looks like we did not discuss or change counting numbers. I'm fine with counting numbers. To change it, split `\u0021-\u0040` into `\u0021-\u002F` and `\u003A-\u0040`.
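A small sketch of why that split matters: the digits `0` to `9` are `\u0030-\u0039`, which sit inside `\u0021-\u0040` but fall in the gap between `\u0021-\u002F` and `\u003A-\u0040`. The counting model below (delete matching characters, then split on whitespace) is a simplified assumption for illustration, and the names are hypothetical; it is not the WordPress counter itself.

```javascript
// '!' through '@', which includes the digits '0'-'9' (\u0030-\u0039).
const rangeWithDigits = /[\u0021-\u0040]/g;
// The same range with the digits left out.
const rangeWithoutDigits = /[\u0021-\u002F\u003A-\u0040]/g;

// Simplified, assumed model: strip matching characters, then count whitespace-separated tokens.
function countAfterRemoving(text, pattern) {
  const stripped = text.replace(pattern, '').trim();
  return stripped === '' ? 0 : stripped.split(/\s+/).length;
}

const numbersText = 'jack 1 jill 5';
console.log(countAfterRemoving(numbersText, rangeWithDigits));    // 2
console.log(countAfterRemoving(numbersText, rangeWithoutDigits)); // 4
```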
We only have word count and character count, and the choice between them is a setting that can be localised. In the case of numbers, it doesn't make a difference for character count, as all characters are counted.
For the other issues:
- It looks like `foo-bar` is right.
- `e.g.`: Try to move `.` to `connectorRegExp`, which will be replaced with spaces? Sounds right to me. Maybe more characters need to move here. Any counter examples? One in your examples: `fuzz@baz.blog`.
- I have no strong opinion on `&`.
- I'm not sure about `@`, which comes closer to symbols like `%` and `+`. These also translate to a word?
- Standalone `.`: looks right to me.
- `$4.99`: Would be solved with including numbers.
- `fuzz@baz.blog`: Also looks right to me.
Try URLs. 😉
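To make the `.` suggestion concrete, here is a hedged sketch of treating `.` as a connector that is replaced with a space before splitting. `dotConnector` and `countWithDotConnector` are hypothetical names, the URL is a placeholder, and the model ignores every other rule a real counter would apply.

```javascript
// Hypothetical connector pattern containing only '.', for illustration.
const dotConnector = /\./g;

// Replace connectors with spaces, then count whitespace-separated tokens.
function countWithDotConnector(text) {
  const spaced = text.replace(dotConnector, ' ').trim();
  return spaced === '' ? 0 : spaced.split(/\s+/).length;
}

console.log(countWithDotConnector('e.g.'));                    // 2
console.log(countWithDotConnector('fuzz@baz.blog'));           // 2
console.log(countWithDotConnector('https://example.com/a.b')); // 3
```

Under this model `e.g.` counts as two words, as Pages and Docs report, but the email address and the URL are also split, which is the counter example raised in the list.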