Opened 6 weeks ago
Last modified 6 weeks ago
#64463 new defect (bug)
XML Escape Codes Applied to RSS and Changes Post Content
| Reported by: | | Owned by: | |
|---|---|---|---|
| Milestone: | Awaiting Review | Priority: | normal |
| Severity: | normal | Version: | |
| Component: | Formatting | Keywords: | |
| Focuses: | | Cc: | |
Description (last modified by )
WordPress incorrectly second-guesses the user and changes characters in post content when it goes into RSS feeds. This creates a problem for the dummy apostrophe, the dummy quotation mark, and three dots. You want these to display as is in all situations. Professional publishing uses “these” instead of "these," but more importantly, no American English style guide ever uses the ellipsis as its own character in publishing. The Associated Press (AP) and most newspapers use three dots or three periods. They do *not* use the dedicated ellipsis character. Some fonts render these the same, but some do not. (Chicago and most books use three dots separated by non-breaking spaces.)
Many people type on ASCII keyboards and have read much less in books than on screens, which means that the differences between various punctuation marks are lost in informal contexts—but not the formal ones.
| ASCII | Non-ASCII |
|---|---|
| "" | “” |
| '' | ‘’ |
| ... | … (combined ellipsis as one character) |
| - | –—− |

| Correct | Incorrect |
|---|---|
| “” | "" |
| ‘’ | '' |
| ... | … (combined ellipsis as one character) |
| -–—− *Dependent on context* | |
WordPress *does* handle the various dashes correctly. That is, WordPress does not change one dash into another; the correct usage is left to the author, and WordPress does *not* second-guess you. I only include the various dashes to make a point about how these differences are subtle or invisible to some and glaring to others.
The differences between the four dashes are tricky to spot. Unless you have lots of editing experience, it can be difficult to tell the difference between the hyphen -, the en dash –, the em dash —, and the negative/minus sign −. All four are valid in different contexts.
Hyphen: *4-6* is “four-six.”
En dash: *4–6* is “(from) four to six.”
Em dash: *4—6* is “(I think that we have) four; six (is also possible).”
Minus: *4−6* is “four minus six,” and −6 is “negative six.”
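Because these four marks can be indistinguishable on screen, it may help to look at their code points instead. The following is a minimal Python sketch using only the standard library; the `DASHES` table and the `describe` helper are names made up for this illustration, and the characters are written as escapes so that no editor or formatter can swap them.

```python
import unicodedata

# The four look-alike marks from the list above, written as escapes
# so the exact character is unambiguous.
DASHES = {
    "hyphen": "\u002d",
    "en dash": "\u2013",
    "em dash": "\u2014",
    "minus": "\u2212",
}


def describe(ch):
    """Return 'U+XXXX NAME' for a single character (illustrative helper)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for label, ch in DASHES.items():
    # Show each mark between 4 and 6, followed by its code point and name.
    print(f"{label:8} 4{ch}6  {describe(ch)}")
```

Each line of output pairs the rendered *4?6* example with the Unicode name of the mark, so the four characters can be told apart even in a font that draws them identically.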
Obviously, these can look interchangeable in some fonts. If I see *4-6* on a page of radio slang, I assume *four-six*. If I see *4-6 p.m.*, then I assume *from four to six p.m.* However, if I am reading something longer, with sentences, and I see *4-6 p.m.* with a noticeably small hyphen that is the same length as the hyphen in *dot-less* or the breaks at the end of a line, then I know what is meant, but I roll my eyes and think less of the editor and publisher. It is little different from seeing a book’s title read as *From See to Shining See*.
Similarly, ... and … are not the same thing. If I see … in an American newspaper, then it is simply a punctuation error.
When it comes to the dummy quotation marks, it looks bad enough to write "word" instead of “word,” but WordPress converts this to ”word,” which is ridiculous. Similarly, someone may hastily write the following: *She said, "Get yourself out of that 'funk' you are in.”* which becomes *She said, ”Get yourself out of that ’funk’ you are in.”*
I will never have a reason to compare "" and “” or '' and ‘’ in a post, but the behavior is buggy, to say the least.
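One way to pin down exactly which characters were changed is to diff the text as typed against the text as it appears in the feed. The sketch below is only an illustration under stated assumptions: `substitutions` is a hypothetical helper, the two sample strings are hand-written to mirror the behavior described above rather than captured from a real feed, and Python's standard `difflib` stands in for whatever tooling you prefer.

```python
from difflib import SequenceMatcher


def substitutions(source, rendered):
    """List (typed, rendered) pairs for every span where the texts differ.

    Hypothetical helper for illustration; not part of WordPress.
    """
    matcher = SequenceMatcher(None, source, rendered, autojunk=False)
    return [
        (source[i1:i2], rendered[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hand-written samples: straight quotes and three dots as typed, and the
# curly closing quotes and single ellipsis character as described above.
typed = 'She said, "word" ... fine'
in_feed = "She said, \u201dword\u201d \u2026 fine"

for before, after in substitutions(typed, in_feed):
    # Print code points so look-alike characters cannot be confused.
    codes = " ".join(f"U+{ord(c):04X}" for c in after)
    print(f"{before!r} became {after!r} ({codes})")
```

Running the same comparison on the real post content and the real feed item would show whether the substitutions reported here happen on a given site.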
To make the difference clear, I have attached a PDF of a page with a font that illustrates the differences.
This affects the display on podcast readers that parse the information in the tags. The incorrectness is fairly objective. The system takes it upon itself to substitute characters that may look alike, but that would be like a system converting every capital A to capital alpha (αλφα), just because *A* and *Α* look similar.
Display of different punctuation, as well as screenshots of the issue