Opened 6 years ago

Closed 5 years ago

Last modified 4 years ago

#6942 closed defect (bug) (worksforme)

typing two spaces in visual editor creates Â characters in displayed post

Reported by: meonkeys
Owned by:
Milestone:
Priority: normal
Severity: normal
Version: 2.6
Component: TinyMCE
Keywords:
Focuses:
Cc:

Description

Typing two spaces after a period causes Â characters to be displayed when the post is viewed on the live Web site. Database upgraded to UTF-8, problem persists. Example: monsenfamily.com/?p=1378

Repros in Firefox 2 on Windows and Linux as well as IE 6 on Windows.

Workaround: after spotting a post with spurious characters, visit "Manage -> Posts" after logging in, click on the offending post and click "Save". The Âs should disappear.

Possibly related to ticket 6562.

Change History (11)

comment:1 follow-up: azaozz 6 years ago

  • Keywords reporter-feedback added

Can you clarify: does this happen in new posts only, in old posts, or both (old meaning published before upgrading to WordPress 2.5)? Also, is the workaround to edit the post in TinyMCE or in the HTML editor?

Looking at the source of the example page, it seems that the UTF-8 encoding of the Unicode nbsp character U+00A0 (the two bytes \xC2\xA0) is being read as two separate single-byte characters, which are then converted to HTML entities: &Acirc; &nbsp;. There's no code in WordPress that would do this.
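The mechanism described above can be reproduced in a few lines. This is only an illustrative sketch in Python, not WordPress code, and the variable names are hypothetical. It encodes U+00A0 as UTF-8, misreads the two resulting bytes as ISO-8859-1, and maps each misread character to its named HTML entity:

```python
from html.entities import codepoint2name

nbsp = "\u00a0"                      # Unicode non-breaking space
utf8_bytes = nbsp.encode("utf-8")    # two bytes: C2 A0

# Misreading the UTF-8 bytes as ISO-8859-1 yields two characters.
misread = utf8_bytes.decode("iso-8859-1")

# Map each misread character to its named HTML entity.
entities = ["&" + codepoint2name[ord(ch)] + ";" for ch in misread]

print(utf8_bytes.hex())   # c2a0
print(entities)           # ['&Acirc;', '&nbsp;']
```

The output matches the entity pair seen in the source of the example page, which is consistent with a charset mismatch somewhere between storage and display.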

comment:2 follow-up: mrmist 6 years ago

Not seeing this at all in 2.6. (IE7 or FF3).

comment:3 in reply to: ↑ 2 azaozz 6 years ago

  • Resolution set to invalid
  • Status changed from new to closed

Replying to mrmist:

Not seeing this at all in 2.6. (IE7 or FF3).

You're right, and the reporter never came back with more info.

comment:4 azaozz 6 years ago

  • Resolution invalid deleted
  • Status changed from closed to reopened

comment:5 azaozz 6 years ago

  • Milestone 2.7 deleted
  • Resolution set to invalid
  • Status changed from reopened to closed

comment:6 augnix 6 years ago

  • Cc augie@… added
  • Resolution invalid deleted
  • Status changed from closed to reopened
  • Version changed from 2.5.1 to 2.6

I have the same problem with 2.6.

It's easy to reproduce too:

  • Create new post.
  • Use the "visual" editor.
  • End your first sentence with a period (.) followed by two spaces.
  • Begin and end another sentence.

If your browser's character encoding is set to Unicode (UTF-8), you will see the two spaces after the period, but if you change it to Western (ISO-8859-1), you will see the funny A character (Â).

The funny A character makes it into the 'post_content' variable and into the DB; I have a custom plug-in I use to notify customers of new posts and the funny A gets sent out in those messages too.

Note: if you re-save your post, the funny A character goes away in your published post.

My Browser info.:
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.16) Gecko/20080703 Mandriva/2.0.0.16-1.1mdv2008.0 (2008.0) Firefox/2.0.0.16
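One way to tell whether stored content really contains a stray Â, or only a valid UTF-8 non-breaking space that is being displayed in the wrong encoding, is to inspect the raw bytes. The following is a minimal sketch in Python, assuming the raw bytes of 'post_content' have already been fetched from the database. `classify_nbsp` is a hypothetical helper, not part of WordPress, and the check is a heuristic, since text that legitimately contains Â followed by a non-breaking space would also be flagged:

```python
def classify_nbsp(raw: bytes) -> str:
    """Classify how a non-breaking space appears in raw stored bytes."""
    if b"\xc3\x82\xc2\xa0" in raw:
        # "Â" followed by nbsp, each encoded as UTF-8: the text was
        # mis-decoded as ISO-8859-1 and then re-encoded as UTF-8.
        return "double-encoded"
    if b"\xc2\xa0" in raw:
        # A single, valid UTF-8 non-breaking space.
        return "valid-utf8-nbsp"
    return "no-nbsp"


# Sample data: a sentence with nbsp + space after the period.
good = "One.\u00a0 Two.".encode("utf-8")
# Simulate a round trip through the wrong charset.
bad = good.decode("iso-8859-1").encode("utf-8")

print(classify_nbsp(good))  # valid-utf8-nbsp
print(classify_nbsp(bad))   # double-encoded
```

If the stored bytes fall in the first category, the Â is actually in the data; in the second, the data is fine and the Â only appears when something reads it as ISO-8859-1.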

comment:7 follow-up: azaozz 6 years ago

Let me get the steps to reproduce this right:

  • go to the write page and type 2 sentences while your browser is set to UTF-8.
  • save the post still in UTF-8 or are you switching the encoding in the middle of typing?
  • view the post on the site and force the browser to display it in ISO-8859-1.

At this point any non-ASCII UTF-8 characters will show as broken. That is standard browser behavior: forcing a page to display in a different encoding breaks some characters. Browsers have charset auto-detection, so even if the charset isn't set in the header, they usually still display the text correctly. The same happens in the editor iframe.

Couldn't reproduce this in Firefox 3. It was switching back to UTF-8 and displaying the text correctly.

If you have problems in your custom plugin, perhaps use the solution from #6562 to filter out the Unicode for nbsp, although it's a valid HTML character.
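The exact solution from #6562 is not reproduced in this ticket, so the following is only a generic sketch of the idea, written in Python rather than as a WordPress filter. `strip_nbsp` is a hypothetical name, and the sketch assumes the text has already been decoded correctly as UTF-8:

```python
def strip_nbsp(text: str) -> str:
    """Replace Unicode non-breaking spaces (U+00A0) with ordinary spaces."""
    return text.replace("\u00a0", " ")


sample = "First sentence.\u00a0 Second sentence."
print(repr(strip_nbsp(sample)))
# 'First sentence.  Second sentence.'
```

A plugin that sends post text to systems with an unknown or non-UTF-8 charset could apply this kind of replacement before output, at the cost of losing the non-breaking behavior of the space.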

comment:8 in reply to: ↑ 1 meonkeys 6 years ago

Sorry to take so long in providing feedback! I wasn't aware that feedback was requested (I didn't get an email from Trac).

I just upgraded to 2.6, so I'll keep an eye out for this issue. I was just unable to repro, so that's a good sign. Additionally, the post mentioned in this ticket's description no longer shows the strange characters.

Replying to azaozz:

Can you clarify: does this happen in new posts only, in old posts, or both (old meaning published before upgrading to WordPress 2.5)?

Both.

Also, is the workaround to edit the post in TinyMCE or in the HTML editor?

Actually, I think either editor worked.

I tried to come up with a more specific set of repro steps but couldn't!

comment:9 in reply to: ↑ 7 augnix 6 years ago

Replying to azaozz:

Let me get the steps to reproduce this right:

  • go to the write page and type 2 sentences while your browser is set to UTF-8.

Yes, and put two spaces after the first period.

  • save the post still in UTF-8 or are you switching the encoding in the middle of typing?

Yes. UTF8 all the way through.

  • view the post on the site and force the browser to display it in ISO-8859-1.

At this point any non-ASCII UTF-8 characters will show as broken. That is standard browser behavior: forcing a page to display in a different encoding breaks some characters. Browsers have charset auto-detection, so even if the charset isn't set in the header, they usually still display the text correctly. The same happens in the editor iframe.

Couldn't reproduce this in Firefox 3. It was switching back to UTF-8 and displaying the text correctly.

I know it's a pretty trivial bug, but if you look around at the forums it pops up a few times and causes confusion:

http://wordpress.org/support/topic/187662
http://wordpress.org/support/topic/144841

Also, the editor is putting these characters into the DB, which could cause problems for people down the line (even though they should know everything is UTF-8).

The real question though is, why does the editor turn a double space into something else?

If you have problems in your custom plugin, perhaps use the solution from #6562 to filter out the Unicode for nbsp, although it's a valid HTML character.

This is what I ended up doing; other plug-in writers are probably going to have to take similar measures.

comment:10 mrmist 5 years ago

  • Keywords reporter-feedback removed
  • Resolution set to worksforme
  • Status changed from reopened to closed

No traction. Closing as worksforme. Re-open if this is still an issue in 2.7.

comment:11 meonkeys 4 years ago

Aha! I think I finally figured out my problem: a plugin.

Textile 2 (Improved), version 2.1.

Once I disabled this plugin, the strange characters disappeared.

Here's the description of that plugin (I don't know what any of this means, but it makes me think it conflicted with a core WordPress plugin or something):

This is a wrapper for Jim Riggs' PHP implementation of Brad Choate's Textile 2. It is feature compatible with the MovableType plugin. Does not play well with the Markdown, Textile, or Textile 2 plugins that ship with WordPress. Packaged by Adam Gessaman.
