WordPress.org

Make WordPress Core

Opened 4 years ago

Closed 17 months ago

Last modified 17 months ago

#13314 closed defect (bug) (fixed)

HTML is deleted when switched to visual editor

Reported by: BH8489 Owned by: azaozz
Milestone: 3.5 Priority: normal
Severity: normal Version:
Component: Editor Keywords:
Focuses: Cc:

Description

Several "advanced" (or more appropriately, specific) HTML and CSS customizations inserted in the WordPress editor are removed when the post or paged is switched to visual editor view.

This old post, http://core.trac.wordpress.org/ticket/3311 , seems to be targeting a similar point, but I disagree with it, in that HTML shouldn't be added in the visual editor. When HTML is added in the HTML editor, however, it should not disappear when the post edited using the to visual editor.

For example, if there is an intro paragraph of plain text, followed by an iframe, flash embed, then table... the iframe, possibly flash, and table will all (rightfully so) need to be created with the HTML editor. Say the administrator puts these enhancements on the page; another end user should be able to edit the paragraph of plain text in visual editor without the iframe, flash, and table being removed from the page. While it would be understandable for the iframe etc. to be replaced with a (?) symbol of some kind in the backend, the code should remain in the HTML viewer, and therefore on the front end pages.

HTML affected includes iframes, certain attributes like colspan in tables, various CSS markup, and pretty much anything that can't be accomplished using the visual editor.

Attachments (1)

13314.patch (861 bytes) - added by SergeyBiryukov 19 months ago.

Download all attachments as: .zip

Change History (43)

comment:1 in reply to: ↑ description koopersmith4 years ago

  • Component changed from Administration to TinyMCE
  • Keywords close added; HTML editor removed
  • Milestone changed from 3.0 to Future Release
  • Owner set to azaozz
  • Severity changed from major to normal

Replying to BH8489:

Say the administrator puts these enhancements on the page; another end user should be able to edit the paragraph of plain text in visual editor without the iframe, flash, and table being removed from the page.

In this situation you might want to look at creating a page template (or depending on the circumstances, a taxonomy or a custom post type). They'll allow you to separate the "enhancements" from the both the end user and the visual editor—so neither will alter your markup.

Given that #3311 existed for two years without traction (four years ago, at that), I'm going to suggest closing with wontfix. I'll let someone more in the know decide.

comment:2 azaozz4 years ago

  • Milestone Future Release deleted
  • Resolution set to invalid
  • Status changed from new to closed
  • Version 3.0 deleted

There is a misconception here. TinyMCE is not an IDE for coding in HTML. It is a text editor that uses HTML in the background to provide extended formatting capabilities. It works with the text editor build in the browser (a.k.a. Rich Text Editor) and extends and enhances it.

Agree with @koopersmith, there are many ways to achieve what you need: custom templates, shortcodes, custom fields, even there's plugins to turn the visual editor off on certain posts/pages, etc.

comment:3 scribu20 months ago

  • Keywords close removed
  • Milestone set to Awaiting Review
  • Resolution invalid deleted
  • Status changed from closed to reopened

azaozz, your reasoning fails to address the main point:

When HTML is added in the HTML editor, however, it should not disappear when the post edited using the to visual editor.

If we can't do that, then why do we let users switch to/from the HTML editor?

comment:4 follow-up: scribu20 months ago

  • Component changed from TinyMCE to Editor
  • Keywords needs-patch added

And yes, I know the HTML editor has recently been renamed to "Text Editor", but that doesn't mean the problem doesn't exist.

Definitions:

"special HTML" = "certain attributes like colspan in tables, various CSS markup, and pretty much anything that can't be accomplished using the visual editor"

After a user inserts special HTML inside the Text Editor, we could:

A) remove the special HTML immediately, so that they're not surprised when they switch to the Visual Editor

or

B) prevent them from switching to the Visual Editor, with a notice: "Switching to the Visual Editor will result in lost markup".

Last edited 20 months ago by scribu (previous) (diff)

comment:5 scribu20 months ago

A conceptually simple way of implementing B) would be this (in pseudocode):

function maybe_text_to_visual( source ) {
  output = text_to_visual( source );
  
  if ( visual_to_text( output ) != source ) {
    alert( "Oh noes! Can't switch to visual editor!" );
    return;
  }

  ...
}

comment:6 nacin20 months ago

Or C) make TinyMCE stop removing markup, which is completely lame.

comment:7 scribu20 months ago

There's an option for that? :)

comment:8 azaozz20 months ago

Re-reading the ticket, this seems more about kses removing html tags when a post published by a user with unlimited_html cap is edited by a user without unlimited_html.

Regarding tags that are changed/stripped when switching Text->Visual and then Visual->Text: this parses the string html into a DOM, then serializes the DOM back into a string. However the DOM for a contentEditable is not exactly the same as a "standard" web page, there are more browser quirks.

In that process anything that the browser cannot parse or understand is skipped. TinyMCE does a very good job of normalizing this and has a large list of tags and attributes to look at. If an attribute is valid xhtml1 and is removed, we can look at it on a case by case basis and try to fix it.

comment:9 scribu20 months ago

"valid xhtml1" is not good enough. If a user inserts an HTML5 tag, it shouldn't just disappear. If TinyMCE can't handle it for whatever reason, we should keep the user in Text mode.

comment:10 follow-up: azaozz20 months ago

Are there any html5 tags that disappear? The current TinyMCE 3.5.6 is fully html5 aware, haven't seen or heard of any problems with that.

XHTML1 is used for internal parsing/validation of the generated html code, for example it outputs </p> tags which are optional in html5 and things like <br /> instead of <br>.

Last edited 20 months ago by azaozz (previous) (diff)

comment:11 scribu20 months ago

  • Keywords reporter-feedback added; needs-patch removed

You're right, HTML5 tags seem to work fine; nevermind. The colspan attribute seems to stick too.

Would have been helpful if the ticket reporter posted some actual examples of markup that disappears.

comment:12 scribu20 months ago

Note: I only tested in Firefox and Chrome:

<section>

A paragraph!

<table style="-moz-box-shadow: 3px">
    <tr>
      <td colspan="2">foo</td>
    </tr>
</table>
</section>

The whitespace does get messed up, though.

comment:13 follow-up: azaozz20 months ago

Yes, when we get the html out of TinyMCE, we strip all line breaks/extra white space, then it's run through pre_wpautop() to remove all <p> tags and insert some line breaks.

As the html is parsed into a DOM and then serialized back to a string, keeping the original white space/line breaks is impossible, except in a <pre>.

Last edited 20 months ago by azaozz (previous) (diff)

comment:14 scribu20 months ago

Could we at least insert some additional line-breaks before the first paragraph?

For example, this:

<section>

A paragraph!

</section>

becomes this:

<section>A paragraph!

</section>

Haven't thought through all the cases, though.

comment:15 scribu20 months ago

On the PHP side: #21825

comment:16 ocean9020 months ago

#21873 was marked as a duplicate.

comment:17 in reply to: ↑ 10 ; follow-up: chirael20 months ago

Replying to azaozz:

Are there any html5 tags that disappear? The current TinyMCE 3.5.6 is fully html5 aware, haven't seen or heard of any problems with that.

Yes, I had the required, autofocus, and placeholder attributes of form fields - all valid HTML5 - stripped out when switching to the visual editor and back (which I filed as bug 21873 and was then marked as a dup of this one)

Going to the visual editor and back also reformatted the HTML which was annoying (it's much harder to edit/debug poorly formatted code, and constantly reapplying good formatting only to have the visual editor strip it out gets old fast.

The HTML reformatting is irritating, but removing valid HTML5 attributes needed on my page is a serious issue.

Version 1, edited 20 months ago by chirael (previous) (next) (diff)

comment:18 in reply to: ↑ 17 ; follow-up: SergeyBiryukov20 months ago

Replying to chirael:

Yes, I had the required, autofocus, and placeholder attributes of form fields - all valid HTML5 - stripped out when switching to the visual editor and back (which I filed as bug 21873 and was then marked as a dup of this one)

#21873 was filed against 3.4.2.

TinyMCE 3.5.6 is currently only available in trunk: http://wordpress.org/download/svn/.

comment:19 in reply to: ↑ 18 chirael20 months ago

Replying to SergeyBiryukov:

#21873 was filed against 3.4.2.
TinyMCE 3.5.6 is currently only available in trunk: http://wordpress.org/download/svn/.

So does that mean that #21873 is not a dup of this one after all? If so, can someone pls change 21873's status to reopened?

Last edited 20 months ago by chirael (previous) (diff)

comment:20 follow-up: scribu20 months ago

No, it means the attributes are removed in WP 3.4.2, but not in WP 3.5-alpha.

comment:21 in reply to: ↑ 4 chirael20 months ago

Replying to scribu:

After a user inserts special HTML inside the Text Editor, we could:
A) remove the special HTML immediately, so that they're not surprised when they switch to the Visual Editor
or
B) prevent them from switching to the Visual Editor, with a notice: "Switching to the Visual Editor will result in lost markup".

While I think these are better than the status quo of silently removing markup, it still creates a problem in the (pretty common IMO) use case of a designer/programmer designing a site, then showing her/his customer how to use WP to update the site's content themselves using the friendly visual editor:

1) Designer uses "special HTML" to achieve effects desired by client
2) Initial project is complete, designer shows client how to do simple content updates her/himself using WP's friendly visual editor
3) Client uses visual editor on page with "special HTML", saves, special HTML is stripped and page/site is now broken :(

comment:22 in reply to: ↑ 20 SergeyBiryukov20 months ago

Replying to scribu:

No, it means the attributes are removed in WP 3.4.2, but not in WP 3.5-alpha.

That's what I meant, sorry for being unclear.

However, I've just confirmed that required, placeholder and autofocus attributes still disappear in trunk:

<input type="text" name="first_name" placeholder="First name" required="required" autofocus="autofocus" />

After switching to Visual tab and back:

<input type="text" name="first_name" />

comment:23 scribu20 months ago

  • Keywords needs-patch added; reporter-feedback removed

comment:24 scribu20 months ago

  • Milestone changed from Awaiting Review to 3.5

comment:25 in reply to: ↑ 13 ; follow-ups: MarcusPope20 months ago

Replying to azaozz:

...
As the html is parsed into a DOM and then serialized back to a string, keeping the original white space/line breaks is impossible, except in a <pre>.

As with most of everything related to software, not only is it possible, it's also pretty easy to implement: http://wordpress.org/extend/plugins/preserved-html-editor-markup/

@SergeyBiryukov: My plugin also prevents the issue you outlined above wrt html5 attributes.

@scribu - Thanks again for resurrecting this issue and giving it serious consideration. If you have any questions or need some help with the details in the future please feel free to email or tweet me. I'm pretty proficient with both TinyMCE and dom parsing in general.

comment:26 in reply to: ↑ 25 scribu20 months ago

Replying to MarcusPope:

As with most of everything related to software, not only is it possible, it's also pretty easy to implement: http://wordpress.org/extend/plugins/preserved-html-editor-markup/

Your plugin has a big warning:

IMPORTANT: Please read the installation instructions carefully. If you have existing content it will not render properly after activating this plugin until you use the Fix It Tools.

That approach is not feasible for Core; we have to find a solution that doesn't require every single WP user to run all of their posts through a conversion tool when upgrading.

If you have such a solution, please share it in patch form.

Last edited 20 months ago by scribu (previous) (diff)

comment:27 MarcusPope20 months ago

Since I know I have been interpreted as being harsh before, I really want to emphasize that there is no malice in this post. It's just business to me, though if you are offended I'd really appreciate knowing by what so I can apologize-for/improve/rephrase it.

That big ole warning is there because my plugin also disables wpautop. I disable wpautop because I think it is an absolutely terrible and unnecessary notion. But that is completely unrelated to the feature of preserving whitespace and newlines in html markup. That feature could be used without any warning to the user. It could even be implemented in conjunction with wpautop.

But even if they were intertwined there are a dozen or more solutions to the warning. It's odd to me that as a developer your first inclination is to say "it can't be done because of X" instead of "what could we do to fix X?". One solution would be to programmatically run the content through the fixit function on upgrade - something the core already does wrt schema updates. Another option, for sites with millions of posts, would be to conditionally disable wpautop on posts created after activation and continue to use wpautop for posts created before activation, along with a hook to fix the content on individual post saves. Perhaps even simpler would be to just make the "preserve my html feature" a button on the toolbar of the html tab. I'm confident you guys can think of solutions to these problems instead of just throwing up your hands.

I didn't automatically call the conversion process (which is just an update to the post_content db field with the result of wpautop(post_content)), and instead put in that warning, because I'm a sole developer and don't have the time or QA resources to verify corner case issues that may or may not even exist. That's where I'd expect the expertise of the core dev team to step in. As a result of my limited time and focus, I gave a little more control to the end user just in case.

If I was actually writing this as a core feature, or a commerical plugin I'd definitely polish it up. For instance the Fix Posts feature doesn't even need to be a permanent change, I could easily run the content through a php version of unautop on plugin deactivation, or store a backup of the post as a special post_type that can be used to restore the posts in the result of a problem. Even though backing up data is a strict recommendation before updating the core or installing new plugins.

The point was that preserving whitespace and newlines is completely possible, not impossible as stated by Azaozz. If you read through my code, you'll see the approach I took. To make the task easier than reading ~600 lines of code, you don't even need to read past line 223 of the php file because the remainder is just admin settings / fix content stuff. For the js file just focus on logic after line 232. And I'll be the first to say my approach isn't even 100% solid, there are namespace conflicts that could occur in the whitespace markers (however unlikely) and there is difinitely room for trying out a more fine-grained whitespace preservation technique.

Ultimately, I've given you everything you need to solve the problem. But I'm not going to do all of the work for you guys because it hasn't proven to be very useful in the past. And in my eyes the likelihood that it will just go to waste and be untouched for another couple of years is just too strong for me to invest the amount of effort you are asking. I'd love to find out that I'm wrong in that estimation, and with some consistency in that direction I'd make more contributions to the core.

Though even if I was 100% convinced it would go to good use, I still wouldn't be able to help on that level today because my daughter is expected to be born in the next two to three weeks. I'm more than happy to be a consultant, if you will, on how to overcome specific hurdles you all discover along the way, but that's the best I can offer given the situation.

Thanks again, for giving this issue some real consideration. My email is anything at marcuspope.com and my twitter handle is marcuspope if you have any specific questions.

comment:28 in reply to: ↑ 25 azaozz19 months ago

Replying to MarcusPope:

Replying to azaozz:

...
As the html is parsed into a DOM and then serialized back to a string, keeping the original white space/line breaks is impossible, except in a <pre>.

As with most of everything related to software, not only is it possible, it's also pretty easy to implement: http://wordpress.org/extend/plugins/preserved-html-editor-markup/

That's not exactly true :) This plugin preserves some of the html code formatting/line breaks, but breaks pretty bad with other. Example:

<p
    class="some"
    id="else"
    style="margin: 1em 0;"
>

That's besides the point though, the text editor doesn't have preserving white space and line breaks as priority (as they don't matter in html). The priority there is to keep the paragraphs flowing "naturally" and allow use of some html tags. There was a discussion in IRC about making it easier for plugins to add Code tab, trying to preserve line breaks and white space would be good for there.

The above plugin also replaces (some?) TinyMCE logic when inserting <br> tags. If that works well in all browsers, why not make a patch and submit it to them.

Last edited 19 months ago by azaozz (previous) (diff)

comment:29 scribu19 months ago

How about we forget about the whitespace and just focus on not losing the attributes...

comment:30 azaozz19 months ago

Yeah, should be safe to switch the cleanup from html4/xhtml1 to html5. All we need to do is add 'schema' => 'html5' to TinyMCE's init.

SergeyBiryukov19 months ago

comment:31 follow-up: SergeyBiryukov19 months ago

With 13314.patch, placeholder and required attributes from comment:22 no longer disappear. autofocus, however, still disappears.

comment:32 in reply to: ↑ 31 azaozz19 months ago

Yes, autofocus is missing from the <input> attr list, it should be in the next TinyMCE version. We can backport that patch for now.

comment:33 azaozz19 months ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In [21875]:

Switch TinyMCE's schema to 'html5', add 'autofocus' to the attributes map for <input>, part props SergeyBiryukov, fixes #13314

comment:35 follow-up: stillrivercreative19 months ago

  • Cc stillrivercreative added
  • Resolution fixed deleted
  • Status changed from closed to reopened

Sorry, I am not as knowledgable as some here. The detailed technical info in this thread is useful, however I cannot see how or where this issue could be considered resolved. Perhaps my expectations are not realistic. Can someone please explain, in layman's terms, or for the average 'casual developer' who knows a little HTML and CSS, or for the newb who just wants to customize their pages:

When I add some custom CSS in the HTML editor, then switch to the visual editor, then back to the HTML editor, sometimes my custom CSS is gone. (I cannot supply an example just now; we all know this happens.)

I think we need a reality check: This does happen, and it should never happen. Period.

If custom CSS is added in the HTML editor it should remain there. It should stick like glue, whether or not it functions properly. I appreciate that WP is free and there are myriad ways to get free support. However until this is fixed for good, without special plugins or workarounds, I really don't see how WordPress can be considered a stable world-class platform, or how I can recommend it to my clients.

I really wish to undertand this clearly, so thank you in advance for any add'l insight you can provide.

DevNewb

Last edited 19 months ago by stillrivercreative (previous) (diff)

comment:36 in reply to: ↑ 35 nacin19 months ago

Replying to stillrivercreative:

When I add some custom CSS in the HTML editor, then switch to the visual editor, then back to the HTML editor, sometimes my custom CSS is gone. (I cannot supply an example just now; we all know this happens.)

Please open a new ticket with an example once you can supply one. We'd be happy to take a look into it. I agree this should never happen. This ticket has gotten quite long and did contain a good, important fix, so let's leave this one as is.

comment:37 helenyhou19 months ago

  • Keywords needs-patch removed
  • Resolution set to fixed
  • Status changed from reopened to closed

comment:38 follow-up: stillrivercreative19 months ago

  • Severity changed from normal to minor

This ticket has gotten quite long and did contain a good, important fix, so let's leave this one as is.

No prob, will open new thread w/ example when time. Until then can anyone explain how to apply the patch? Will it prevent custom HTML/CSS from being removed? After having to rebuild entire pages due to this issue, I have resorted to extreme measures to protect client sites - copying all of a page's code to a text file, cloning pages as emergency backups, etc.

Thx

comment:39 in reply to: ↑ 38 SergeyBiryukov19 months ago

  • Severity changed from minor to normal

Replying to stillrivercreative:

Until then can anyone explain how to apply the patch?

It's already committed to 3.5 and included in beta builds.

comment:40 nacin17 months ago

  • Keywords revert added
  • Resolution fixed deleted
  • Status changed from closed to reopened

What a mess. This resulted in #22175 (table attributes), #22790 (embed codes), #22842 (embed codes round 2). It seems TinyMCE does not conform to basic backwards compatibility principles and that's a major problem.

We tried a duck punch in [23119] and [23120] to override TinyMCE's schema definitions, but it wasn't enough. The whole thing is systematically broken and it is simply not repairable without giving upstream some time to get things straight. The schemas are inconsistent and incompatible, embed code handling is entirely broken, and TinyMCE's media plugin seems to break horribly on the HTML5 schema.

Very simply, we're throwing in the towel for 3.5 and will have no choice but to revisit in 3.6.

comment:41 nacin17 months ago

  • Keywords revert removed
  • Resolution set to fixed
  • Status changed from reopened to closed

Change of heart. (And a lot of man hours.) See [23153] and related commits on both #22790 and #22842.

comment:42 jond3r17 months ago

  • Cc jond3r@… added

I've always stayed away from the Visual tab, ever since I learned that TinyMCE "messes" with my HTML. Inspired by the recent tickets #22790 and #22842, I started testing different configuration parameters using the tiny_mce_before_init filter. An interesting find I made was the verify_html setting, which if set to false completely disables verification and "clean-up" of the elements. With this setting I assume that the definition of an HTML4 or HTML5 schema becomes redundant, since every markup is accepted. However, I realize that this is probably not a generally desired behavior, and it certainly breaks previous WP behavior. I think it was a feature like this that was requested higher up in the thread.

Note: See TracTickets for help on using tickets.