Make WordPress Core

Opened 10 years ago

Last modified 4 years ago

#30644 new enhancement

"wpautop" Enhancements

Reported by: stefanrz's profile stefanrz Owned by:
Milestone: Priority: normal
Severity: normal Version:
Component: Formatting Keywords: wpautop 2nd-opinion
Focuses: Cc:

Description

Since there are several problems (e.g. invalid markup) with the current "wpautop" function I tried to come up with a new approach. The text gets parsed and a little analyzed in order to generate valid and comprehensible markup.

The script is not really compatible with the current implementation since whitespaces and line breaks are added differently. Performance is slightly worse, depending on the input (normal text vs. heavy html) of course. Comments in the code are still missing.

I think Shortcodes should also be considered in this markup generation process so the additional use of "shortcode_unautop" could be avoided.

Please let me know what you think and test it if you like.

Sample

Input

paragraph
<!-- comment -->
<hr> 
paragraph <strong>test</strong>

<i>italic</i>
normal

<!-- div -->

<div class="whatever">
  <blockquote>
    paragraph
  </blockquote>
  paragraph
</div>

paragraph
<ul>
  <li>line</li>
  <li>
    paragraph <!-- paragraph -->
    
    paragraph
  </li>
</ul>
paragraph

<pre>
  
  Honor
    this whitespace

</pre>

paragraph

<style><!--
Do not alter!
--></style>

paragraph
<p>paragraph</p>

<div>text</div>

Output

<p>
  paragraph
  <!-- comment -->
</p>

<hr>

<p>
  paragraph <strong>test</strong>
</p>

<p>
  <i>italic</i><br>
  normal
</p>

<!-- div -->

<div class="whatever">
  <blockquote>
    <p>
      paragraph
    </p>
  </blockquote>
  
  <p>
    paragraph
  </p>
</div>

<p>
  paragraph
</p>

<ul>
  <li>
    line
  </li>
  
  <li>
    <p>
      paragraph <!-- paragraph -->
    </p>
    
    <p>
      paragraph
    </p>
  </li>
</ul>

<p>
  paragraph
</p>

<pre>
  
  Honor
    this whitespace

</pre>

<p>
  paragraph
</p>

<style><!--
Do not alter!
--></style>

<p>
  paragraph
</p>

<p>paragraph</p>

<div>
  text
</div>

Attachments (3)

WP_Formatting_AutoP.php (11.6 KB) - added by stefanrz 10 years ago.
WP_Formatting_AutoP
30644.patch (941 bytes) - added by azaozz 10 years ago.
wpautop_replacement.php (11.0 KB) - added by azaozz 10 years ago.

Download all attachments as: .zip

Change History (9)

@stefanrz
10 years ago

WP_Formatting_AutoP

#1 @miqrogroove
10 years ago

  • Keywords wpautop added

@azaozz
10 years ago

#2 @azaozz
10 years ago

  • Milestone changed from Awaiting Review to Future Release

@stefanrz thanks for the idea and the patch. Works quite well as far as I see. Cleaned up the coding style to match the WordPress standards and made it into a plugin in wpautop_replacement.php. We probably will need to replace/improve some of the variables names to make them clearer. Also adding some inline comments explaining what is happening/how does it work would be very nice.

Think fixing/enhancing wpautop will be best done in a feature plugin. Also this is an excellent place to use the tests first approach. Having robust unit tests for wpautop() will make this much easier to get right and "tune up" in the future.

There are quite a few wpautop tests on #25856. We can probably start by cleaning them up and (conditionally) adding them.

Last edited 10 years ago by azaozz (previous) (diff)

#3 follow-up: @khag7
4 years ago

  • Keywords 2nd-opinion added

There are 39 open tickets that the proposed feature plugin above would either fix immediately or put us most of the way towards fixing. I'll list the 39 if you'd like, but for now just take my word for it.

I recognize this is 6 years old. I recognize this is not as desirable now as it was prior to Gutenberg. I recognize that the 39 open tickets to fix wpautop are not a high priority. I recognize that any new version of wpautop needs to maintain back compat. But despite all that, I'd like to try to understand if there's a desire to fix this and if so, what steps need to be taken.

As far as I can understand, designing some unit tests would be the first step. If I were to do so (and I'd likely need some help with example content) I'd end up with several dozen tests that fail on the current wpautop function. What is step two? Make the current wpautop pluggable and replace with the feature plugin from the previous comment and then test again? Lets say we get to a point where we pass all our tests. Does this even get implemented or is this too likely to be a breaking change for users? How do we test against that?

Tagging @azaozz as you maintain Formatting and I'd really like your guidance on this. Is this likely to succeed? Is this desirable?

#4 in reply to: ↑ 3 @azaozz
4 years ago

Replying to khag7:

I recognize this is 6 years old. I recognize this is not as desirable now as it was prior to Gutenberg. I recognize that the 39 open tickets to fix wpautop are not a high priority. I recognize that any new version of wpautop needs to maintain back compat. But despite all that, I'd like to try to understand if there's a desire to fix this and if so, what steps need to be taken.

Good question :)

The main reason this didn't go ahead six years ago was that it needed a lot of unit tests, and also specific tests (and possibly fixes) for back-compat. Autop has always been... very temperamental. There are edge cases in it that the "surrounding code" has learned to expect. Over the years there were several attempts to "fix" some of these, but most failed as they either introduced new edge cases or by fixing a particular case broke the way something similar works. It's tough to use regex on HTML...

Then, as you mention, autop went into "maintenance mode" couple of years ago with the release of the block editor in WP 5.0. The block editor doesn't use wpautop. It is used only as back-compat when there is only a single classic block (that doesn't get the block boundaries). Been looking into fixing that use case too.

That means virtually all HTML coming from the block editor still has the <p> tags when saving it in the DB. The blocks "content HTML" is still run through wpautop on displaying, but there's no missing paragraph tags in there. This is still slow, and somewhat cumbersome, as there is no need to have the wpautop "display filter" on every front-end page load.

In that terms I see the "future of wpautop" mainly in two directions:

  • Devise a way to not run the posts content through wpautop on every page load. This has to mostly look at "side effects", i.e. possible back-compat problems caused by white space changes added by wpautop to HTML that has all the <p> tags, etc.
  • Eventually fix and/or improve wpautop for HTML that has paragraph tags. This won't be needed if the above is implemented, and is more risky as wpautop still needs to work on older post content, and still needs it's old quirks to be backwards compatible.

To answer your questions directly: don't think it's worth it fixing any edge cases in wpautop. The reasons are:

  1. Autop is in "maintenance mode" and on its way out.
  2. Fixing an edge case usually brings another, or brings some back-compat problems for code that is expecting the edge case.

This ticket was mentioned in Slack in #core-editor by azaozz. View the logs.


4 years ago

#6 @khag7
4 years ago

Ozz,

Thanks for taking the time to respond.


very temperamental

Yeah you don't say! I've been around a while. Not a large contributor by any means, but I'm up to speed on the frustrations that come along with it.


I appreciate your thoughtful response, but skipping right to the end gives me the answer I need:

don't think its worth it fixing any edge cases in wpautop

If it really is "on its way out", there's no reason to try to force a fix here just for sake of closing 39 old tickets.


I gather that the real problem, the real true reason we can't actually fix this awful awful function, comes in your last sentence:

...code that is *expecting the edge case*

It would be one thing to fix the edge cases. Unfortunately, we'll break the sites that are, intentionally or not, relying on those edge cases. How sad.


Thanks again for your time, I was kind of looking forward to really pushing this forward but I hate to sink time into something thats not likely to go forward, and I'll skip along to something else.


In the unlikely event someone here disagrees and wants to urge me to push forward, feel free, but as far as I can tell, the 39 tickets related to wpautop should be marked "wontfix" with an explanation that wpautop is frozen and no intent exists to solve any edge cases. Can I go ahead and do that? Who's blessing is needed on that?

Note: See TracTickets for help on using tickets.