Make WordPress Core

Opened 4 months ago

Last modified 7 weeks ago

#61833 new defect (bug)

Post titles in Bulk Edit should show decoded HTML

Reported by: dmsnell's profile dmsnell Owned by:
Milestone: 6.8 Priority: normal
Severity: normal Version:
Component: Quick/Bulk Edit Keywords: has-patch
Focuses: Cc:

Description (last modified by dmsnell)

It seems that when a post title contains HTML character references (entities) that the list of posts in the Bulk Edit screen shows the raw HTML markup in its non-decoded form. Instead, it ought to show the decoded form.

https://cldup.com/RNH1LGEjg7.png https://cldup.com/cTnfJAgODG.png

The JavaScript is escaping the raw HTML, preserving the character references as syntax instead of decoding them as text.

[Log] {theTitle: "… is Λ"} (inline-edit-post.js, line 210)
[Log] {theTitle: "x < 2 & y > 3…"} (inline-edit-post.js, line 210)

WordPress is double-encoding the post titles when sending them to the admin page, causing the display error. The raw value in the database is proper, e.g. … is Λ.

https://cldup.com/HJEjFMJ5uA.png

Change History (14)

This ticket was mentioned in PR #7148 on WordPress/wordpress-develop by @dmsnell.


4 months ago
#1

  • Keywords has-patch added

Trac ticket: Core-61833

Previously, post titles in the bulk edit list were showing HTML character references (entities) in the displayed output when they should have been rendering the decoded values. This is because WordPress was double-encoding the post titles in the hidden "inline data" through a call to esc_textarea().

In this patch, the post title value is first _decoded_ before being passed into esc_textarea() to avoid the double-encoding. This preserves the proper post title render in the Bulk Edit component.

Before Patch After Patch
https://github.com/user-attachments/assets/616d51e6-c264-4134-a57f-2c3ea9752a0b https://github.com/user-attachments/assets/3263fe8e-87a9-405d-a4d1-7e2c98e94df0

@dmsnell commented on PR #7148:


4 months ago
#2

Since I didn't show it, this next screenshot shows a normal post with intentional & characters preserved.

https://github.com/user-attachments/assets/2389f3ba-23e6-4b41-8a5b-f7789ce0e963

#3 @dmsnell
4 months ago

  • Milestone changed from Awaiting Review to 6.7

If nobody has any thoughts to share I'd like to merge this patch this week, before August 24.

This ticket was mentioned in Slack in #core by dmsnell. View the logs.


4 months ago

#5 @peterwilsoncc
4 months ago

  • Keywords needs-refresh added

While testing the patch, I noticed it changes how the data is stored in the database. Currently the posts are stored in an encoded form, eg Pens & Pencils.

With this PR applied, using quick edit for a single post modifies the title stored in the database to remove the encoding, eg Pens & Pencils. Using Bulk edit doesn't modify how the data is stored.

Bulk edit uses the same markup as quick edit so changing one will effect the other.

#6 @dmsnell
4 months ago

  • Description modified (diff)

@peterwilsoncc thanks for going through the testing and teaching me something new about the post list.

It took me a bit to figure out what you were referring to, so I've recorded what I think is the dual flow in case others are confused like I was.

https://cldup.com/Ti9hsHgHpN.mp4
External link to screencast

With this PR applied, using quick edit for a single post modifies the title stored in the database to remove the encoding, eg Pens & Pencils. Using Bulk edit doesn't modify how the data is stored.

It seems like this is true but also true accidentally because the bulk edit doesn't allow modifying the post titles. It submits a request with the changed bulk parameters and list of affected post ids in the query args for the request to wp-admin/edit.php while the quick edit screen sends all of the arguments for a specific post as POST variables to wp-admin/admin-ajax.php.

So quick edit allows setting title and thus it's updated.

Bulk edit uses the same markup as quick edit so changing one will effect the other.

I'd like to hear thoughts on this because in my opinion it's going to be more accurate to replace the & from the database and store & instead. Call it a happy unintended side effect of this change. Still, it wasn't the goal of this change to modify the way the data is stored. I just happen to think, considering proper translation of layers and domains, that it's most reliable to store raw text in the post_title field and then let the display logic handle proper HTML escaping (escaping late).


In other words, this seems like an unintended but positive change.


It's way more complicated 😰

This is also unrelated to the patch, but it appears like inline-edit-post.js forces UTF-8 submission via jQuery serializing the post title. This can go wrong on the backend if the site isn't configured for UTF-8 and it corrupts the post title. Were it submitting via an HTML FORM element, then the browser would automatically insert character references for the characters that aren't supported by the page's encoding, but the jQuery code doesn't do this.


There's a lot of complexity in here. I think the existing behavior "works" because esc_textarea() and Core's esc_ family attempts to prevent double-encoding. That is, it doesn't go & > & > && > etc…

I think we can fix this one display issue without breaking the database, even though we're changing it. The way the current code works I don't think it's possible to fix the display issue without changing the database, and fixing the way the post title is stored opens a can of worms and highlights other existing problems that probably are under-reported due to the relative lack of non-UTF-8 sites with non-ASCII characters in the post titles. The existing behavior actually does force the storing of the encoded form of the post title, which only happens to display properly on the rendered page because of the behavior of esc_.

Do you feel strongly one way or the other about this? I'd prefer the post titles display as they do on render.

Last edited 4 months ago by dmsnell (previous) (diff)

#7 @dmsnell
4 months ago

  • Description modified (diff)

#8 @peterwilsoncc
4 months ago

I'm a quite hesitant about changing the database storage because Core would need to account for the two potential ways of storing the data. For the most part this would be fine but once you start hitting technology sites it's a little complex.

Consider the fictional site, HTML Trix and their article about the ampersand: HTML & the ampersand: & (ie, an article displaying the entity in the title).

After an edit of the post, the database would go from HTML & the ampersand: & to HTML & the ampersand: & and modify how the title is displayed on the front end of the site.

This ticket was mentioned in PR #7148 on WordPress/wordpress-develop by @dmsnell.


3 months ago
#9

  • Keywords needs-refresh removed

Trac ticket: Core-61833

Previously, post titles in the bulk edit list were showing HTML character references (entities) in the displayed output when they should have been rendering the decoded values. This is because WordPress was double-encoding the post titles in the hidden "inline data" through a call to esc_textarea().

In this patch, the post title value is first _decoded_ before being passed into esc_textarea() to avoid the double-encoding. This preserves the proper post title render in the Bulk Edit component.

Before Patch After Patch
https://github.com/user-attachments/assets/616d51e6-c264-4134-a57f-2c3ea9752a0b https://github.com/user-attachments/assets/3263fe8e-87a9-405d-a4d1-7e2c98e94df0

This ticket was mentioned in PR #7148 on WordPress/wordpress-develop by @dmsnell.


3 months ago
#10

Trac ticket: Core-61833

Previously, post titles in the bulk edit list were showing HTML character references (entities) in the displayed output when they should have been rendering the decoded values. This is because WordPress was double-encoding the post titles in the hidden "inline data" through a call to esc_textarea().

In this patch, the post title value is first _decoded_ before being passed into esc_textarea() to avoid the double-encoding. This preserves the proper post title render in the Bulk Edit component.

Before Patch After Patch
https://github.com/user-attachments/assets/616d51e6-c264-4134-a57f-2c3ea9752a0b https://github.com/user-attachments/assets/3263fe8e-87a9-405d-a4d1-7e2c98e94df0

#11 @dmsnell
3 months ago

@peterwilsoncc I've updated the PR to re-encode the title on save. it feels too easy, but I'm not convinced it's wrong either. in fact, in the patch I added a comment with the hope that we can uncover other post properties that are getting incorrectly encoded and decoded and then fix them.

Your feedback was really helpful, especially the example you gave. I tried to reimagine this patch in other ways, but another fix has remained elusive. It's good for us to consider this case because I think that cases like this saturate Core.

Relevant to my update:

  • the jQuery code submitting the Quick Edit form appears to send UTF-8 without any escaping. it will send "the raw string."
  • if that string contains character references, those references get saved and then displayed again on the quick edit form. this string is an encoded string, instead of a raw one, but the jQuery doesn't know that.
  • in my patch, by decoding the HTML on output to the form, jQuery is taking raw input and returning raw output.
  • by encoding into $post['post_title'] we end up serializing into the data base in encoded form again.
  • this balances encoding and decoding.

What I'm not sure is that else this breaks, but I'm not sure it will or does, because the status quo is that the database has been storing the encoded form of the title, simply because it's uninterested in whether a string is raw or encoded.

What I think will change with this patch is that now, if a title contains <, >, ', ", or &, those will be escaped in the database. Whereas before the could be a mix of a < b &lt; c - now, that post title will update to a &lt; b &lt; c

Curious on your thoughts on this.

#12 @peterwilsoncc
3 months ago

As discussed in Slack, the changes to the PR have an effect on post titles containing HTML tags.

A post title such as this title has <em>emphasis</em> is stored in the database as such and rendered with italic text. The changes in the PR modify the database storage to this title has &lt;em&gt;emphasis&lt;/em&gt; and therefore modifies the title to display the HTML tags to the reader.

I think the approach is closer to what will be needed to ensure both bulk and quick edit work as expected, it just needs a little wrapping up.

This ticket was mentioned in Slack in #core by stoyangeorgiev. View the logs.


7 weeks ago

#14 @stoyangeorgiev
7 weeks ago

  • Milestone changed from 6.7 to 6.8

This one was discussed at a bug-scrub. With RC1 right around the corner, will move this one to 6.8 so it has more time for wrapping up as @peterwilsoncc mentioned.

Cheers!

Note: See TracTickets for help on using tickets.