Opened 7 months ago

Closed 6 months ago

Last modified 5 months ago

#52250 closed defect (bug) (fixed)

Standardize sanitization of post title during export

Reported by: jmdodd
Owned by: pento
Milestone: 5.7
Priority: normal
Severity: normal
Version: 5.7
Component: Export
Keywords: has-patch commit has-dev-note
Focuses:
Cc:

Description

Export currently uses apply_filters( 'the_title_rss', $post->post_title ) to sanitize the post title when generating a WXR. This has the side effect of stripping valid HTML tags (reported in #50540) and also creates artificial misses in post_exists tests because newly-encoded characters in the export file do not match those that may already exist in valid titles in the posts table.

An example post title that would have this behavior is: Alice & Bob. This is encoded in the export file as Alice &amp; Bob, resulting in a near-duplicate post on import.
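
The mismatch can be reproduced outside WordPress. The following is a rough sketch, not the core code: `approximate_title_rss()` is a hypothetical helper that only mimics the two effects described in this ticket (tag stripping and HTML encoding) using PHP's `strip_tags()` and `htmlspecialchars()`. How the importer actually decodes and compares titles is not modelled here.

```php
<?php
// Hypothetical stand-in for the effect of the_title_rss during export:
// strip tags, then HTML-encode. This is NOT the real WordPress filter chain.
function approximate_title_rss( string $title ): string {
	return htmlspecialchars( strip_tags( $title ), ENT_QUOTES, 'UTF-8' );
}

$stored   = 'Alice & Bob';                    // title as it sits in the posts table
$exported = approximate_title_rss( $stored ); // title as it would be written to the file

// A plain string comparison between the two no longer matches,
// which is the kind of artificial miss the description refers to.
var_dump( $exported === $stored );
```

Under these assumptions the ampersand is rewritten as an entity, so the exported string differs from the stored one even though the title itself never changed.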

Most other character data in the export is wrapped with wxr_cdata(), and both post content and excerpts have a special export-ready filter:

wxr_cdata( apply_filters( 'the_content_export', $post->post_content ) )

This changeset treats post titles like other character data and provides a filter if additional handlers are needed.
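
For comparison, wrapping character data in a CDATA section leaves both markup and ampersands untouched. The sketch below is an assumption-laden illustration: `cdata_wrap()` is a hypothetical helper written for this example, not the core `wxr_cdata()` function, and it skips any character-set handling the real function may perform.

```php
<?php
// Hypothetical helper illustrating CDATA wrapping; not the core wxr_cdata().
function cdata_wrap( string $text ): string {
	// A literal "]]>" would end the section early, so split it across two sections.
	$safe = str_replace( ']]>', ']]]]><![CDATA[>', $text );
	return '<![CDATA[' . $safe . ']]>';
}

// Tags and the bare ampersand are carried through unchanged.
echo cdata_wrap( 'Alice & Bob <em>together</em>' ), "\n";
```

Because nothing inside the section is encoded or stripped, the title read back from the file can be compared directly with the one stored in the database.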

Attachments (2)

52250.diff (742 bytes) - added by jmdodd 7 months ago.
52250.1.diff (769 bytes) - added by audrasjb 6 months ago.
Patch refreshed - @since mention added

Change History (9)

@jmdodd
7 months ago

#1 @SergeyBiryukov
7 months ago

  • Milestone changed from Awaiting Review to 5.7

@audrasjb
6 months ago

Patch refreshed - @since mention added

#2 @audrasjb
6 months ago

  • Keywords commit needs-dev-note added

This is a nice enhancement, and it also brings better consistency to the data sent to the exporter.

Adding needs-dev-note to make sure it's mentioned in the Miscellaneous Changes dev note.

#3 @pento
6 months ago

  • Owner set to pento
  • Status changed from new to accepted

#4 @pento
6 months ago

#50540 was marked as a duplicate.

#5 @pento
6 months ago

Interestingly, this bug does point to the behaviour of the_title_rss being incorrect: per the RSS spec, the <title> field can contain HTML tags, though clients should just treat it as plain text.

I don't think it's worth changing the behaviour of that filter, though. 🙂

#6 @pento
6 months ago

  • Resolution set to fixed
  • Status changed from accepted to closed

In 50011:

Export: Create an export-specific filter for post titles.

Since WordPress 2.5 and 2.6, post_content and post_excerpt have both had export-specific filters: the_content_export, and the_excerpt_export, respectively. post_title, however, has used the_title_rss, which behaves differently in two important ways:

  • It strips HTML tags from the string.
  • It HTML-encodes the title string.

These behaviours are not ideal for exports, since they change the post title, resulting in data loss in export files and incorrect post duplicate matching on import. This change replaces the usage of the_title_rss with a new filter, the_title_export. The new filter is intended to be used in the same way as the_content_export and the_excerpt_export.

Props jmdodd, audrasjb.
Fixes #52250.
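
A plugin that needs extra handling of exported titles could hook the new filter. This is a minimal sketch, assuming the filter passes the title string as its only argument; `myplugin_trim_export_title()` is a hypothetical callback name, and the `function_exists()` guard is only there so the snippet also loads outside WordPress.

```php
<?php
// Hypothetical callback: remove stray leading/trailing whitespace from a title.
function myplugin_trim_export_title( $title ) {
	return trim( (string) $title );
}

// Register the callback only when WordPress' add_filter() is available.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'the_title_export', 'myplugin_trim_export_title' );
}
```

The callback leaves tags and ampersands as they are, so it does not bring back the stripping or encoding that this change removes.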
