Ticket #4452 (closed defect (bug): fixed)

Opened 5 years ago

Last modified 5 years ago

wpx can include invalid named entities in comment author name

Reported by: tellyworth
Owned by: anonymous
Priority: normal
Milestone: 2.2.2
Component: Administration
Version: 2.2.1
Severity: normal
Keywords:
Cc: jhodgdon

Description

Hi,

WP's XML export doesn't currently escape the contents of many fields, including the comment author. If those fields include named HTML entities, the result is invalid XML. The importer handles it just fine, but some browsers will complain with an error or refuse to download the export file if the XML doesn't validate.

Attached is an example of the problem output, and a patch that uses CDATA escaping on the comment author field. Other fields could be escaped too, but I've limited the change to the one that I've seen cause a problem in the wild.
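To illustrate the problem and the CDATA approach described above, here is a minimal Python sketch. It is not the attached patch: the `author` element, the `cdata()` helper, and the `is_well_formed()` helper are hypothetical names chosen for illustration. It assumes a standard XML parser with no DTD loaded, which is why a named HTML entity such as `&eacute;` is rejected as undefined.

```python
import xml.etree.ElementTree as ET


def cdata(text):
    """Wrap text in a CDATA section (hypothetical helper, not WordPress code).

    A literal "]]>" would end the section early, so it is split
    across two adjacent CDATA sections.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# A comment author name stored with a named HTML entity.
author = "Jos&eacute;"

# Unescaped: &eacute; is not a predefined XML entity, so parsing fails.
unescaped = "<author>" + author + "</author>"

# CDATA-escaped: the parser treats the content as literal text.
escaped = "<author>" + cdata(author) + "</author>"

if __name__ == "__main__":
    print(is_well_formed(unescaped))
    print(is_well_formed(escaped))
    print(ET.fromstring(escaped).text)
```

The CDATA wrapper leaves the stored value untouched, so the importer can recover the original string, entity and all, once it strips the wrapper.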

On the import side, get_tag() will accept CDATA on any field now. It should retain backwards compatibility with export files created prior to this patch.
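The backwards-compatible import behaviour can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the importer's actual `get_tag()`: the regular expressions, the return value for a missing tag, and the `wp:comment_author` element name used in the usage lines are assumptions made for the example.

```python
import re


def get_tag(xml_text, tag):
    """Return the content of the first <tag>...</tag>, unwrapping CDATA.

    Hypothetical sketch: accepts both CDATA-wrapped values (new exports)
    and bare values (exports created before the change).
    """
    pattern = r"<%s(?:\s[^>]*)?>(.*?)</%s>" % (re.escape(tag), re.escape(tag))
    match = re.search(pattern, xml_text, re.S | re.I)
    if match is None:
        return ""  # assumption: a missing tag yields an empty string
    value = match.group(1)

    # If the whole value is a single CDATA section, return its contents.
    wrapped = re.fullmatch(r"\s*<!\[CDATA\[(.*)\]\]>\s*", value, re.S)
    if wrapped:
        return wrapped.group(1)

    # Otherwise fall back to the value as-is (old export format).
    return value


new_export = "<wp:comment_author><![CDATA[Jos&eacute;]]></wp:comment_author>"
old_export = "<wp:comment_author>Jos&eacute;</wp:comment_author>"

if __name__ == "__main__":
    print(get_tag(new_export, "wp:comment_author"))
    print(get_tag(old_export, "wp:comment_author"))
```

Both inputs yield the same string, which is the property that keeps older export files importable.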

Attachments

import-cdata-r5694.patch (2.6 KB) - added by tellyworth 5 years ago.
export-error.xml (2.8 KB) - added by tellyworth 5 years ago.
4452-2.diff (554 bytes) - added by foolswisdom 5 years ago.
tellyworth found a problem, this fix from tellyworth fixes the problem importing post body

Change History

  • Milestone set to 2.2.2

comment:2   ryan 5 years ago

Looks okay to me.

comment:3   ryan 5 years ago

  • Status changed from new to closed
  • Resolution set to fixed

(In [5711]) Use CDATA escaping on fields. Props tellyworth. fixes #4452

comment:4   ryan 5 years ago

  • Status changed from closed to reopened
  • Resolution fixed deleted

Committed for 2.3. Let's see how it handles and then schedule it for 2.2.2.

tellyworth found a problem, this fix from tellyworth fixes the problem importing post body

comment:5   ryan 5 years ago

  • Status changed from reopened to closed
  • Resolution set to fixed

(In [5718]) Regex fix. Props tellyworth. fixes #4452

  • Status changed from closed to reopened
  • Version set to 2.2.1
  • Resolution fixed deleted

Re-open, currently only fixed on trunk.

I am not sure whether this should go on the same ticket or a different one, but the comment content is another field that might contain entities. As of [5744], if you add a comment to a post with an entity, such as &eacute; or &ntilde; (common in Spanish for accents), your XML export file will not validate, as described in this bug report. So the wp:comment field in the export probably needs to be escaped with CDATA too.

  • Cc jhodgdon added

Marked #4684 a duplicate. Can we get this checked into the 2.2 branch? Current WordPress.com exports imported into 2.2.1 are broken b/c of this fix to trunk.

  • Status changed from reopened to closed
  • Resolution set to fixed

(In [5822]) Use CDATA escaping/unescaping for comment_author. props tellyworth. fixes #4452 for 2.2.x

(In [5846]) Roll back export portion of #4452 for 2.2.x, see #4452, see #4686

Note: current status of 2.2.x (starting with 2.2.2) is that its export format is unchanged, but it can handle exports from trunk/WP.com
