WordPress.org

Make WordPress Core

Opened 15 months ago

Closed 9 months ago

#23219 closed defect (bug) (fixed)

Duplicate enclosure metadata created because of disagreement on meta field formatting

Reported by: redsweater Owned by: nacin
Milestone: 3.6 Priority: normal
Severity: normal Version: 2.8
Component: XML-RPC Keywords: has-patch needs-unit-tests
Focuses: Cc:

Description

Years ago I reported an issue in Ticket #7773 that the enclosures would be duplicated each time the post was edited and resubmitted through the XMLRPC API. That bug was fixed in [10383] but a nuance was overlooked that causes this kind of duplication to occur with every edit if the ORIGINAL custom field meta for the enclosure was generated by the automatic content-scraper in WordPress.

If a post with links to "enclosure" type files is submitted either through the web interface or through the XMLRPC API, it will automatically generate the insertion of enclosure metadata as a custom field with fragile-structured content of the format:

add_post_meta( $post_ID, 'enclosure', "$url\n$len\n$mime\n" );

But the XMLRPC implementation, where the attempt is made to avoid generating duplicates, uses a slightly different fragile template, with no trailing newline:

$encstring = $enclosure['url'] . "\n" . $enclosure['length'] . "\n" . $enclosure['type'];

To make a long story short: the attempt to detect an existing enclosure on the post always fails because the existing enclosure string has a newline, and the XMLRPC-based one does not.

I propose that this be fixed by at enacting 1 and probably also 2:

  1. Make sure duplicate detection method insensitive to trailing whitespace. This will fix the bug in a way that doesn't require retroactively "fixing" the existing stored metadata with trailing newline.
  1. Change the format of the "scraped" enclosure generation to match the format used by XMLRPC. Either with or without the newline would probably be fine, but it seems cleaner to stick with the one that doesn't require a trailing newline, just three blobs of text separated by newlines. It also seems lucky that the last blob is a MIME type and thus would never have a meaningful newline at the trailing end.

I'm attaching a patch that achieves both of these goals. I report this to the XML-RPC component because that's the impact is, even if some of the issue is in the default scraping code from functions.php.

I'm a little less certain about how to tackle a unit test for these cases, but if somebody wants to point me in the right direction I can take a hack at that too.

Attachments (3)

FixDuplicateEnclosures.diff (1.0 KB) - added by redsweater 15 months ago.
Patch to be lenient about matching existing enclosures, and to agree on the formatting of enclosure string for future
23219.diff (1.3 KB) - added by nacin 12 months ago.
23219-tests.diff (2.6 KB) - added by tollmanz 12 months ago.

Download all attachments as: .zip

Change History (12)

redsweater15 months ago

Patch to be lenient about matching existing enclosures, and to agree on the formatting of enclosure string for future

comment:1 redsweater15 months ago

  • Keywords has-patch added

comment:2 redsweater15 months ago

Please note this is slightly worse than just a little extra metadata in the database. When the RSS feed is generated for a blog, it will include multiple enclosure links for the same asset:

<enclosure url="http://www.coreint.org/audio/CoreInt_72.mp3" length="21660995" type="audio/mpeg" />
<enclosure url="http://www.coreint.org/audio/CoreInt_72.mp3" length="21660995" type="audio/mpeg" />
<enclosure url="http://www.coreint.org/audio/CoreInt_72.mp3" length="21660995" type="audio/mpeg" />

Judging from the fact this problem has gone unnoticed ;) I suspect that most feed clients including Apple's iTunes parser are not bothered by the presence of multiple, identical enclosures. But it's obviously a little messy.

comment:3 redsweater15 months ago

For completeness, I neglected to include steps to reproduce the bug:

  1. Create a new post from a XMLRPC client such as the WordPress for iOS app or MarsEdit.
  2. Include in the post some reference to an enclosure-style media asset such as an MP3. E.g.:
<a href="http://www.coreint.org/audio/CoreInt_72.mp3">Download</a>
  1. Send the post to the blog.
  2. Observe that the post has automatically earned itself an enclosure and the custom field entry that goes with that.
  3. Open and make whatever basic edit to the post, and send to the blog again.
  4. Observe that it now has an extra custom field metadata entry for the same enclosure.

This will happen again and again now matter how many edits you make. So it can become quite egregious depending on the frequency of editing a particular post.

comment:4 SergeyBiryukov15 months ago

  • Keywords needs-unit-tests added
  • Milestone changed from Awaiting Review to 3.6
  • Version changed from trunk to 2.8

comment:5 nacin12 months ago

This is a great candidate for unit tests. I think we should standardize on \n trailing. So, we can do an rtrim() comparison. I'm going to toss this patch in, but would like to see unit test coverage as well.

nacin12 months ago

comment:6 nacin12 months ago

Actually, here's a patch: 23219.diff. Would have committed it but need to test first.

tollmanz12 months ago

comment:7 tollmanz12 months ago

  • Cc tollmanz@… added

Added tests that pass with patch filed "23219.diff" applied and fail without it. I wasn't sure of the best place to put the tests, but I think the location in which I added them makes sense.

comment:8 nacin9 months ago

In 1308/tests:

Some basic tests for saving enclosures via XML-RPC. props tollmanz. see #23219.

comment:9 nacin9 months ago

  • Owner set to nacin
  • Resolution set to fixed
  • Status changed from new to closed

In 24623:

XML-RPC: Save enclosures with a trailing new line. fixes #23219.

Note: See TracTickets for help on using tickets.