WordPress.org

Make WordPress Core

Opened 18 months ago

Last modified 14 months ago

#33591 assigned defect (bug)

Media File Missing Length Causes Wordpress Atom Feed to Be Invalid

Reported by: jonnybot Owned by: stevenkword
Milestone: Awaiting Review Priority: normal
Severity: normal Version: 4.2.4
Component: Feeds Keywords: needs-patch good-first-bug needs-unit-tests
Focuses: administration Cc:

Description

In wp-includes\feed.php, the atom_enclosure() function can produce invalid XML, which will break some feed readers and XML parsers that might digest the feed. This happens when the get_post_custom() function returns data that doesn't include all the metadata that it ought to for a given enclosure.

This line-

$enclosure = explode("\n", $enc);

...creates an array by splitting the enclosure meta information on newline characters ("\n"). Then, this line:

echo apply_filters( 'atom_enclosure', '<link href="' . trim( htmlspecialchars( $enclosure[0] ) ) . '" rel="enclosure" length="' . trim( $enclosure[1] ) . '" type="' . trim( $enclosure[2] ) . '" />' . "\n" );

...assumes that the url will be at position 0 in the array, the length in position 1, and the type in position 2.

If the length line is simply missing, as is the case for a particular post I have on my blog, then the next line in the metadata is some kind of configuration data that looks something like this:

a:1:{s:8:"duration";s:8:"00:05:29";}" }  

This breaks the feed in a couple of ways. First, because the length attribute is missing from the metadata, the length attribute of the resulting link tag will read "audio/mpeg" (or whatever type of file it is).

Second, it tries to put a type that contains quotes, which ends the attribute's value early and creates a broken link tag.

<link href="http://myblog.com/wp-content/uploads/2015/03/media.mp3" rel="enclosure" length="audio/mpeg" type="a:1:{s:8:"duration";s:8:"00:05:20";}" />

The broken link tag will break some feed readers and other XML parsers that are pointed at the blog's ATOM feed.

The same issue may also affect the rss_enclosure function in feed.php, though I haven't tested it for that case.

I'm not sure whether the more important bug bug is that that the ATOM feed breaks if the length is missing, or if the bug is, "Hey, the length meta data is missing for this object, how did THAT happen?"

Still, it would seem that the assumption that the length will be at the second (or 1th) position in the array is not a safe one. At the very least, it might be good to encode the output so that double quotes from the metadata don't break the XML of the feed?

Also, any pointers on sleuthing out why the metadata is missing would be much appreciated.

At the very least, it seems prudent to wrap the length (contents of $enclosure[1]) and the type ($enclosure[2]) in the htmlspecialchars() function, the same way the href attribute is.

Attachments (5)

feed-missing-length-bug.php (1.2 KB) - added by jonnybot 18 months ago.
33591.diff (1.1 KB) - added by vtieu 14 months ago.
33591.2.diff (1.7 KB) - added by vtieu 14 months ago.
Searches the enclosure for url, length, and type. Should prevent the problem.
33591.3.diff (1.7 KB) - added by vtieu 14 months ago.
Uploaded correct version.
33591.4.diff (1.9 KB) - added by vtieu 14 months ago.
Fixed coding style.

Download all attachments as: .zip

Change History (16)

#1 @jonnybot
18 months ago

You can see the results of the attached feed-missing-length-bug.php on my site at http://news.missouristate.edu/feed-missing-length-bug.php.

#2 @stevenkword
14 months ago

  • Keywords needs-patch good-first-bug needs-unit-tests added

Thank you @jonnybot for the bug report!

I'm in agreement that mapping the array based on position is not desirable and I am in favor of a safer approach. I'm taking a look.

Last edited 14 months ago by stevenkword (previous) (diff)

#3 @stevenkword
14 months ago

  • Owner set to stevenkword
  • Status changed from new to assigned

#5 @stevenkword
14 months ago

I spoke today with @vtieu on Slack, who is going to take a look at submitting a patch. We decided that the first step is to determine what the required fields for an enclosure and to map those explicitly by key, rather than by position. Then, once the required fields are mapped, we can map the existing fields returned by get_post_cstom() by position.

I think we should also take a look at address #16747 in this ticket if possible, but that shouldn't become a blocker either.

@vtieu
14 months ago

#6 @vtieu
14 months ago

33591.diff Doesn't fix the bug yet, however I've mapped the array.

#7 @stevenkword
14 months ago

@vtieu,

Thank you for the patch. Your response identifies a new problem and the next step in the process. We need unit tests to reveal the reported bug. Once we have that, we can apply a patch in an attempt to fix that problem. If the new tests fail before our patch is applied, and pass afterwards, we can be more confident that we've solved the problem. What we need now is a unit test.

#8 @stevenkword
14 months ago

This going to end up blocked by #35160 since there is not currently test coverage in core for Atom.

@vtieu
14 months ago

Searches the enclosure for url, length, and type. Should prevent the problem.

@vtieu
14 months ago

Uploaded correct version.

#9 follow-up: @vtieu
14 months ago

Here is the patch.

#10 in reply to: ↑ 9 @stevenkword
14 months ago

Replying to vtieu:

Here is the patch.

There are a couple of Coding Standards improvements we could make, but the big blocker is going to be the lack of unit tests in Core for Atom. I've raised the priority of ticket #35160 in hopes that we can get those tests in ASAP. Once that's done, we can write additional tests proving this patch is useful.

@vtieu
14 months ago

Fixed coding style.

#11 @vtieu
14 months ago

I've modified the patch file to better follow the coding standards suggested by Steven.

Note: See TracTickets for help on using tickets.