Make WordPress Core

Opened 6 years ago

Closed 5 years ago

#46800 closed defect (bug) (fixed)

protect against bad characters in media attachment metadata

Reported by: donpark's profile donpark Owned by: joemcgill's profile joemcgill
Milestone: 5.3 Priority: normal
Severity: normal Version: 5.2
Component: Media Keywords: needs-unit-tests
Focuses: Cc:

Description

Media files with bad characters in embedded metadata is commonplace.
But current version of getID3 library does not fully sanitize extracted metadata,
causing uploads, update and retrieval to fail mysteriously because database calls fail silently.

Code snippet below is non-Core workarounds I was going to deploy before being advised to open a Core ticket instead. It should only be used to understand the problem because the fix at Core-level should be closer to getID3, in places where getID3 to called extract metadata.

I also attached a ZIP file containing an MP3 with bad composer and artist values.

<?php

// Current version (1.9.14-201706111222) of getID3 library currently returns bad string values
// that cause DB query/update/insert, serialize/unserialize, upload, and API endpoints to fail.
// This function is used to sanitize values using `array_map`.
// Implementation's goal is to prevent failures, not correctness.
function utf8_encode_attachment_metadata( $value ) {
        if ( empty( $value ) || ! is_string( $value ) ) {
                return $value;
        }
        $encoding = mb_detect_encoding( $value, 'UTF-8, ISO-8859-1, UCS-2' );
        if ( 'UTF-8' === $encoding ) {
                return $value;
        }
        if ( 'ISO-8859-1' === $encoding ) {
                return utf8_encode( $value );
        }
        if ( 'UCS-2' === $encoding ) {
                return mb_convert_encoding( $value, 'UTF-8', 'UCS-2' );
        }
        return utf8_encode( $value );
}

// Filter out bad characters in post title and content.
// For media attachments, `post_title` and `post_content` are built from ID3 tags.
function repair_media_attachment_data( $data, $postarr ) {
        $content = $data['post_content'];
        if ( empty( $content ) ) {
                return $data;
        }
        $mime_type = $postarr['post_mime_type'];
        if ( empty( $mime_type ) ) {
                return $data;
        }
        if ( ! preg_match( '#^(audio|video)/#', $mime_type ) ) {
                return $data;
        }
        $content = wp_check_invalid_utf8( $content, true );
        $data['post_content'] = $content;
        return $data;
}

// Filter out bad characters in media attachment metadata.
function repair_media_attachment_metadata( $metadata ) {
        $mime_type = $metadata['mime_type'];
        if ( empty( $mime_type ) ) {
            return $metadata;
    }
        if ( ! preg_match( '#^(audio|video)/#', $mime_type ) ) {
            return $metadata;
    }
        return array_map('utf8_encode_attachment_metadata', $metadata );
}

// Sanitize media attachments with content built using media metadata.
add_filter( 'wp_insert_attachment_data', 'repair_media_attachment_data', 10, 2 );

// Sanitize extracted media attachment metadata.
add_filter( 'wp_generate_attachment_metadata', 'repair_media_attachment_metadata', 10, 1 );
// Sanitize media attachment metadata update to prevent bad characters from being saved to DB.
add_filter( 'wp_update_attachment_metadata', 'repair_media_attachment_metadata', 10, 1 );
// Sanitize media attachment metadata on retrieval to protect against bad characters already in DB.
add_filter( 'wp_get_attachment_metadata', 'repair_media_attachment_metadata', 10, 1 );

Attachments (6)

melodia-mi-metodo-copia.mp3.zip (1.3 MB) - added by donpark 6 years ago.
MP3 file with bad metadata
bad_id3_patch.diff (1.6 KB) - added by donpark 6 years ago.
proposed patch
bad_id3_patch_2.diff (2.0 KB) - added by donpark 6 years ago.
Sorry. last patch was no good. Use this one. It also patches wp_get_attachment_metadata to cover bad metadata already in the database.
01-Campuri auri.mp3 (3.8 MB) - added by SergeyBiryukov 5 years ago.
02-Ein Bischen-Frieden.mp3 (2.5 MB) - added by SergeyBiryukov 5 years ago.
46800.diff (2.9 KB) - added by SergeyBiryukov 5 years ago.

Change History (18)

@donpark
6 years ago

MP3 file with bad metadata

#1 @joemcgill
6 years ago

  • Milestone changed from Awaiting Review to Future Release
  • Owner set to joemcgill
  • Status changed from new to accepted

Hi @donpark. Thanks for the detailed report.

I'm curious if updating the ID3 library in core would have any affect (see #43836)? If not, we should apply this sanitization as soon as the data is read from the file, which I think would be in wp_read_image_metadata(), if I'm understanding correctly. Any interest in trying to put together a patch for this?

#2 @donpark
6 years ago

I'm curious if updating the ID3 library in core would have any affect (see #43836)?

I have not tried that. IIRC, version of the ID3 library in WPCOM was 1.9.14 while the latest was 1.9.16 and I did not see anything in the commit log that suggested latest would fix this issue.

If not, we should apply this sanitization as soon as the data is read from the file, which I think would be in wp_read_image_metadata(), if I'm understanding correctly.

Can't answer that as I've not looked into where it should be fixed in the Core.

Any interest in trying to put together a patch for this?

Maybe. I just got pulled into fixtheflows, a pseudo-perma-cross-team, project which has a long list of neglected UX issues so I can't say when I could put together a patch. Beware that character encoding detection is pretty much a guessing game and the ID3 spec supporting only two character sets don't mean a thing in the real world.

@donpark
6 years ago

proposed patch

#3 @donpark
6 years ago

Hi @joemcgill. I've attached a patch for this.

We got more users reporting the problem, affecting my short-term priorities.
Let me know what you think and feel free to modify as needed. Thx.

@donpark
6 years ago

Sorry. last patch was no good. Use this one. It also patches wp_get_attachment_metadata to cover bad metadata already in the database.

#4 @dshanske
5 years ago

I spent two hours trying to figure this out before I found this ticket. Silent failures shouldn't happen.

#5 @SergeyBiryukov
5 years ago

#47846 was marked as a duplicate.

@SergeyBiryukov
5 years ago

#6 @SergeyBiryukov
5 years ago

  • Keywords has-patch needs-unit-tests added
  • Milestone changed from Future Release to 5.3

01-Campuri auri.mp3 and 02-Ein Bischen-Frieden.mp3 are test files from #47846.

In my testing with them, bad_id3_patch_2.diff works as expected, but doesn't completely fix the issue: If track number metadata is also malformed or in a wrong encoding, there's a PHP warning:

number_format() expects parameter 1 to be float, string given in wp-includes/functions.php on line 280

46800.diff fixes that and adds a check whether mb_convert_encoding() exists.

This could use a unit test.

#7 follow-up: @deyanpavlov
5 years ago

How do I use this file 46800.diff to patch my WordPress installation? Can I do it from the admin panel without really doing developer's kind of work to edit the source files?

#8 in reply to: ↑ 7 @donmhico
5 years ago

Replying to deyanpavlov:

How do I use this file 46800.diff to patch my WordPress installation? Can I do it from the admin panel without really doing developer's kind of work to edit the source files?

I think the most "friendliest" for non-dev is to just copy and paste the actual code changes in your wp-admin/includes/media.php file. Just be sure that you are in the correct block of code. The line number from the diff can be used as a guide for you but it may be possible that it's not accurate.

#9 @deyanpavlov
5 years ago

this particular .diff file also changes src/wp-includes/post.php
I indeed did what you say, being careful not to mess up everything.
I managed to upload my mp3 files from the admin panel.

Unfortunately, the actual web site could no longer open. It was just a blank page.
So I deleted manually the changes I had introduced copying the changes in the .diff file, and the site could open again.

I will repeat the procedure again just in case.

#10 follow-up: @kirasong
5 years ago

  • Keywords reporter-feedback added

I just tested, and these all upload for me correctly in the media library on trunk.

Perhaps as a result of the upgrade to getID3 in r46166?

Could you please test and see if you are still having the issue?

#11 in reply to: ↑ 10 @donpark
5 years ago

Replying to mikeschroder:

I just tested, and these all upload for me correctly in the media library on trunk.

Perhaps as a result of the upgrade to getID3 in r46166?

Could you please test and see if you are still having the issue?

I've verified that two MP3 files that failed to upload before can now be uploaded using getID3 version 1.9.18-201907240906.

I think this ticket can be marked resolved for now. Thanks for looking into this.

#12 @kirasong
5 years ago

  • Keywords has-patch reporter-feedback removed
  • Resolution set to fixed
  • Status changed from accepted to closed

@donpark Thanks so much for checking, and again for the report!

Going to close this out as it's been reported to be fixed, but please feel free to re-open if necessary.

Thanks everyone!

Note: See TracTickets for help on using tickets.