WordPress.org

Make WordPress Core

Opened 5 months ago

Last modified 2 weeks ago

#46800 accepted defect (bug)

protect against bad characters in media attachment metadata

Reported by: donpark Owned by: joemcgill
Milestone: 5.3 Priority: normal
Severity: normal Version: 5.2
Component: Media Keywords: has-patch needs-unit-tests
Focuses: Cc:

Description

Media files with bad characters in embedded metadata is commonplace.
But current version of getID3 library does not fully sanitize extracted metadata,
causing uploads, update and retrieval to fail mysteriously because database calls fail silently.

Code snippet below is non-Core workarounds I was going to deploy before being advised to open a Core ticket instead. It should only be used to understand the problem because the fix at Core-level should be closer to getID3, in places where getID3 to called extract metadata.

I also attached a ZIP file containing an MP3 with bad composer and artist values.

<?php

// Current version (1.9.14-201706111222) of getID3 library currently returns bad string values
// that cause DB query/update/insert, serialize/unserialize, upload, and API endpoints to fail.
// This function is used to sanitize values using `array_map`.
// Implementation's goal is to prevent failures, not correctness.
function utf8_encode_attachment_metadata( $value ) {
        if ( empty( $value ) || ! is_string( $value ) ) {
                return $value;
        }
        $encoding = mb_detect_encoding( $value, 'UTF-8, ISO-8859-1, UCS-2' );
        if ( 'UTF-8' === $encoding ) {
                return $value;
        }
        if ( 'ISO-8859-1' === $encoding ) {
                return utf8_encode( $value );
        }
        if ( 'UCS-2' === $encoding ) {
                return mb_convert_encoding( $value, 'UTF-8', 'UCS-2' );
        }
        return utf8_encode( $value );
}

// Filter out bad characters in post title and content.
// For media attachments, `post_title` and `post_content` are built from ID3 tags.
function repair_media_attachment_data( $data, $postarr ) {
        $content = $data['post_content'];
        if ( empty( $content ) ) {
                return $data;
        }
        $mime_type = $postarr['post_mime_type'];
        if ( empty( $mime_type ) ) {
                return $data;
        }
        if ( ! preg_match( '#^(audio|video)/#', $mime_type ) ) {
                return $data;
        }
        $content = wp_check_invalid_utf8( $content, true );
        $data['post_content'] = $content;
        return $data;
}

// Filter out bad characters in media attachment metadata.
function repair_media_attachment_metadata( $metadata ) {
        $mime_type = $metadata['mime_type'];
        if ( empty( $mime_type ) ) {
            return $metadata;
    }
        if ( ! preg_match( '#^(audio|video)/#', $mime_type ) ) {
            return $metadata;
    }
        return array_map('utf8_encode_attachment_metadata', $metadata );
}

// Sanitize media attachments with content built using media metadata.
add_filter( 'wp_insert_attachment_data', 'repair_media_attachment_data', 10, 2 );

// Sanitize extracted media attachment metadata.
add_filter( 'wp_generate_attachment_metadata', 'repair_media_attachment_metadata', 10, 1 );
// Sanitize media attachment metadata update to prevent bad characters from being saved to DB.
add_filter( 'wp_update_attachment_metadata', 'repair_media_attachment_metadata', 10, 1 );
// Sanitize media attachment metadata on retrieval to protect against bad characters already in DB.
add_filter( 'wp_get_attachment_metadata', 'repair_media_attachment_metadata', 10, 1 );

Attachments (6)

melodia-mi-metodo-copia.mp3.zip (1.3 MB) - added by donpark 5 months ago.
MP3 file with bad metadata
bad_id3_patch.diff (1.6 KB) - added by donpark 4 months ago.
proposed patch
bad_id3_patch_2.diff (2.0 KB) - added by donpark 4 months ago.
Sorry. last patch was no good. Use this one. It also patches wp_get_attachment_metadata to cover bad metadata already in the database.
01-Campuri auri.mp3 (3.8 MB) - added by SergeyBiryukov 3 weeks ago.
02-Ein Bischen-Frieden.mp3 (2.5 MB) - added by SergeyBiryukov 3 weeks ago.
46800.diff (2.9 KB) - added by SergeyBiryukov 3 weeks ago.

Change History (15)

@donpark
5 months ago

MP3 file with bad metadata

#1 @joemcgill
4 months ago

  • Milestone changed from Awaiting Review to Future Release
  • Owner set to joemcgill
  • Status changed from new to accepted

Hi @donpark. Thanks for the detailed report.

I'm curious if updating the ID3 library in core would have any affect (see #43836)? If not, we should apply this sanitization as soon as the data is read from the file, which I think would be in wp_read_image_metadata(), if I'm understanding correctly. Any interest in trying to put together a patch for this?

#2 @donpark
4 months ago

I'm curious if updating the ID3 library in core would have any affect (see #43836)?

I have not tried that. IIRC, version of the ID3 library in WPCOM was 1.9.14 while the latest was 1.9.16 and I did not see anything in the commit log that suggested latest would fix this issue.

If not, we should apply this sanitization as soon as the data is read from the file, which I think would be in wp_read_image_metadata(), if I'm understanding correctly.

Can't answer that as I've not looked into where it should be fixed in the Core.

Any interest in trying to put together a patch for this?

Maybe. I just got pulled into fixtheflows, a pseudo-perma-cross-team, project which has a long list of neglected UX issues so I can't say when I could put together a patch. Beware that character encoding detection is pretty much a guessing game and the ID3 spec supporting only two character sets don't mean a thing in the real world.

@donpark
4 months ago

proposed patch

#3 @donpark
4 months ago

Hi @joemcgill. I've attached a patch for this.

We got more users reporting the problem, affecting my short-term priorities.
Let me know what you think and feel free to modify as needed. Thx.

@donpark
4 months ago

Sorry. last patch was no good. Use this one. It also patches wp_get_attachment_metadata to cover bad metadata already in the database.

#4 @dshanske
2 months ago

I spent two hours trying to figure this out before I found this ticket. Silent failures shouldn't happen.

#5 @SergeyBiryukov
3 weeks ago

#47846 was marked as a duplicate.

@SergeyBiryukov
3 weeks ago

#6 @SergeyBiryukov
3 weeks ago

  • Keywords has-patch needs-unit-tests added
  • Milestone changed from Future Release to 5.3

01-Campuri auri.mp3 and 02-Ein Bischen-Frieden.mp3 are test files from #47846.

In my testing with them, bad_id3_patch_2.diff works as expected, but doesn't completely fix the issue: If track number metadata is also malformed or in a wrong encoding, there's a PHP warning:

number_format() expects parameter 1 to be float, string given in wp-includes/functions.php on line 280

46800.diff fixes that and adds a check whether mb_convert_encoding() exists.

This could use a unit test.

#7 follow-up: @deyanpavlov
2 weeks ago

How do I use this file 46800.diff to patch my WordPress installation? Can I do it from the admin panel without really doing developer's kind of work to edit the source files?

#8 in reply to: ↑ 7 @donmhico
2 weeks ago

Replying to deyanpavlov:

How do I use this file 46800.diff to patch my WordPress installation? Can I do it from the admin panel without really doing developer's kind of work to edit the source files?

I think the most "friendliest" for non-dev is to just copy and paste the actual code changes in your wp-admin/includes/media.php file. Just be sure that you are in the correct block of code. The line number from the diff can be used as a guide for you but it may be possible that it's not accurate.

#9 @deyanpavlov
2 weeks ago

this particular .diff file also changes src/wp-includes/post.php
I indeed did what you say, being careful not to mess up everything.
I managed to upload my mp3 files from the admin panel.

Unfortunately, the actual web site could no longer open. It was just a blank page.
So I deleted manually the changes I had introduced copying the changes in the .diff file, and the site could open again.

I will repeat the procedure again just in case.

Note: See TracTickets for help on using tickets.