Make WordPress Core

Opened 6 years ago

Closed 5 years ago

Last modified 5 years ago

#12095 closed defect (bug) (fixed)

Images insert at full size if metadata contains odd characters

Reported by: whlitwa
Owned by:
Milestone: 3.0
Priority: high
Severity: normal
Version: 2.9.1
Component: Media
Keywords: needs-testing
Focuses:
Cc:


If the metadata of an image contains odd (non-ASCII) characters, e.g. the © copyright symbol, the image will almost invariably be inserted into your post at full size.
This doesn't happen every time, and I'm not sure why.

I've tracked this bug to an older version and found that this problem was once fixed.

/wp-admin/includes/image.php: lines 276, 278, 282, 284, 286:

$meta['credit'] = trim($iptc['2#110'][0]);

was changed in 2.6.2 to:

$meta['credit'] = utf8_encode(trim($iptc['2#110'][0]));

And so the problem seemed to disappear. Unfortunately, the problem seems to have come back. I've applied this fix:

$meta['credit'] = htmlentities(utf8_encode(trim($iptc['2#110'][0])));

It seems to correct the bug. Sadly, the copyright symbol no longer appears correctly. I'm certain there's a better way to fix this.
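One possible explanation for the broken symbol, offered as a sketch rather than a confirmed diagnosis: before PHP 5.4, htmlentities() assumed ISO-8859-1 input unless told otherwise, so the two UTF-8 bytes of © were escaped as two separate Latin-1 characters. The snippet below uses mb_convert_encoding() as a stand-in for utf8_encode() and passes the charset explicitly to reproduce both behaviours; the sample byte is an assumption, not data from the reporter's image.

```php
<?php
// Latin-1 (ISO-8859-1) byte for the copyright sign, as it might appear in an IPTC field.
$raw = "\xA9";

// Same conversion utf8_encode() performs: Latin-1 bytes to UTF-8 (bytes C2 A9).
$utf8 = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');

// Assumption: the input is treated as ISO-8859-1 (the default before PHP 5.4),
// so each of the two UTF-8 bytes is escaped as its own character.
$mangled = htmlentities($utf8, ENT_COMPAT, 'ISO-8859-1');

// Declaring the real charset keeps the two bytes together as one character.
$intact = htmlentities($utf8, ENT_COMPAT, 'UTF-8');

echo $mangled, "\n"; // &Acirc;&copy;
echo $intact, "\n";  // &copy;
```

If this is what is happening, passing 'UTF-8' as the third argument would keep the symbol intact without any later clean-up.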

Attachments (2)

image.php (12.7 KB) - added by miqrogroove 6 years ago.
This version of image.php logs some of the raw file inputs for dev purposes.
wp-iso8859-exif-test.jpg (64.8 KB) - added by miqrogroove 6 years ago.
Test Case #1: ISO 8859 Encoded EXIF Values


Change History (11)

comment:1 @whlitwa 6 years ago


I was able to fix the copyright symbol by updating function the_content() in

For some reason, UTF-8 encoded copyright symbols like to translate to
Â© (&Acirc;&copy;)
Knowing this, I was able to simply delete the stray Â.

#lines 164-169:
function the_content($more_link_text = null, $stripteaser = 0) {
        $content = get_the_content($more_link_text, $stripteaser);
        $content = apply_filters('the_content', $content);
        $content = str_replace(']]>', ']]&gt;', $content);
        $content = str_replace(mb_convert_encoding('&Acirc;', 'UTF-8', 'HTML-ENTITIES'), '', $content); //added this line
        echo $content;
}

Obviously not the best place to fix this bug, but might some kind of backwards utf8 decoding need to take place after the image is uploaded?

comment:2 @miqrogroove 6 years ago

  • Component changed from Upload to Media
  • Keywords needs-testing added
  • Milestone changed from Unassigned to 3.0

The symptom described (full size due to metas) can be caused by a de-serialize failure. I've done that a couple times by manually corrupting postmeta values in phpMyAdmin. I bet there is a lower-level cause and solution than this the_content thing, and it has potential security implications.

@miqrogroove 6 years ago

This version of image.php logs some of the raw file inputs for dev purposes.

@miqrogroove 6 years ago

Test Case #1: ISO 8859 Encoded EXIF Values

comment:3 follow-up: @miqrogroove 6 years ago

The attached file appears to trigger postmeta record corruption. An example record ended unexpectedly with:

s:10:"image_meta";a:10:{s:8:"aperture";s:1:"0";s:6:"credit";s:29:"artist copy 
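A record cut short like this can no longer be unserialized, which would fit the symptom in the ticket. The sketch below simulates it under one loud assumption: that the storage layer discards everything from the first byte that is not valid UTF-8. The metadata value is hypothetical and only mimics the shape of the record above.

```php
<?php
// Hypothetical metadata: "artist copy " followed by a Latin-1 copyright byte
// (0xA9), which is not valid UTF-8 on its own.
$meta = array('credit' => "artist copy \xA9 2010");

$serialized = serialize($meta);

// Assumption: the storage layer drops everything from the first invalid byte on.
$cut    = strpos($serialized, "\xA9");
$stored = substr($serialized, 0, $cut);

// The declared string length no longer matches the bytes that remain, so
// unserialize() fails; @ suppresses the diagnostic it would otherwise emit.
$result = @unserialize($stored);

var_dump($result); // bool(false)
```

Code that reads the metadata then gets false instead of an array, which is one way the stored image sizes could become unavailable at insert time.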

comment:4 in reply to: ↑ 3 @nacin 6 years ago

Replying to miqrogroove:

The attached file appears to trigger postmeta record corruption.

miqrogroove was using his patch from #11417. On commit ([13244]) I added utf8_encode() to the EXIF fields, to align them with IPTC, and the attached file works.

Without an image that breaks the current implementation, we're grasping at straws.

comment:5 @miqrogroove 6 years ago

We have identified two new problems so far:

  1. Post meta values are stored as UTF-8 serialized strings. This means they are not binary-safe, and therefore any image headers not stored as UTF-8 or ASCII will be broken without conversion. nacin has added an unconditional utf8_encode conversion to any header fields that did not previously have it. This prevents serialization from blowing up at the MySQL layer.
  2. The utf8_encode() function is not appropriate for any image headers not stored as ISO-8859-1 or ASCII. This means all UTF-8 and UCS2 image headers are broken by WordPress. The current obstacle in fixing this is that we do not have any UTF-8 or UCS2 encoded sample images with which we could test a patch for this ticket.
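The two points above suggest converting only when a header is not already UTF-8. What follows is a minimal sketch of that idea, not the patch that was committed: normalize_header_to_utf8() is a hypothetical helper, and it assumes that any value failing a UTF-8 validity check is ISO-8859-1. It uses mb_convert_encoding() in place of utf8_encode() and does nothing for UCS2.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress): return a UTF-8 version of a
 * header string without double-encoding values that are already UTF-8.
 */
function normalize_header_to_utf8($value) {
    // ASCII and well-formed UTF-8 pass through unchanged.
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    // Assumption: anything else is ISO-8859-1. This is a guess; UCS2 and
    // other encodings would need real detection, which this sketch lacks.
    return mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
}

$latin1 = "\xA9 artist";      // copyright sign as a single Latin-1 byte
$utf8   = "\xC2\xA9 artist";  // copyright sign already encoded as UTF-8

// Unconditional conversion (what utf8_encode() does) double-encodes UTF-8 input.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

var_dump(normalize_header_to_utf8($latin1) === $utf8); // bool(true)
var_dump(normalize_header_to_utf8($utf8) === $utf8);   // bool(true)
var_dump($double === "\xC3\x82\xC2\xA9 artist");       // bool(true)
```

The validity check is only a heuristic: a Latin-1 string whose bytes happen to form valid UTF-8 would be left unconverted, so real sample images would still be needed to test any patch built this way.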

comment:6 @nacin 6 years ago

(In [13249]) Use utf8_encode() consistently in wp_read_image_metadata(). Also add some whitespace. props miqrogroove, see #11417, see #12095

comment:7 @miqrogroove 6 years ago

Something else that needs testing: what happens when a filename contains non-ASCII characters? I assume they are converted to the website encoding before transmission, but between the flash uploader, serialization, and UTF-8 DB columns, I have no idea.

comment:8 @nacin 5 years ago

  • Resolution set to fixed
  • Status changed from new to closed

Going to mark this one as fixed. Unless my memory isn't serving me well, I think we did all we could here. There is also #9417 with similar issues that can be handled in 3.1. Please open a new ticket if there's something else going on here.
