Make WordPress Core

Opened 3 years ago

Closed 3 years ago

#20408 closed defect (bug) (duplicate)

wp_read_image_metadata wrongly assumes IPTC metadata is Latin1

Reported by: koke
Owned by:
Milestone:
Priority: normal
Severity: normal
Version: 3.4
Component: Media
Keywords:
Focuses:
Cc:


Some applications, like Aperture, use UTF-8 in the IPTC tags. When exported images are uploaded to WordPress, the title and description metadata end up with the wrong encoding.

The problem is in wp_read_image_metadata, since it runs utf8_encode on every field even if the field is already UTF-8.

To check whether the parsed IPTC data is already UTF-8:

// iptcparse() returns each record as an array of values, hence the [0].
$iptc_is_utf8 = isset( $iptc['1#090'][0] ) && "\x1B%G" === $iptc['1#090'][0];

Depending on that result, the tags should either go through utf8_encode or be left as they are.
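
A minimal sketch of that conditional conversion, not the actual core patch. It assumes $iptc is the array returned by iptcparse(), where each record holds a list of values. The names example_latin1_to_utf8() and example_iptc_field() are hypothetical; the first spells out the Latin-1 to UTF-8 conversion that utf8_encode() performs, so the sketch does not depend on that function being available.

```php
<?php
// Hypothetical helper: convert ISO-8859-1 bytes to UTF-8, byte by byte.
// This mirrors what utf8_encode() does for Latin-1 input.
function example_latin1_to_utf8( $text ) {
	$out = '';
	$len = strlen( $text );
	for ( $i = 0; $i < $len; $i++ ) {
		$byte = ord( $text[ $i ] );
		if ( $byte < 0x80 ) {
			// ASCII is identical in both encodings.
			$out .= chr( $byte );
		} else {
			// U+0080..U+00FF become a two-byte UTF-8 sequence.
			$out .= chr( 0xC0 | ( $byte >> 6 ) ) . chr( 0x80 | ( $byte & 0x3F ) );
		}
	}
	return $out;
}

// Hypothetical helper: read one IPTC field and convert it only when the
// 1#090 record does not announce UTF-8 (ESC % G).
function example_iptc_field( array $iptc, $key ) {
	if ( ! isset( $iptc[ $key ][0] ) ) {
		return '';
	}
	$value   = trim( $iptc[ $key ][0] );
	$is_utf8 = isset( $iptc['1#090'][0] ) && "\x1B%G" === $iptc['1#090'][0];

	return $is_utf8 ? $value : example_latin1_to_utf8( $value );
}
```

With the marker present, a field such as '2#005' is returned untouched; without it, the field is treated as Latin-1 and converted once.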

Attachments (1)

testiptc.php (245 bytes) - added by koke 3 years ago.
Test script: dump IPTC contents


Change History (5)

@koke 3 years ago

Test script: dump IPTC contents

comment:1 @koke 3 years ago

Test image: https://breakmeplease.files.wordpress.com/2012/04/dsc_4320.jpg

Title: Título

Description: Descripción

comment:2 @koke 3 years ago

Related: #7495

comment:3 @SergeyBiryukov 3 years ago

Duplicate of #9417?

It seems that the $iptc['1#090'] marker may not always be present. test-image-iptc.jpg (with IPTC tags written by IrfanView 4.33) doesn't have it. In that case, a seems_utf8() check looks more reliable.
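
A rough sketch of that fallback, under stated assumptions: it does not call WordPress's seems_utf8(), but stands in for it with a byte-validity check through preg_match() and the /u modifier, so it runs outside WordPress. The function names are hypothetical, and $iptc is again assumed to be the array returned by iptcparse().

```php
<?php
// Hypothetical stand-in for a seems_utf8()-style check: preg_match() with
// the /u modifier returns false when the subject is not valid UTF-8.
function example_looks_like_utf8( $text ) {
	return 1 === preg_match( '//u', $text );
}

// Hypothetical helper: trust the 1#090 marker when it is there,
// otherwise fall back to inspecting the bytes of the value itself.
function example_iptc_value_is_utf8( array $iptc, $value ) {
	if ( isset( $iptc['1#090'][0] ) && "\x1B%G" === $iptc['1#090'][0] ) {
		return true;
	}
	return example_looks_like_utf8( $value );
}
```

The byte check is a heuristic: pure ASCII passes it trivially, and a Latin-1 string whose high bytes happen to form valid UTF-8 sequences would be misdetected, although that is unlikely in ordinary text.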

Last edited 3 years ago by SergeyBiryukov

comment:4 @SergeyBiryukov 3 years ago

  • Keywords needs-patch removed
  • Milestone Awaiting Review deleted
  • Resolution set to duplicate
  • Status changed from new to closed