WordPress.org

Make WordPress Core

Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#21903 closed defect (bug) (duplicate)

UTF-8 encoded image caption processed incorrectly

Reported by: chenxing
Owned by:
Milestone:
Priority: normal
Severity: normal
Version:
Component: Media
Keywords: has-patch
Focuses:
Cc:
PR Number:

Description

utf8_encode() is always run on image captions, even when they are already UTF-8 encoded, which corrupts those captions.

A tentative patch is attached (not tested with non-UTF-8 encoded content).
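
To illustrate the double encoding described above, here is a minimal sketch in Python rather than PHP, so the bytes are easy to inspect. The `utf8_encode` helper below only mimics what PHP's utf8_encode() does (treat every byte as ISO-8859-1 and re-encode it as UTF-8), and the sample captions are made up for illustration.

```python
def utf8_encode(data: bytes) -> bytes:
    """Mimic PHP's utf8_encode(): read each byte as ISO-8859-1, write UTF-8."""
    return data.decode("latin-1").encode("utf-8")


# Hypothetical caption that is already stored as UTF-8 in the image metadata.
caption = "图片说明"
already_utf8 = caption.encode("utf-8")

# Running the conversion on UTF-8 input encodes it a second time.
double_encoded = utf8_encode(already_utf8)
print(repr(double_encoded.decode("utf-8")))  # mojibake, not the original caption

# The same conversion is correct for genuine ISO-8859-1 input.
latin1_caption = "café".encode("latin-1")
print(utf8_encode(latin1_caption).decode("utf-8"))
```

The second case is why the conversion cannot simply be dropped: it is still needed when the metadata really is ISO-8859-1.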

Attachments (1)

image_caption_encoding.patch (2.0 KB) - added by chenxing 7 years ago.
tentative patch

Change History (4)

@chenxing
7 years ago

tentative patch

#1 @nacin
7 years ago

  • Component changed from Administration to Media
  • Keywords needs-unit-tests added

utf8_encode() makes sense when going from ISO-8859-1 to UTF-8. You're right that the IPTC standard has an escape sequence to mark the encoding as UTF-8, and that we currently don't check for it. It would be helpful if "#090" and "\x1B%G" were fully explained.
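
For reference, a rough sketch of what checking that marker could look like. In the array returned by PHP's iptcparse(), the key "1#090" is IPTC record 1, dataset 90 (Coded Character Set), and the value "\x1B%G" (ESC % G) is the ISO 2022 escape sequence that designates UTF-8; "2#120" is the Caption/Abstract dataset. The sketch is in Python, the `decode_caption` name is hypothetical, and the dict only imitates the shape of the iptcparse() result. It is not the attached patch.

```python
UTF8_MARKER = b"\x1b%G"  # ESC % G: ISO 2022 escape sequence designating UTF-8


def decode_caption(iptc: dict) -> str:
    """Decode the IPTC caption, honoring the 1:90 Coded Character Set marker.

    `iptc` imitates the iptcparse() result: keys like "1#090" map to lists of
    raw byte strings. This helper is an illustration, not WordPress code.
    """
    charset = iptc.get("1#090", [b""])[0]
    raw = iptc.get("2#120", [b""])[0]
    if charset == UTF8_MARKER:
        # Metadata declares UTF-8: decode as-is, no Latin-1 conversion.
        return raw.decode("utf-8", errors="replace")
    # No marker: assume ISO-8859-1, which is what utf8_encode() presumes.
    return raw.decode("latin-1")


marked = {"1#090": [UTF8_MARKER], "2#120": ["图片说明".encode("utf-8")]}
unmarked = {"2#120": ["café".encode("latin-1")]}
print(decode_caption(marked))
print(decode_caption(unmarked))
```

Files that contain UTF-8 but omit the 1:90 dataset would still be misread by this check alone, so it only covers metadata that declares its encoding.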

Also, for this, we are going to want some unit tests that use an image whose metadata is encoded in UTF-8.

#2 @SergeyBiryukov
7 years ago

  • Keywords has-patch added; needs-unit-tests removed
  • Milestone Awaiting Review deleted
  • Resolution set to duplicate
  • Status changed from new to closed

Duplicate of #9417 and #20408.

A basic unit test was added in [UT665].

#3 @chenxing
7 years ago

  • Cc chenxing added

I don't know if there is a reliable source. I got it from here:
http://php.net/manual/en/function.iptcparse.php#105025

Otherwise, maybe we can try seems_utf8().
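
A sketch of that fallback idea, again in Python for illustration: validate the bytes as UTF-8 and only apply the ISO-8859-1 conversion when validation fails. The helper names are hypothetical, and strict decoding stands in for the byte-pattern check that seems_utf8() performs, so the two may not agree on every input.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Heuristic stand-in for seems_utf8(): True if the bytes are valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def decode_with_fallback(data: bytes) -> str:
    """Decode as UTF-8 when the bytes validate, otherwise assume ISO-8859-1."""
    if looks_like_utf8(data):
        return data.decode("utf-8")
    return data.decode("latin-1")


print(decode_with_fallback("图片说明".encode("utf-8")))
print(decode_with_fallback("café".encode("latin-1")))
```

This is only a heuristic: some ISO-8859-1 byte sequences also happen to be valid UTF-8 (for example the bytes 0xC3 0xA9), so an explicit encoding marker is more reliable when the file provides one.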
