Opened 22 months ago

Closed 22 months ago

Last modified 22 months ago

#21903 closed defect (bug) (duplicate)

UTF-8 encoded image caption processed incorrectly

Reported by: chenxing
Owned by:
Milestone:
Priority: normal
Severity: normal
Version:
Component: Media
Keywords: has-patch
Focuses:
Cc:

Description

utf8_encode() is always run on image captions, even when they are already UTF-8 encoded, which corrupts UTF-8 captions.

A tentative patch is included (not tested with non-UTF-8 encoded content).
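
To illustrate the failure mode, here is a minimal sketch in Python rather than PHP. It assumes that utf8_encode() treats every input byte as an ISO-8859-1 character and re-encodes it as UTF-8; the helper name latin1_to_utf8 and the sample caption are hypothetical and are not taken from the patch.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Mimic the assumed behaviour of PHP's utf8_encode():
    read each byte as ISO-8859-1, then encode the result as UTF-8."""
    return raw.decode("latin-1").encode("utf-8")


# Hypothetical caption that is already stored as UTF-8.
caption = "标题"
utf8_bytes = caption.encode("utf-8")          # 6 bytes, all >= 0x80

# Running the conversion on bytes that are already UTF-8 double-encodes them.
double_encoded = latin1_to_utf8(utf8_bytes)   # 12 bytes

# The result is still valid UTF-8, but it no longer decodes to the caption.
garbled = double_encoded.decode("utf-8")
print(garbled == caption)

# The same conversion is correct for input that really is ISO-8859-1.
print(latin1_to_utf8("café".encode("latin-1")) == "café".encode("utf-8"))
```

The first print shows False and the second shows True: the conversion is only safe when the input really is ISO-8859-1.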

Attachments (1)

image_caption_encoding.patch (2.0 KB) - added by chenxing 22 months ago.
tentative patch

Change History (4)

chenxing 22 months ago

tentative patch

comment:1 nacin 22 months ago

  • Component changed from Administration to Media
  • Keywords needs-unit-tests added

utf8_encode() makes sense when going from ISO-8859-1 to UTF-8. You're right that there is an escape sequence in the IPTC standard to mark that the encoding is UTF-8, and that we currently don't check for it. It would be helpful if "#090" and "\x1B%G" were fully explained.

Also, for this, we are going to want some unit tests with an image with metadata encoded with UTF-8.
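
The check being discussed could look roughly like the sketch below, written in Python for illustration only. It rests on several assumptions: that the parsed IPTC data is a mapping like the one PHP's iptcparse() returns, that the character set marker lives under the key "1#090" and the caption under "2#120", and that the bytes ESC % G ("\x1B%G") mean UTF-8. The function name decode_iptc_caption is hypothetical.

```python
from typing import Dict, List

# Assumed marker for UTF-8 in the IPTC coded character set field: ESC % G.
UTF8_MARKER = b"\x1b%G"


def decode_iptc_caption(iptc: Dict[str, List[bytes]]) -> str:
    """Decode the caption, honouring the character set marker if present."""
    captions = iptc.get("2#120") or []       # assumed caption key
    if not captions:
        return ""
    raw = captions[0]

    charsets = iptc.get("1#090") or []       # assumed character set key
    if charsets and charsets[0] == UTF8_MARKER:
        # Already UTF-8: decode directly, do not convert from ISO-8859-1.
        return raw.decode("utf-8", errors="replace")

    # No marker: fall back to the current behaviour and assume ISO-8859-1.
    return raw.decode("latin-1")


# Hypothetical metadata for two images, one marked as UTF-8 and one unmarked.
marked = {"1#090": [UTF8_MARKER], "2#120": ["标题".encode("utf-8")]}
unmarked = {"2#120": ["café".encode("latin-1")]}
print(decode_iptc_caption(marked) == "标题")
print(decode_iptc_caption(unmarked) == "café")
```

A unit test along these lines would need a real image whose metadata carries the marker, as requested above; the dictionaries here only stand in for parsed metadata.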

comment:2 SergeyBiryukov 22 months ago

  • Keywords has-patch added; needs-unit-tests removed
  • Milestone Awaiting Review deleted
  • Resolution set to duplicate
  • Status changed from new to closed

Duplicate of #9417 and #20408.

A basic unit test was added in [UT665].

comment:3 chenxing 22 months ago

  • Cc chenxing added

I don't know if there is a reliable source for the escape sequence. I got it from here:
http://php.net/manual/en/function.iptcparse.php#105025

Otherwise, maybe we can try seems_utf8().
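
A rough sketch of that fallback, in Python for illustration: it assumes a seems_utf8()-style check can be approximated by attempting a strict UTF-8 decode. The names looks_like_utf8 and decode_caption_heuristic are hypothetical, and this is not the WordPress implementation.

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Return True if the bytes form a valid UTF-8 sequence."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def decode_caption_heuristic(raw: bytes) -> str:
    """Use UTF-8 when the bytes are valid UTF-8, otherwise assume ISO-8859-1."""
    if looks_like_utf8(raw):
        return raw.decode("utf-8")
    return raw.decode("latin-1")


print(decode_caption_heuristic("标题".encode("utf-8")) == "标题")
print(decode_caption_heuristic("café".encode("latin-1")) == "café")
```

This is only a heuristic. Some ISO-8859-1 byte sequences are also valid UTF-8 (for example the bytes 0xC3 0xA9), so an explicit marker in the metadata is more reliable where it exists.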