Make WordPress Core

Opened 8 years ago

Closed 8 years ago

Last modified 8 years ago

#21903 closed defect (bug) (duplicate)

UTF-8 encoded image caption processed incorrectly

Reported by: chenxing Owned by:
Milestone: Priority: normal
Severity: normal Version:
Component: Media Keywords: has-patch
Focuses: Cc:


utf8_encode() is always run on image captions, even when they are already UTF-8 encoded, which corrupts those captions.

A tentative patch is attached (not tested with non-UTF-8 encoded content).
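
To illustrate the failure, here is a minimal sketch in Python (the WordPress code itself is PHP). The helper `php_utf8_encode` is a hypothetical stand-in that mimics what PHP's utf8_encode() does: it treats every input byte as an ISO-8859-1 character and re-encodes it as UTF-8. The sample caption is made up for the example.

```python
def php_utf8_encode(data: bytes) -> bytes:
    """Hypothetical stand-in for PHP's utf8_encode():
    interpret each byte as ISO-8859-1, then encode the result as UTF-8."""
    return data.decode("latin-1").encode("utf-8")


# A caption that is already stored as UTF-8 in the image metadata.
caption_bytes = "Café".encode("utf-8")

# Running the conversion on it encodes the text a second time.
double_encoded = php_utf8_encode(caption_bytes)

# Decoding the result as UTF-8 no longer yields the original caption.
mangled = double_encoded.decode("utf-8")
print(mangled)
```

The two bytes of the UTF-8 "é" are each treated as a separate ISO-8859-1 character, so the caption comes back with two wrong characters in place of one correct one.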

Attachments (1)

image_caption_encoding.patch (2.0 KB) - added by chenxing 8 years ago.
tentative patch

Change History (4)

8 years ago

tentative patch

#1 @nacin
8 years ago

  • Component changed from Administration to Media
  • Keywords needs-unit-tests added

utf8_encode() makes sense when going from ISO-8859-1 to UTF-8. You're right that there is an escape sequence in the IPTC standard to mark that the encoding is UTF-8, and that we currently don't check it. It would be helpful if "#090" and "\x1B%G" were fully explained.

Also, for this, we are going to want some unit tests with an image with metadata encoded with UTF-8.
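
A minimal sketch of the check being discussed, in Python rather than PHP. It assumes that "#090" refers to IPTC dataset 1:90 (Coded Character Set) and that "\x1B%G" (ESC % G) is the value that dataset carries when the metadata is UTF-8. The function name `decode_caption` and its parameters are hypothetical, and falling back to ISO-8859-1 when the marker is absent is an assumption that mirrors the existing utf8_encode() behaviour, not something taken from the patch.

```python
from typing import Optional

# ESC % G: the escape sequence that designates UTF-8.
UTF8_MARKER = b"\x1b%G"


def decode_caption(raw_caption: bytes, coded_charset: Optional[bytes]) -> str:
    """Decode an IPTC caption, honouring the coded character set if present.

    raw_caption   -- caption bytes as read from the image metadata
    coded_charset -- value of dataset 1:90, or None if the image has none
    """
    if coded_charset == UTF8_MARKER:
        # The metadata declares itself as UTF-8: decode it as such.
        return raw_caption.decode("utf-8")
    # Assumption: without the marker, treat the bytes as ISO-8859-1.
    return raw_caption.decode("latin-1")


# Usage: the same text decodes correctly under either declared encoding.
print(decode_caption("Café".encode("utf-8"), UTF8_MARKER))
print(decode_caption("Café".encode("latin-1"), None))
```

A unit test along the lines requested above would feed in an image whose metadata carries the marker and assert that the caption survives unchanged.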

#2 @SergeyBiryukov
8 years ago

  • Keywords has-patch added; needs-unit-tests removed
  • Milestone Awaiting Review deleted
  • Resolution set to duplicate
  • Status changed from new to closed

Duplicate of #9417 and #20408.

A basic unit test was added in [UT665].

#3 @chenxing
8 years ago

  • Cc chenxing added

I don't know if there is a reliable source. I got it from here:

Otherwise maybe we can try seems_utf8.
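
The fallback idea can be sketched as a validity check, again in Python for illustration. The helper `looks_like_utf8` is hypothetical and is not the implementation of seems_utf8; it only shows the general approach of testing whether the bytes form well-formed UTF-8 before deciding to convert them.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes are well-formed UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def caption_to_text(raw_caption: bytes) -> str:
    """Convert only when the caption is not already valid UTF-8."""
    if looks_like_utf8(raw_caption):
        return raw_caption.decode("utf-8")
    # Assumption: anything else is ISO-8859-1.
    return raw_caption.decode("latin-1")


print(caption_to_text("Café".encode("utf-8")))
print(caption_to_text("Café".encode("latin-1")))
```

This is a heuristic: some ISO-8859-1 byte sequences also happen to be valid UTF-8, so an explicit encoding marker in the metadata is more reliable when one is available.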
