Opened 8 months ago

Closed 8 months ago

Last modified 8 months ago

#21903 closed defect (bug) (duplicate)

UTF-8 encoded image caption processed incorrectly

Reported by: chenxing
Owned by:
Priority: normal
Milestone:
Component: Media
Version:
Severity: normal
Keywords: has-patch
Cc: chenxing

Description

utf8_encode() is always run on image captions, even when they are already UTF-8 encoded. This double-encodes them and corrupts any caption stored in UTF-8.

A tentative patch is attached (not tested with non-UTF-8 encoded content).
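To make the failure concrete, here is a small Python sketch of the double encoding. It assumes only the documented behaviour of PHP's utf8_encode(): every input byte is treated as ISO-8859-1 and converted to UTF-8. The helper utf8_encode_like() is a hypothetical stand-in, not WordPress code.

```python
# Illustration of double encoding ("mojibake").
# Assumption: PHP's utf8_encode() reads its input as ISO-8859-1 (Latin-1)
# and converts each byte to UTF-8; utf8_encode_like() mimics that.

def utf8_encode_like(data: bytes) -> bytes:
    """Reinterpret every byte as a Latin-1 character, then emit UTF-8."""
    return data.decode("latin-1").encode("utf-8")

raw = "Café".encode("utf-8")        # caption bytes already in UTF-8 (5 bytes)
mangled = utf8_encode_like(raw)     # conversion applied anyway (7 bytes)

print(len(raw), len(mangled))       # 5 7
print(mangled.decode("utf-8"))      # CafÃ©

# For genuine Latin-1 input the conversion is correct:
print(utf8_encode_like(b"Caf\xe9").decode("utf-8"))  # Café
```

The two-byte UTF-8 sequence for "é" is read as two Latin-1 characters ("Ã" and "©"), each of which is then encoded again.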

Attachments (1)

image_caption_encoding.patch (2.0 KB) - added by chenxing 8 months ago.
tentative patch


Change History (4)

tentative patch

  • Component changed from Administration to Media
  • Keywords needs-unit-tests added

utf8_encode() makes sense when converting from ISO-8859-1 to UTF-8. You're right that the IPTC standard has an escape sequence to mark that the encoding is UTF-8, and that we currently don't check it. It would be helpful if "#090" and "\x1B%G" were fully explained.

We will also want unit tests for this, using an image whose metadata is encoded in UTF-8.
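For reference: in the IPTC IIM format, dataset 1:90 (Coded Character Set) in the Envelope Record declares the character set, and the value ESC % G (bytes 0x1B 0x25 0x47) is the ISO 2022 escape sequence that designates UTF-8. PHP's iptcparse() returns keys of the form "record#dataset", so "#090" in the patch presumably refers to the key "1#090"; the caption itself is "2#120" (Caption/Abstract). The Python sketch below assumes a dict with the same shape as the iptcparse() result; declares_utf8() and read_caption() are hypothetical helpers, not WordPress functions.

```python
from typing import Dict, List, Optional

# ISO 2022 escape sequence designating UTF-8 (ESC % G).
UTF8_ESCAPE = b"\x1b%G"

def declares_utf8(iptc: Dict[str, List[bytes]]) -> bool:
    """True if Envelope Record 1:90 (Coded Character Set) announces UTF-8.

    Assumption: a lenient substring check, in case the dataset carries
    additional bytes alongside the escape sequence.
    """
    values = iptc.get("1#090", [])
    return bool(values) and UTF8_ESCAPE in values[0]

def read_caption(iptc: Dict[str, List[bytes]]) -> Optional[str]:
    """Decode Caption/Abstract (2:120), converting only when not UTF-8."""
    captions = iptc.get("2#120")
    if not captions:
        return None
    raw = captions[0]
    if declares_utf8(iptc):
        # Already UTF-8: decode directly, no Latin-1 conversion.
        return raw.decode("utf-8", errors="replace")
    # No declaration: keep the current behaviour and assume ISO-8859-1.
    return raw.decode("latin-1")

tagged = {"1#090": [UTF8_ESCAPE], "2#120": ["Café".encode("utf-8")]}
untagged = {"2#120": [b"Caf\xe9"]}

print(read_caption(tagged))    # Café
print(read_caption(untagged))  # Café
```

Both inputs decode to the same text; only the untagged one goes through the Latin-1 path, which corresponds to the case where utf8_encode() is appropriate.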

  • Keywords has-patch added; needs-unit-tests removed
  • Milestone Awaiting Review deleted
  • Resolution set to duplicate
  • Status changed from new to closed

Duplicate of #9417 and #20408.

A basic unit test was added in [UT665].

  • Cc chenxing added

I don't know of a more authoritative source. I got it from this user note in the PHP manual:
http://php.net/manual/en/function.iptcparse.php#105025

Otherwise, we could try seems_utf8().
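When dataset 1:90 is absent, the fallback suggested here is a heuristic: if the caption bytes are valid UTF-8, assume they are UTF-8; otherwise treat them as ISO-8859-1. The sketch below uses Python's strict UTF-8 decoder as a stand-in for WordPress's seems_utf8(); it is not a port of that function, and looks_like_utf8() and decode_caption() are hypothetical names.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: True if the bytes form a valid UTF-8 sequence."""
    try:
        data.decode("utf-8")  # strict mode raises on any invalid sequence
    except UnicodeDecodeError:
        return False
    return True

def decode_caption(raw: bytes) -> str:
    """Decode a caption with no declared charset, guessing UTF-8 first."""
    if looks_like_utf8(raw):
        return raw.decode("utf-8")
    # Not valid UTF-8: fall back to ISO-8859-1, as utf8_encode() assumes.
    return raw.decode("latin-1")

print(decode_caption("Café".encode("utf-8")))  # Café
print(decode_caption(b"Caf\xe9"))              # Café
```

The trade-off is that some Latin-1 text happens to be valid UTF-8: the Latin-1 string "Ã©" is the bytes 0xC3 0xA9, which this check would read as "é". Honouring 1:90 when present and using the heuristic only as a fallback keeps such misdetections to untagged images.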
