Make WordPress Core

Opened 9 years ago

Closed 9 years ago

Last modified 9 years ago

#35316 closed defect (bug) (fixed)

Images with latin extended characters in exif (slovak/czech) are missing thumbnails

Reported by: michalrusina
Owned by: ocean90
Milestone: 4.4.2
Priority: normal
Severity: normal
Version: 4.4
Component: Media
Keywords: has-patch

Description

After uploading an image with Latin Extended characters (Slovak/Czech) in its EXIF data, there is no thumbnail in the Media Library, and the image sizes (thumbnail, medium, large and all custom sizes) are missing in WordPress. As a result, WordPress automatically inserts the full-size image in themes where the thumbnail size should be used. The resized files are created on disk but are NOT registered in WordPress.

Attachments (3)

35316.patch (504 bytes) - added by ocean90 9 years ago.
Image_keywords.png (134.8 KB) - added by pavelevap 9 years ago.
gabcikovo.jpg (3.1 MB) - added by dd32 9 years ago.
Image from https://core.trac.wordpress.org/ticket/35316#comment:6 for archival purposes

Change History (24)

#1 @swissspidy
9 years ago

  • Milestone Awaiting Review deleted
  • Resolution set to duplicate
  • Status changed from new to closed
  • Version 4.4 deleted

Hi there, thanks for the report.

We're tracking this issue in #15955, see also comment:11:ticket:15955.

Related/duplicates: #18634, #19842, #21217, #22363, #28808, #23588, #32887.

#2 @michalrusina
9 years ago

  • Keywords reporter-feedback added
  • Resolution duplicate deleted
  • Status changed from closed to reopened
  • Version set to 4.4

I don't think this is a duplicate of #15955. In this case there is nothing wrong with the filename (gabcikovo.jpg has no extended characters); the whole problem lies in the EXIF data. As I mentioned earlier, the files are generated successfully, but WordPress assumes they are not.

#3 @swissspidy
9 years ago

  • Milestone set to Awaiting Review

Sorry, I guess you're right!

#4 @swissspidy
9 years ago

  • Keywords reporter-feedback removed

#5 @pavelevap
9 years ago

@michalrusina: Interesting, could you please upload or link any image example for testing?

#7 @pavelevap
9 years ago

Confirmed. It works well in 4.2.4 and 4.3.1, but 4.4 is broken.

#8 @pavelevap
9 years ago

_wp_attached_file is created, but _wp_attachment_metadata is missing.

This ticket was mentioned in Slack in #core by pavelevap.


9 years ago

#10 @michalrusina
9 years ago

  • Severity changed from normal to major

@ocean90
9 years ago

#11 @ocean90
9 years ago

  • Keywords has-patch added
  • Milestone changed from Awaiting Review to 4.4.2
  • Severity changed from major to normal

Since #33772, attachment metadata includes IPTC keywords. Unlike titles and captions, however, the keywords are not converted to UTF-8; see tags/4.4/src/wp-admin/includes/image.php?marks=405-409#L404.

Current output:

array(5) {
  ["width"]=>
  int(4016)
  ["height"]=>
  int(2673)
  ["file"]=>
  string(23) "2016/01/gabcikovo-3.jpg"
  ["sizes"]=>
  array(5) {
    ["thumbnail"]=>
    array(4) {
      ["file"]=>
      string(23) "gabcikovo-3-150x150.jpg"
      ["width"]=>
      int(150)
      ["height"]=>
      int(150)
      ["mime-type"]=>
      string(10) "image/jpeg"
    }
    ["medium"]=>
    array(4) {
      ["file"]=>
      string(23) "gabcikovo-3-300x200.jpg"
      ["width"]=>
      int(300)
      ["height"]=>
      int(200)
      ["mime-type"]=>
      string(10) "image/jpeg"
    }
    ["medium_large"]=>
    array(4) {
      ["file"]=>
      string(23) "gabcikovo-3-768x511.jpg"
      ["width"]=>
      int(768)
      ["height"]=>
      int(511)
      ["mime-type"]=>
      string(10) "image/jpeg"
    }
    ["large"]=>
    array(4) {
      ["file"]=>
      string(24) "gabcikovo-3-1024x682.jpg"
      ["width"]=>
      int(1024)
      ["height"]=>
      int(682)
      ["mime-type"]=>
      string(10) "image/jpeg"
    }
    ["post-thumbnail"]=>
    array(4) {
      ["file"]=>
      string(24) "gabcikovo-3-1200x799.jpg"
      ["width"]=>
      int(1200)
      ["height"]=>
      int(799)
      ["mime-type"]=>
      string(10) "image/jpeg"
    }
  }
  ["image_meta"]=>
  array(12) {
    ["aperture"]=>
    float(4)
    ["credit"]=>
    string(4) "TASR"
    ["camera"]=>
    string(8) "NIKON D4"
    ["caption"]=>
    string(126) "Na snímke turbína na výrobu elektrickej energie vo Vodnej elektrárni Gabèíkovo 9. marca 2015. FOTO TASR - Martin Baumann"
    ["created_timestamp"]=>
    int(1425908436)
    ["copyright"]=>
    string(22) "Tlaèová agentúra SR"
    ["focal_length"]=>
    string(2) "14"
    ["iso"]=>
    string(4) "2500"
    ["shutter_speed"]=>
    string(17) "0.066666666666667"
    ["title"]=>
    string(46) "Vodná turbína na výrobu elektrickej energie"
    ["orientation"]=>
    int(1)
    ["keywords"]=>
    array(2) {
      [0]=>
      string(58) "Slovensko vl�da energetika Vodn� elektr�re� Gab��kovo prem"
      [1]=>
      string(17) "Fico n�v�teva TTX"
    }
  }
}

35316.patch converts the keywords to UTF-8 as well.
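
A quick way to see which fields trip over this is to validate every string in the metadata array. This is a minimal standalone sketch (plain PHP 7+, no WordPress): find_invalid_utf8() is a hypothetical helper, the sample array is modeled on the var_dump() output above, and the preg_match('//u', ...) check stands in for WordPress's seems_utf8().

```php
<?php
// Returns the paths (e.g. "keywords.0") of string values that are not valid UTF-8.
function find_invalid_utf8(array $meta, string $prefix = ''): array
{
    $bad = [];
    foreach ($meta as $key => $value) {
        $path = $prefix === '' ? (string) $key : "$prefix.$key";
        if (is_array($value)) {
            $bad = array_merge($bad, find_invalid_utf8($value, $path));
        } elseif (is_string($value) && preg_match('//u', $value) !== 1) {
            // preg_match() with the /u modifier fails on malformed UTF-8.
            $bad[] = $path;
        }
    }
    return $bad;
}

// Sample data: the title has already been converted, the keyword holds raw single-byte IPTC text.
$image_meta = [
    'credit'   => 'TASR',
    'title'    => "Vodn\u{00E1} turb\u{00ED}na",
    'keywords' => ["Vodn\xE1 elektr\xE1re\xF2", 'TTX'],
];

print_r(find_invalid_utf8($image_meta)); // lists keywords.0
```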


Until this is fixed in core, you can use the following filter callback:

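<?php
// Workaround for #35316: convert IPTC keywords that are not valid UTF-8.
function trac35316_fix_iptc_keywords_encoding( $meta ) {
        // Nothing to do if there are no keywords (or another filter changed their type).
        if ( empty( $meta['keywords'] ) || ! is_array( $meta['keywords'] ) ) {
                return $meta;
        }

        foreach ( $meta['keywords'] as $key => $keyword ) {
                // Only convert strings that are not already UTF-8, so nothing is double encoded.
                // utf8_encode() treats the input bytes as ISO-8859-1.
                if ( ! seems_utf8( $keyword ) ) {
                        $meta['keywords'][ $key ] = utf8_encode( $keyword );
                }
        }

        return $meta;
}
add_filter( 'wp_read_image_metadata', 'trac35316_fix_iptc_keywords_encoding' );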
Last edited 9 years ago by ocean90
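
The seems_utf8() check is what makes the callback safe on metadata that is already UTF-8: utf8_encode() converts every byte above 0x7F, so running it on valid UTF-8 produces mojibake. A minimal standalone sketch (no WordPress) of that behavior; latin1_to_utf8() reimplements the ISO-8859-1 to UTF-8 mapping that utf8_encode() performs, is_valid_utf8() stands in for seems_utf8(), and fix_keywords() is a hypothetical counterpart of the callback above.

```php
<?php
// ISO-8859-1 -> UTF-8, byte by byte (the mapping utf8_encode() applies):
// bytes below 0x80 are copied, others become a two-byte UTF-8 sequence.
function latin1_to_utf8(string $s): string
{
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($s[$i]);
        $out .= $b < 0x80 ? $s[$i] : chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
    }
    return $out;
}

// Stand-in for WordPress's seems_utf8(): true if $s is well-formed UTF-8.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// Hypothetical standalone counterpart of the filter callback.
function fix_keywords(array $meta): array
{
    if (empty($meta['keywords']) || !is_array($meta['keywords'])) {
        return $meta;
    }
    foreach ($meta['keywords'] as $i => $keyword) {
        if (is_string($keyword) && !is_valid_utf8($keyword)) {
            $meta['keywords'][$i] = latin1_to_utf8($keyword);
        }
    }
    return $meta;
}

$meta  = ['keywords' => ["Vodn\xE1", "Vodn\u{00E1}"]]; // raw ISO-8859-1 byte vs. UTF-8 "á"
$once  = fix_keywords($meta);
$twice = fix_keywords($once);

var_dump($once === $twice);               // bool(true): the guard prevents double encoding
var_dump(latin1_to_utf8("Vodn\u{00E1}")); // unguarded: the UTF-8 "á" turns into "Ã¡"
```

Note that this only helps when the source really is ISO-8859-1; for other single-byte encodings the result is valid UTF-8 but can still contain the wrong letters.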

#12 follow-up: @pavelevap
9 years ago

@ocean90: Great, the patch works well; the available sizes are now added.

How could the wrong encoding lead to missing _wp_attachment_metadata? I am not sure about that...

There are still some encoding issues:

  • 3 original keywords from image
Slovensko vláda energetika Vodná elektráreň Gabčíkovo premiér
Fico návšteva TTX
Slovensko vláda energetika Vodná elektráreň Gabčíkovo prem
  • Only 2 results from _wp_attachment_metadata with some wrong encoding
[keywords] => Array
   (
       [0] => Slovensko vláda energetika Vodná elektráreò Gabèíkovo prem
       [1] => Fico návšteva TTX
   )

#13 in reply to: ↑ 12 @ocean90
9 years ago

Replying to pavelevap:

How could the wrong encoding lead to missing _wp_attachment_metadata? I am not sure about that...

Because of the invalid characters, the data is rejected by sanity checks in wpdb.

3 original keywords from image

I have only 2:

("Slovensko vl\U00e1da energetika Vodn\U00e1 elektr\U00e1re\U0148 Gab\U010d\U00edkovo premi\U00e9r ","Fico n\U00e1v\U0161teva TTX")

with some wrong encoding

It uses utf8_encode(), which is also used for titles and captions. Are those wrong too? If so, that should probably be handled in a separate ticket.

I also noticed that the long keyword is truncated, but it is already truncated in the data returned by iptcparse().
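
The same applies to the character set: iptcparse() returns the raw dataset bytes without any conversion, so whatever encoding the IPTC writer used reaches wp_read_image_metadata() unchanged. A minimal standalone sketch; the IPTC block is hand-built for illustration (it is not extracted from gabcikovo.jpg) and iptc_dataset() is a hypothetical helper. The key '2#025' is what iptcparse() produces for dataset 2:25, the IPTC Keywords field.

```php
<?php
// Builds one IPTC IIM dataset: 0x1C tag marker, record number, dataset number,
// 16-bit big-endian length, then the raw data bytes. Hypothetical helper.
function iptc_dataset(int $record, int $dataset, string $data): string
{
    return "\x1C" . chr($record) . chr($dataset) . pack('n', strlen($data)) . $data;
}

// Two Keywords datasets (2:25); "Gab\xE8\xEDkovo" is "Gabčíkovo" as single-byte
// Windows-1250 / ISO-8859-2 text.
$block = iptc_dataset(2, 25, "Gab\xE8\xEDkovo") . iptc_dataset(2, 25, 'TTX');

$iptc     = iptcparse($block);
$keywords = $iptc['2#025']; // one array entry per keyword dataset

// The bytes come back unchanged, so the first keyword is not valid UTF-8.
var_dump(preg_match('//u', $keywords[0]) === 1); // bool(false)
```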

#14 @pavelevap
9 years ago

@ocean90: You are right, chars are probably stripped inside strip_invalid_text(): https://core.trac.wordpress.org/browser/tags/4.4.1/src/wp-includes/wp-db.php#L2788
And that is why process_fields() returns false here: https://core.trac.wordpress.org/browser/tags/4.4.1/src/wp-includes/wp-db.php#L2085
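
The failure path can be approximated in a few lines. This is a simplified, hypothetical stand-in, not wpdb's code: it assumes a utf8 column and reduces "stripping would change the value" to "the value is not valid UTF-8", ignoring other cases such as 4-byte characters in a non-utf8mb4 column. Because post meta arrays are stored serialized, a single bad keyword taints the whole _wp_attachment_metadata value.

```php
<?php
// Simplified stand-in: strip_invalid_text() drops invalid byte sequences for a
// utf8 column, and process_fields() refuses the write when that changed the data.
// Here that is approximated as "the value is not valid UTF-8".
function would_wpdb_reject(string $value): bool
{
    return preg_match('//u', $value) !== 1;
}

// serialize() keeps string bytes as-is, so invalid keyword bytes end up in the stored value.
$bad  = serialize(['credit' => 'TASR', 'keywords' => ["Gab\xE8\xEDkovo"]]);
$good = serialize(['credit' => 'TASR', 'keywords' => ["Gab\u{010D}\u{00ED}kovo"]]);

var_dump(would_wpdb_reject($bad));  // bool(true): the whole metadata value is refused
var_dump(would_wpdb_reject($good)); // bool(false)
```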

The function returns only 2 keywords, but Windows shows 3 (see the attached screenshot). I am not sure what is wrong.

Encoding: yes, the caption (later saved as post_excerpt) is also wrong: Gabèíkovo should be Gabčíkovo. The copyright is wrong as well; the title probably does not contain any problematic characters.
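
The pattern (č shown as è, ň as ò) is what happens when single-byte Central European text is decoded as ISO-8859-1, which is what utf8_encode() assumes. In both Windows-1250 and ISO-8859-2, č is byte 0xE8 and ň is 0xF2, while ISO-8859-1 maps those bytes to è and ò. A minimal sketch assuming the source is Windows-1250 (the ticket does not establish which code page the IPTC writer used); the three-entry table is a deliberately partial, illustrative subset, and both function names are hypothetical.

```php
<?php
// ISO-8859-1 decoding, as utf8_encode() does: byte value = code point.
function decode_latin1(string $bytes): string
{
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $b = ord($bytes[$i]);
        $out .= $b < 0x80 ? $bytes[$i] : chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
    }
    return $out;
}

// Windows-1250 decoding with a partial table (hypothetical helper): only letters
// that differ from ISO-8859-1 in this example are listed; á, é, í, ó, ú share
// their byte values with ISO-8859-1, so they fall back to decode_latin1().
function decode_cp1250_partial(string $bytes): string
{
    static $map = [
        0x9A => "\u{0161}", // š
        0xE8 => "\u{010D}", // č
        0xF2 => "\u{0148}", // ň
    ];
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $out .= $map[ord($bytes[$i])] ?? decode_latin1($bytes[$i]);
    }
    return $out;
}

$raw = "Gab\xE8\xEDkovo";              // "Gabčíkovo" as single-byte text
echo decode_latin1($raw), "\n";         // Gabèíkovo (what the caption shows)
echo decode_cp1250_partial($raw), "\n"; // Gabčíkovo
```

A real fix would need the complete code page (for example via iconv()) and a reliable way to know the source encoding.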

#15 @dd32
9 years ago

A test image full of non-utf8 data that we can test with and also throw into some unit tests would be beneficial here.

#16 @michalrusina
9 years ago

@dd32 This image triggers the bug (it was referenced earlier): https://a-static.projektn.sk/2016/01/gabcikovo.jpg

@dd32
9 years ago

#17 @dd32
9 years ago

Thanks @michalrusina, I read over that and missed it :(
I've uploaded it here for archival purposes in case the origin ever switches it out.

#18 @ocean90
9 years ago

  • Owner set to ocean90
  • Resolution set to fixed
  • Status changed from reopened to closed

In 36429:

Media: In wp_read_image_metadata() make sure that IPTC keywords are UTF8 encoded.

Prevents missing _wp_attachment_metadata when an image contains keywords with latin extended characters.

Fixes #35316.

#19 @ocean90
9 years ago

In 36430:

Media: In wp_read_image_metadata() make sure that IPTC keywords are UTF8 encoded.

Prevents missing _wp_attachment_metadata when an image contains keywords with latin extended characters.

Merges [36429] to the 4.4 branch.
See #35316.

#20 @DrewAPicture
9 years ago

In 36489:

Docs: Use the correct parameter name in the DocBlock for wp_kses_post_deep(), introduced in [36429].

Props sebastianpisula.
Fixes #35700. See #35316.

#21 @johnbillion
9 years ago

#35325 was marked as a duplicate.
